1 MSc - Microprocessors Dr. Konstantinos Tatas [email protected].
1
MSc - Microprocessors
Dr. Konstantinos Tatas
[email protected]
2
Useful Information
Instructor: Lecturer K. Tatas
– Office hours: TBA
– E-mail: [email protected]
– http://staff.fit.ac.cy/com.tk
Lecture periods/week: 3
Duration: 10 weeks
ECTS: 7 (175 hours)
3
Course Objectives
By the end of the course students should be able to:
– Evaluate the complex trade-offs involved in embedded system design
– Write detailed embedded system requirements and specification documents
– Write executable specifications using UML/SystemC
– Develop applications using ARM Developer Suite
– Write efficient ARM assembly and C programs in ARM and Thumb mode
– Analyze program performance using traces
– Use code transformations to improve performance, code size and power consumption
4
Course Outline (1/2)
Week 1: Introduction to embedded systems – Embedded microprocessor evolution – Design metrics and constraints (performance, power, cost, time-to-market) and design optimization challenges – Distributed and real-time systems
Week 2: Key embedded system technologies – Integrated circuit technology – Microprocessor technology – CAD tool technology – Sensor technology
Week 3: Embedded system specification and modeling – Object-oriented specification (UML/C++/SystemC) – Assignment 1
Week 4: Computer architecture – Instruction sets – RISC vs. CISC – Pipelining – The ARM microprocessor architecture – ARM assembly – ARM mode – Thumb mode – ARM and Thumb instruction set – ARM conditional execution
Week 5: Processor I/O – Serial I/O – Busy/wait I/O – Interrupts – Exceptions – Traps – ARM memory-mapped I/O – Caches – Memory Management Units – Protection Units – ARM cache and MMU – Assignment 2
5
Course Outline (2/2)
Week 6: Assignment 1
Week 7: Program design and analysis – DFGs – CDFGs – Compilers – Assemblers – Linkers – Basic compiler optimizations/code transformations – Measuring program speed – Trace-driven performance analysis – Energy optimization – Program size optimization
Week 8: Code transformations – Loop unrolling – Loop merging – Loop tiling – Performance optimizing transformations
Week 9: Test
Week 10: Assignment 2
6
Course Assessment
Final exam: 40%
Coursework: 60%
– Assignment 1: 15%
– Assignment 2: 15%
– Quizzes: 10%
– Test: 10%
– Lab exercises: 10%
7
References
Books
– W. Wolf, “Computers as Components”
– W. Wolf, “High-Performance Embedded Computing”
– H. Kopetz, “Real-Time Systems: Design Principles for Distributed Embedded Applications”
– S. Furber, “ARM System-on-Chip Architecture”
– P. Panda, “Memory Issues in Embedded Systems-on-Chip”
– F. Vahid and T. Givargis, “Embedded System Design: A Unified Hardware/Software Introduction”
– F. Catthoor, “Data Access and Storage Management for Embedded Programmable Processors”
8
Microprocessors for Embedded systems
Computing systems are everywhere
Most of us think of “desktop” computers
– PCs
– Laptops
– Mainframes
– Servers
But there’s another type of computing system
– Far more common...
9
Embedded systems overview
Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
[Figure: photos of everyday devices, captioned “Computers are in here... and here... and even here...”. Lots more of these, though they cost a lot less each.]
10
A “short list” of embedded systems
Anti-lock brakes, auto-focus cameras, automatic teller machines, automatic toll systems, automatic transmission, avionic systems, battery chargers, camcorders, cell phones, cell-phone base stations, cordless phones, cruise control, curbside check-in systems, digital cameras, disk drives, electronic card readers, electronic instruments, electronic toys/games, factory control, fax machines, fingerprint identifiers, home security systems, life-support systems, medical testing systems
Modems, MPEG decoders, network cards, network switches/routers, on-board navigation, pagers, photocopiers, point-of-sale systems, portable video games, printers, satellite phones, scanners, smart ovens/dishwashers, speech recognizers, stereo systems, teleconferencing systems, televisions, temperature controllers, theft tracking systems, TV set-top boxes, VCRs, DVD players, video game consoles, video phones, washers and dryers
And the list goes on and on
11
Some common characteristics of embedded systems
Single-functioned
– Executes a single program, repeatedly
Tightly-constrained
– Low cost, low power, small, fast, etc.
Reactive and real-time
– Continually reacts to changes in the system’s environment
– Must compute certain results in real-time without delay
12
An embedded system example – Digital camera
Single-functioned -- always a digital camera
Tightly-constrained -- low cost, low power, small, fast
Reactive and real-time -- only to a small extent
[Figure: digital camera chip block diagram. A lens and CCD feed the chip, which contains: A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl]
13
Embedded Software Development Requires as Much/More Design Effort Than Hardware
14
A System-on-a-Chip: Example
Courtesy: Philips
15
Design at a crossroad
System-on-a-Chip
[Figure: system-on-a-chip example with these blocks: multi-spectral imager; analog; RAM; 500 k gates FPGA + 1 Gbit DRAM (preprocessing); 64 SIMD processor array + SRAM (image conditioning, 100 GOPS); µC system + 2 Gbit DRAM (recognition)]
Embedded applications where cost, performance, and energy are the real issues!
DSP and control intensive
Mixed-mode
Combines programmable and application-specific modules
Software plays crucial role
16
Disciplines involved in Embedded System Design
Digital System Design
Software Design
Analog/Mixed-Signal/RF System Design
Operating Systems
Microprocessors/Computer Architecture
Verification
Testing
etc.
17
Languages traditionally used in Embedded System Design
Specification/modeling
– UML
– SDL
– C/C++
Hardware design
– VHDL
– Verilog
Software design
– C/C++
– Java
– Assembly
Verification
– VHDL/Verilog
– SystemVerilog
– Tcl/tk
– Vera
18
Design challenge – optimizing design metrics
Obvious design goal:
– Construct an implementation with desired functionality
Key design challenge:
– Simultaneously optimize numerous design metrics
Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge
19
Design challenge – optimizing design metrics
Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost
20
Design challenge – optimizing design metrics
Common metrics (continued)
– Time-to-prototype: the time needed to build a working version of the system
– Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
– Maintainability: the ability to modify the system after its initial release
– Correctness, safety, many more
21
Design metric competition -- improving one may worsen others
Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: competing design metrics (size, performance, power, NRE cost), shown next to the digital camera chip block diagram]
22
Time-to-market: a demanding design metric
Time required to develop a product to the point it can be sold to customers
Market window
– Period during which the product would have highest sales
Average time-to-market constraint is about 8 months
Delays can be costly
[Figure: revenues ($) versus time (months)]
23
Losses due to delayed market entry
Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market penetration
– Triangle area equals revenue
Loss
– The difference between the on-time and delayed triangle areas
[Figure: revenues ($) versus time. On-time triangle: entry at time 0, market rise to peak revenue at W, market fall to 2W. Delayed triangle: entry at D, lower peak revenue from delayed entry.]
24
Losses due to delayed market entry (cont.)
Area = 1/2 * base * height
– On-time = 1/2 * 2W * W
– Delayed = 1/2 * (W-D+W) * (W-D)
Percentage revenue loss = (D(3W-D) / (2W^2)) * 100%
Try some examples
– Lifetime 2W = 52 wks, delay D = 4 wks
– (4*(3*26 - 4)) / (2*26^2) = 22%
– Lifetime 2W = 52 wks, delay D = 10 wks
– (10*(3*26 - 10)) / (2*26^2) = 50%
– Delays are costly!
[Figure: on-time and delayed revenue triangles, as on the previous slide]
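The triangle arithmetic above can be checked with a short Python sketch. The function names are illustrative and not part of the course material; the model is the simplified one from the slide, and the code assumes a delay no longer than W.

```python
def revenue_loss_fraction(w: float, d: float) -> float:
    """Fraction of on-time revenue lost when market entry is delayed by d.

    w is half the product lifetime (revenue peaks at time w), d is the delay.
    Assumes the simplified triangular revenue model with 0 <= d <= w.
    """
    if w <= 0 or not 0 <= d <= w:
        raise ValueError("need w > 0 and 0 <= d <= w")
    on_time_area = 0.5 * (2 * w) * w            # base 2W, height W
    delayed_area = 0.5 * (w - d + w) * (w - d)  # base 2W-D, height W-D
    return (on_time_area - delayed_area) / on_time_area


def revenue_loss_closed_form(w: float, d: float) -> float:
    """Closed form from the slide: D(3W-D) / (2W^2)."""
    return d * (3 * w - d) / (2 * w ** 2)


if __name__ == "__main__":
    # The two slide examples: lifetime 2W = 52 weeks, delays of 4 and 10 weeks.
    for delay in (4, 10):
        loss = revenue_loss_fraction(26, delay)
        print(f"2W=52 wks, D={delay} wks: {loss:.0%} of revenue lost")
```

Computing the loss from the two areas and comparing it with the closed form is a quick way to confirm that the percentage formula follows from the triangle model.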
25
The performance design metric
Widely used measure of a system, widely abused
– Clock frequency, instructions per second – not good measures
– Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second
Latency (response time)
– Time between task start and end
– e.g., Cameras A and B each process an image in 0.25 seconds
Throughput
– Tasks per second, e.g. Camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while the previous image is being stored)
Speedup of B over A = B’s performance / A’s performance
– Throughput speedup = 8/4 = 2
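A small Python sketch can make the latency/throughput distinction concrete. The stage split for Camera B (two overlapped stages of 0.125 s each) is an assumption chosen to reproduce the slide's numbers; the slide only says that capture overlaps with storing the previous image.

```python
def pipeline_metrics(stage_times):
    """Return (latency, throughput) for a pipeline of overlapped stages.

    Latency is the time one task needs to pass through every stage.
    Throughput is limited by the slowest stage once the pipeline is full.
    """
    latency = sum(stage_times)
    throughput = 1.0 / max(stage_times)
    return latency, throughput


def speedup(perf_b, perf_a):
    """Speedup of B over A = B's performance / A's performance."""
    return perf_b / perf_a


# Camera A: one non-overlapped 0.25 s step per image.
latency_a, throughput_a = pipeline_metrics([0.25])
# Camera B: assumed split into capture and store stages that overlap.
latency_b, throughput_b = pipeline_metrics([0.125, 0.125])

throughput_speedup = speedup(throughput_b, throughput_a)

if __name__ == "__main__":
    print(f"A: latency {latency_a} s, throughput {throughput_a} images/s")
    print(f"B: latency {latency_b} s, throughput {throughput_b} images/s")
    print(f"Throughput speedup of B over A: {throughput_speedup}")
```

Both cameras have the same latency, so a comparison based on latency alone would miss the difference that the throughput speedup shows.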
26
Three key embedded system technologies
Technology
– A manner of accomplishing a task, especially using technical processes, methods, or knowledge
Three key technologies for embedded systems
– Processor technology
– IC technology
– Design technology
27
Processor technology
The architecture of the computation engine used to implement a system’s desired functionality
Processor does not have to be programmable
– “Processor” not equal to general-purpose processor
[Figure: three processor architectures, each with a controller, a datapath and data memory.
General-purpose (“software”): control logic and state register, IR, PC, program memory holding assembly code for “total = 0; for i = 1 to …”; register file and general ALU.
Application-specific: control logic and state register, IR, PC, program memory holding the same assembly code; registers and custom ALU.
Single-purpose (“hardware”): control logic and state register, no program memory; datapath with index and total registers and an adder.]
28
Processor technology
Processors vary in their customization for the problem at hand
Desired functionality:
  total = 0
  for i = 1 to N loop
    total += M[i]
  end loop
[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor and a single-purpose processor]
29
General-purpose processors
Programmable device used in a variety of applications
– Also known as “microprocessor”
Features
– Program memory
– General datapath with large register file and general ALU
User benefits
– Low time-to-market and NRE costs
– High flexibility
“Pentium” the most well-known, but there are hundreds of others
[Figure: general-purpose processor: controller (control logic and state register, IR, PC), program memory holding assembly code for “total = 0; for i = 1 to …”, datapath (register file, general ALU), data memory]
30
Single-purpose processors
Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
Features
– Contains only the components needed to execute a single program
– No program memory
Benefits
– Fast
– Low power
– Small size
[Figure: single-purpose processor: controller (control logic, state register), datapath (index and total registers, adder), data memory]
31
Application-specific processors
Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
Features
– Program memory
– Optimized datapath
– Special functional units
Benefits
– Some flexibility, good performance, size and power
[Figure: application-specific processor: controller (control logic and state register, IR, PC), program memory holding assembly code for “total = 0; for i = 1 to …”, datapath (registers, custom ALU), data memory]
32
IC technology
The manner in which a digital (gate-level) implementation is mapped onto an IC
– IC: Integrated circuit, or “chip”
– IC technologies differ in their customization to a design
– ICs consist of numerous layers (perhaps 10 or more)
IC technologies differ with respect to who builds each layer and when
[Figure: IC package and IC; transistor cross-section showing source, drain, channel, oxide, gate and silicon substrate]
33
IC technology Design Approaches
IC Technology Implementation Approaches
– Custom
– Semicustom
  – Cell-based: Standard Cells, Compiled Cells, Macro Cells
  – Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGAs)
34
Full-custom design
All layers are optimized for an embedded system’s particular digital implementation
– Placing transistors
– Sizing transistors
– Routing wires
Benefits
– Excellent performance, small size, low power
Drawbacks
– High NRE cost (e.g., $300k), long time-to-market
35
The Custom Approach
[Figure: Intel 4004. Courtesy Intel]
36
Transition to Automation and Regular Structures
[Figure: die photos of the Intel 4004 (’71), Intel 8080, Intel 8085, Intel 80286 and Intel 80486. Courtesy Intel]
37
IC technology Design IC technology Design ApproachesApproaches
Custom
Standard CellsCompiled Cells
Macro Cells
Cell-based
Pre-diffused(Gate Arrays)
Pre-wired(FPGA's)
Array-based
Semicustom
IC Technology Implementation Approaches
39
Semi-custom
Lower layers are fully or partially built
– Designers are left with routing of wires and maybe placing some blocks
Benefits
– Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
Drawbacks
– Still requires weeks to months to develop
40
Cell-based Design (or standard cells)
[Figure: standard-cell layout showing rows of cells, logic cells, feedthrough cells, routing channels and a functional module (RAM, multiplier, …)]
Routing channel requirements are reduced by presence of more interconnect layers
Standard Cell — ExampleStandard Cell — Example
[Brodersen92]
4242
Standard Cell - ExampleStandard Cell - Example
3-input NAND cell(from ST Microelectronics):C = Load capacitanceT = input rise/fall time
43
IC technology Design Approaches
IC Technology Implementation Approaches
– Custom
– Semicustom
  – Cell-based: Standard Cells, Compiled Cells, Macro Cells
  – Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGAs)
44
Programmable Logic Devices
All layers (diffusion, polysilicon, [multi-]metal) may exist
– Designers can purchase an IC
– Connections on the IC are either created or destroyed to implement desired functionality
– Field-Programmable Gate Arrays (FPGAs) and recently Gate Arrays are very popular
Benefits
– Low NRE costs, almost instant IC availability
Drawbacks
– Bigger, expensive (perhaps $30 per unit), power hungry, slower
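45
Gate Array - Sea-of-gates
[Figure: sea-of-gates gate array: rows of uncommitted cells separated by routing channels; cell layout with VDD and GND rails, polysilicon, metal and possible contacts. An uncommitted cell is shown next to a committed cell wired as a 4-input NOR (inputs In1 to In4, output Out).]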
46
Sea-of-gate Primitive Cells
[Figure: primitive cells built from NMOS and PMOS transistors, one using oxide-isolation and one using gate-isolation]
47
Sea-of-gates
[Figure: sea-of-gates chip with random logic and a memory subsystem: LSI Logic LEA300K (0.6 µm CMOS)]
48
Prewired Arrays
Classification of prewired arrays (or field-programmable devices):
Based on Programming Technique
– Fuse-based (program-once)
– Non-volatile EPROM based
– RAM based
Programmable Logic Style
– Array-Based
– Look-up Table
Programmable Interconnect Style
– Channel-routing
– Mesh networks
49
Altera MAX
From Smith97
50
Altera MAX Interconnect Architecture
[Figure: two interconnect styles. Array-based (MAX 3000-7000): LABs (LAB1, LAB2, LAB6) connected through a central PIA, with delay tPIA. Mesh-based (MAX 9000): LABs connected by row channels and column channels.]
51
LUT-Based Logic Cell
[Figure: Xilinx 4000 Series logic cell. Recoverable labels: signals C1...C4, D1 to D4 and F1 to F4, three “logic function of ...” blocks, control bits, and multiplexers controlled by the configuration program. The remaining labels are illegible in the transcript.]
52
Array-Based Programmable Wiring
[Figure: cells connected through horizontal and vertical tracks; labels: input/output pin, interconnect point, programmed interconnection]
53
Transistor Implementation of Mesh
Courtesy DeHon and Wawrzynek
54
RAM-based FPGA
[Figure: Xilinx XC4000ex]
55
Design Technology
The manner in which we convert our concept of desired system functionality into an implementation
Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level.
Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.
Abstraction level | Compilation/Synthesis | Libraries/IP | Test/Verification
System specification | System synthesis | Hw/Sw/OS | Model simulat./checkers
Behavioral specification | Behavior synthesis | Cores | Hw-Sw cosimulators
RT specification | RT synthesis | RT components | HDL simulators
Logic specification | Logic synthesis | Gates/Cells | Gate simulators
To final implementation
56
The co-design ladder
In the past:
– Hardware and software design technologies were very different
– Recent maturation of synthesis enables a unified view of hardware and software
Hardware/software “codesign”
[Figure: the co-design ladder. Both sides start from sequential program code (e.g., C, VHDL) and descend to an implementation.
Software side: compilers (1960s, 1970s) produce assembly instructions; assemblers and linkers (1950s, 1960s) produce machine instructions; the implementation is a microprocessor plus program bits: “software”.
Hardware side: behavioral synthesis (1990s) produces register transfers; RT synthesis (1980s, 1990s) produces logic equations / FSMs; logic synthesis (1970s, 1980s) produces logic gates; the implementation is a VLSI, ASIC, or PLD: “hardware”.]
The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.
57
Independence of processor and IC technologies
Basic tradeoff
– General vs. custom
– With respect to processor technology or IC technology
– The two technologies are independent
[Figure: processor technologies (general-purpose processor, ASIP, single-purpose processor) and IC technologies (PLD, semi-custom, full-custom), each ranging from general to customized.
General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume).
Customized, providing improved: power efficiency, performance, size, cost (high volume).]
58
Design Decision Trade-offs
59
Generalised Design Flow
60
Architecture ReUse
Silicon System Platform
– Flexible architecture for hardware and software
– Specific (programmable) components
– Network architecture
– Software modules
– Rules and guidelines for design of HW and SW
Has been successful in PCs
– Dominance of a few players who specify and control architecture
Application-domain specific (difference in constraints)
– Speed (compute power)
– Dissipation
– Costs
– Real / non-real time data
61
Platform-Based Design
A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
New platforms will be defined at the architecture-micro-architecture boundary
They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
Key to such approaches is the representation of communication in the platform model
“Only the consumer gets freedom of choice; designers need freedom from choice” (Orfali, et al, 1996, p.522)
Source: R. Newton
62
Platform-based Design – System-on-Chip
Use of predefined Intellectual Property (IP)
A platform-based system consists of a RISC processor, memories, busses and a common language
Platform-based design poses the problem of partitioning a solution between hardware (HDL) and software (programming processors)
63
Platforms Enable Simplified SoC Design
Customer demands
– Fast turn-around time
– Easy access to pre-qualified building blocks
– Web enabled
Design technology
– Core platforms
– ‘Big’ IP
– Emerging SoC bus standards
– Embedded software
– HW/SW co-verification
[Figure: platform layers: core, near peripherals, far peripherals]
64
And Automation of IP Selection & Integration
65
Heterogeneous Programmable Platforms
[Figure: Xilinx Virtex-II Pro: FPGA fabric, embedded PowerPC, embedded memories, hardwired multipliers, high-speed I/O]
66
Xilinx’s products
67
Xilinx’s products
68
Comparison of CMOS design methods
Design method | NRE | Unit cost | Power dissipation | Complexity of implementation | Time-to-market | Performance | Flexibility
µProcessor/DSP | low | medium | high | low | low | low | high
PLA | low | medium | medium | low | low | medium | low
FPGA | low | high | medium | medium | medium | medium | medium
Gate array | medium | medium | low | medium | medium | medium | medium
Cell based | high | low | low | high | high | high | low
Custom design | high | low | low | high | high | very high | low
Platform based | high | low/medium | low | high | medium/low | high | medium
69
Impact of Implementation Choices
[Figure: energy efficiency (in MOPS/mW, bands 0.1-1, 1-10, 10-100, 100-1000) versus flexibility or application scope (none, somewhat flexible, fully flexible). Implementation styles shown: hardwired custom, configurable/parameterizable, domain-specific processor (e.g. DSP), embedded microprocessor.]
Design Economics (1)
The selling price of an IC is S_total = C_total / (1 - m), where C_total is the manufacturing cost of a single IC and m is the desired profit margin
Costs to produce an IC
– Non-recurring engineering costs (NREs)
– Recurring engineering costs
– Fixed costs
Design Economics (2)
Non-recurring engineering costs (NREs)
– Engineering design cost
– Prototype manufacturing cost
Recurring costs
– Process
– Package
– Test
NRE and unit cost metrics
Costs:
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units = (NRE cost / # of units) + unit cost
• Example
– NRE = $2000, unit = $100
– For 10 units:
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an additional $200 per unit
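The two cost relations on this slide can be checked with a short Python sketch. The function names `total_cost` and `per_product_cost` are illustrative choices, not part of the original slides.

```python
def total_cost(nre, unit_cost, units):
    """Total cost = NRE cost + unit cost * number of units."""
    return nre + unit_cost * units


def per_product_cost(nre, unit_cost, units):
    """Per-product cost = total cost / number of units."""
    return total_cost(nre, unit_cost, units) / units


# Slide example: NRE = $2000, unit cost = $100, 10 units
print(total_cost(2000, 100, 10))        # 3000
print(per_product_cost(2000, 100, 10))  # 300.0
```

The $200 gap between the per-product cost and the unit cost is the amortized NRE, which shrinks as the volume grows.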
NRE and unit cost metrics
[Figure: two plots for technologies A, B, and C versus number of units (volume, 0 to 2400): total cost (x1000, $0 to $200,000) and per-product cost ($0 to $200)]
Compare technologies by costs: the best depends on quantity
– Technology A: NRE=$2,000, unit=$100
– Technology B: NRE=$30,000, unit=$30
– Technology C: NRE=$100,000, unit=$2
• But, must also consider time-to-market
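As a rough illustration of why the best technology depends on quantity, the sketch below computes the break-even volumes between technologies A, B, and C using only the NRE and unit costs given on the slide. The helper names (`cheapest`, `break_even`) are assumptions made for this example, and time-to-market is deliberately ignored.

```python
from itertools import combinations

# (NRE cost, unit cost) per technology, taken from the slide
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def total_cost(name, units):
    """Total cost = NRE cost + unit cost * number of units."""
    nre, unit = TECHNOLOGIES[name]
    return nre + unit * units


def cheapest(units):
    """Technology with the lowest total cost at a given volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(name, units))


def break_even(x, y):
    """Volume at which technologies x and y have equal total cost."""
    (nre_x, unit_x), (nre_y, unit_y) = TECHNOLOGIES[x], TECHNOLOGIES[y]
    return (nre_y - nre_x) / (unit_x - unit_y)


for x, y in combinations(TECHNOLOGIES, 2):
    print(f"{x} vs {y}: break-even at {break_even(x, y):.0f} units")

for volume in (100, 1_000, 5_000):
    print(volume, "units ->", cheapest(volume))
```

With these numbers, A is cheapest below 400 units, B between 400 and 2,500 units, and C above 2,500 units.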
Wafer and die cost
Die yield: number of good dies/total number of dies
Example
Assuming:
– 20 engineers are employed full-time for a year with a $50,000/year average salary
– Additional $200,000 overhead costs, of which $100,000 is for total testing
– A wafer cost of $200 per wafer
– A $2 packaging cost per chip
– 10 dies/wafer
– 70% die yield
– 98% final test yield
– A market for 100,000 items
Calculate the minimum shelf price of the chip
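One possible way to set up this calculation is sketched below in Python. It is not the official solution and rests on several assumptions that the slide does not state: the "minimum" price is read as the break-even price (profit margin m = 0), the $100,000 testing cost is already included in the $200,000 overhead, every good die is packaged before final test (so chips that fail final test waste both die and package), and the NRE is amortized over the full market of 100,000 chips. The function name `break_even_price` is hypothetical.

```python
def break_even_price(engineers, salary, overhead, wafer_cost, dies_per_wafer,
                     die_yield, package_cost, test_yield, volume):
    # One-time (NRE) cost: one year of salaries plus overhead
    nre = engineers * salary + overhead
    # Cost of one good die: wafer cost spread over the dies that work
    good_die_cost = wafer_cost / (dies_per_wafer * die_yield)
    # Assumption: good dies are packaged first, then final test rejects some
    tested_chip_cost = (good_die_cost + package_cost) / test_yield
    # Amortize the NRE over the number of chips sold
    return nre / volume + tested_chip_cost


price = break_even_price(
    engineers=20, salary=50_000, overhead=200_000,
    wafer_cost=200, dies_per_wafer=10, die_yield=0.70,
    package_cost=2, test_yield=0.98, volume=100_000,
)
print(f"break-even price per chip: ${price:.2f}")
```

Under these assumptions the NRE contributes $12 per chip. Any nonzero profit margin m would raise the price according to S_total = C_total / (1 - m).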
Design productivity exponential increase
Exponential increase over the past few decades
[Figure: productivity (K transistors per staff-month), log scale from 0.01 to 100,000, versus year, 1981 to 2009]
The growing design-productivity gap
Design Productivity Crisis (SRC 1997): Potential Design Complexity and Designer Productivity
[Figure: logic transistors per chip (M) and productivity (K transistors per staff-month) versus year, 1981 to 2009, log scale. Complexity growth rate: 58% / yr compounded. Productivity growth rate: 21% / yr compounded.]
[Figure: Moore’s Law: standard cell density (K gates / mm2) and ASIC clock (MHz) versus year, 2001 to 2015]
ROI = Return / Investment = volume * (chip ASP - chip unit cost) / chip development cost
Design productivity gap
1981 leading edge chip required 100 designer months
– 10,000 transistors / 100 transistors/month
2002 leading edge chip requires 30,000 designer months
– 150,000,000 transistors / 5000 transistors/month
Designer cost increase from $1M to $300M
While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
The mythical man-month
The situation is even worse than the productivity gap indicates
In theory, adding designers to a team reduces project completion time
In reality, productivity per designer decreases due to the complexities of team management and communication
In the software community, this is known as “the mythical man-month” (Brooks 1975)
At some point, adding designers can actually lengthen project completion time! (“Too many cooks”)
Example: 1M transistors; 1 designer = 5000 trans/month
Each additional designer reduces individual productivity by 100 trans/month
So 2 designers produce 4900 trans/month each
[Figure: team and individual productivity (trans/month, 10,000 to 60,000) and months until completion versus number of designers (0 to 40); plotted completion times: 43, 24, 19, 16, 15, 16, 18, 23 months]
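The productivity model on this slide (1M transistors, 5000 trans/month for a single designer, minus 100 trans/month per designer for each additional team member) can be reproduced with a few lines of Python. This is a sketch of that simple linear model only; the constant and function names are chosen for the example, and the team sizes 5, 10, ..., 40 are an assumption about which points the slide plots.

```python
TOTAL_TRANSISTORS = 1_000_000
BASE_PRODUCTIVITY = 5_000  # transistors/month for a single designer
PENALTY = 100              # loss per designer for each additional designer


def individual_productivity(designers):
    return BASE_PRODUCTIVITY - PENALTY * (designers - 1)


def team_productivity(designers):
    return designers * individual_productivity(designers)


def months_to_complete(designers):
    return TOTAL_TRANSISTORS / team_productivity(designers)


# Team sizes 5, 10, ..., 40
for n in range(5, 45, 5):
    print(n, individual_productivity(n), team_productivity(n),
          round(months_to_complete(n)))

# Smallest team size (up to 50 designers) with the shortest completion time
best = min(range(1, 51), key=months_to_complete)
print("fastest team size:", best)
```

For these team sizes the rounded completion times match the values on the slide (43, 24, 19, 16, 15, 16, 18, 23 months). Team productivity peaks at 25 to 26 designers, after which adding people lengthens the project.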
Summary
Embedded systems are everywhere
Key challenge: optimization of design metrics
– Design metrics compete with one another
A unified view of hardware and software is necessary to improve productivity
Three key technologies
– Processor: general-purpose, application-specific, single-purpose
– IC: Full-custom, semi-custom, PLD
– Design: Compilation/synthesis, libraries/IP, test/verification
Real-time and distributed systems
Dr. Konstantinos Tatas
What is real-time? Is there any other kind?
A real-time computer system is a computer system where the correctness of the system behavior depends not only on the logical results of the computations, but also on the physical time when these results are produced.
By system behavior we mean the sequence of outputs in time of a system.
Real-time means reactive
A real-time computer system must react to stimuli from its environment
The instant when a result must be produced is called a deadline.
If a result has utility even after the deadline has passed, the deadline is classified as soft, otherwise it is firm.
If severe consequences could result if a firm deadline is missed, the deadline is called hard.
Example: Consider a traffic signal at a road before a railway crossing. If the traffic signal does not change to red before the train arrives, an accident could result.
Reliability
The reliability R(t) of a system is the probability that the system will provide the specified service until time t, given that it was operational at the beginning (t = t0)
The probability that a system will fail in a given interval of time is expressed by the failure rate, measured in FITs (Failure In Time).
A failure rate of 1 FIT means that the mean time to failure (MTTF) of a device is 10^9 h, i.e., one failure occurs in about 115,000 years.
If a system has a constant failure rate of λ failures/h, then the reliability at time t is given by
R(t) = exp(-λ(t - t0))
MTTF = 1/λ
Example
What must be the system failure rate so that 99% of the systems in the field work reliably for the first 100,000 hours?
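A possible way to approach this question is to invert the constant-failure-rate model R(t) = exp(-λ(t - t0)). The Python sketch below assumes t0 = 0 and a constant failure rate, and takes 1 FIT as 10^-9 failures/h (consistent with an MTTF of 10^9 h at 1 FIT). The function name `required_failure_rate` is illustrative.

```python
import math


def required_failure_rate(reliability, hours):
    """Solve R(t) = exp(-lambda * t) for lambda, assuming t0 = 0."""
    return -math.log(reliability) / hours


lam = required_failure_rate(0.99, 100_000)  # failures per hour
print(f"failure rate: {lam:.3e} /h = {lam * 1e9:.1f} FIT")
print(f"MTTF: {1 / lam:.3e} h")
```

Under these assumptions the required failure rate comes out at roughly 100 FIT, i.e. an MTTF of about 10^7 hours.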
Safety
Maintainability
Name some hard, firm and soft deadline embedded systems
Example
An automotive company produces 2,000,000 electronic engine controllers of a special type.
The following design alternatives are discussed:
(a) Construct the engine control unit as a single SRU with the application software in Read Only Memory (ROM). The production cost of such a unit is $250. In case of an error, the complete unit has to be replaced.
(b) Construct the engine control unit such that the software is contained in a ROM that is placed on a socket and can be replaced in case of a software error. The production cost of the unit without the ROM is $248. The cost of the ROM is $5.
(c) Construct the engine control unit as a single SRU where the software is loaded in a Flash EPROM that can be reloaded. The production cost of such a unit is $255.
The labor cost of repair is assumed to be $50 for each vehicle. (It is assumed to be the same for each one of the three alternatives).
Calculate the cost of a software error for each one of the three alternative designs if 300,000 cars have to be recalled because of the software error (example in Sect. 1.6.1).
Which one is the lowest cost alternative if only 1,000 cars are affected by a recall?
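The comparison can be set up as in the Python sketch below. It is one reading of the exercise, not a reference solution. It assumes that the recall cost per car is the replacement part plus $50 labor, that reloading the Flash EPROM needs no replacement part, and that "lowest cost alternative" means production cost of all 2,000,000 units plus recall cost. The names `recall_cost` and `lifetime_cost` are hypothetical.

```python
UNITS_PRODUCED = 2_000_000
LABOR_PER_VEHICLE = 50

# name: (production cost per unit, replacement part cost per recalled car)
ALTERNATIVES = {
    "a": (250, 250),    # the complete unit is replaced
    "b": (248 + 5, 5),  # only the socketed ROM is replaced
    "c": (255, 0),      # Flash EPROM is reloaded; no part cost (assumed)
}


def recall_cost(name, recalled):
    """Cost of the software error: parts plus labor for every recalled car."""
    _, part = ALTERNATIVES[name]
    return recalled * (part + LABOR_PER_VEHICLE)


def lifetime_cost(name, recalled):
    """Production cost of all units plus the cost of the recall."""
    production, _ = ALTERNATIVES[name]
    return UNITS_PRODUCED * production + recall_cost(name, recalled)


for recalled in (300_000, 1_000):
    for name in ALTERNATIVES:
        print(recalled, name, recall_cost(name, recalled),
              lifetime_cost(name, recalled))
    lowest = min(ALTERNATIVES, key=lambda n: lifetime_cost(n, recalled))
    print("lowest overall cost:", lowest)
```

Under these assumptions the recall of 300,000 cars costs $90M for (a), $16.5M for (b), and $15M for (c). Once production cost is included, (b) is cheapest for the large recall and (a) for the 1,000-car recall. If only the recall cost is compared, (c) is cheapest in both cases.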
Distributed RT system model
From the POV of an outside observer, a real-time (RT) system can be decomposed into three communicating subsystems:
– a controlled object (the physical subsystem, the behavior of which is governed by the laws of physics),
– a “distributed” computer subsystem (the cyber system, the behavior of which is governed by the programs that are executed on digital computers)
– a human user or operator
The distributed computer system consists of computational nodes that interact by the exchange of messages.
A computational node can host one or more computational components.
Event-Triggered Control Versus Time-Triggered Control