Embedded Computer Architecture - Uppsala University

Jakob Engblom, PhD
Uppsala University & Virtutech Inc.

Embedded Systems Computer Architecture

14 Nov 2003 Embedded Computer Architecture 2

Embedded Systems

14 Nov 2003 Embedded Computer Architecture 3

Embedded Systems

It is a snake!
No, a wall!
No, a pillar!
No, it is a tree trunk!
You’re all wrong, it is a fan!
Now what is this elephant thing?

14 Nov 2003 Embedded Computer Architecture 4

Embedded Systems

“A computer that doesn’t look like a computer”
Interacts with world
Primitive or no user interface
Part of other products

14 Nov 2003 Embedded Computer Architecture 5

Embedded Systems

Single purpose products
  Not general purpose like desktop PCs
  Do one thing very efficiently

Software very important:
  Gives character to product
  Used to differentiate inside a “platform”
  Can be changed late
  Processor cheaper than special HW
  Today, dominates dev cost

14 Nov 2003 Embedded Computer Architecture 6

"Desktop"2%

"Embedded"98%

Processor Market

Embedded = most processors!
  200 million PC and server
  8000 million embedded
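
The 2% / 98% split on this slide follows directly from the two unit counts. A minimal sketch of the arithmetic, assuming both figures are yearly unit volumes in millions (the variable names are illustrative only):

```python
# Unit counts quoted on the slide, in millions of processors.
pc_and_server_units = 200
embedded_units = 8000

total_units = pc_and_server_units + embedded_units
desktop_share = pc_and_server_units / total_units
embedded_share = embedded_units / total_units

print(f"desktop share:  {desktop_share:.1%}")
print(f"embedded share: {embedded_share:.1%}")
```

Rounded to whole percent, this gives the 2% and 98% shown in the chart.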

14 Nov 2003 Embedded Computer Architecture 7

Processor Market

Processors: 50% of all semiconductor revenue
  Explains why everyone wants to do processors
32-bit dominant
  30% of total semiconductors
PC processors:
  50% of CPU revenue
  15% of total semiconductors
  AMD and Intel share it

Chart: share of units and of money by processor type (32-bit, 16-bit, 8-bit, 4-bit, DSP), 0% to 100%

14 Nov 2003 Embedded Computer Architecture 8

Real-Time System

Timing as important as result
Hard real-time:
  Hard deadlines
  Dead if missed deadline
  Worst-case
Soft real-time:
  Fuzzier deadlines
  Can miss some deadlines
  Average-case
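
The difference between the two classes can be sketched as two acceptance checks over a set of response times. This is only an illustration: the function names, the sample values and the 10% miss budget are assumptions, and a real hard real-time argument would rest on worst-case analysis rather than on measurements.

```python
from statistics import mean

def meets_hard_deadline(response_times_ms, deadline_ms):
    """Hard real-time: even the worst case must meet the deadline."""
    return max(response_times_ms) <= deadline_ms

def meets_soft_deadline(response_times_ms, deadline_ms, max_miss_ratio):
    """Soft real-time: a bounded fraction of missed deadlines is tolerated."""
    misses = sum(1 for t in response_times_ms if t > deadline_ms)
    return misses / len(response_times_ms) <= max_miss_ratio

# Hypothetical measurements: mostly fast, with one slow outlier.
samples_ms = [2.0, 2.1, 1.9, 2.2, 9.5, 2.0, 2.1, 1.8, 2.0, 2.3]
deadline_ms = 5.0

print("average response:", mean(samples_ms), "ms")
print("hard real-time ok:", meets_hard_deadline(samples_ms, deadline_ms))
print("soft real-time ok:", meets_soft_deadline(samples_ms, deadline_ms, max_miss_ratio=0.1))
```

The average is well under the deadline, so the soft check passes, while the single outlier is enough to fail the hard check.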

14 Nov 2003 Embedded Computer Architecture 9

Real-Time Systems

Embedded and Real-Time Synonymous?
  Most embedded systems are real-time
  Most real-time systems are embedded

Diagram: overlapping sets “embedded” and “real-time”, with “embedded real-time” in the intersection

14 Nov 2003 Embedded Computer Architecture 10

Simple Embedded Systems

8-bit Hitachi H8/300
32 kB ROM, 32 kB RAM

Standard microcontroller chip

Byte-code machine, sensor drivers, …

8-bit Intel 8051, standard microcontroller

Behavior, talk, IR communications

14 Nov 2003 Embedded Computer Architecture 11

Fun App: Smart Beer Glass

8-bit, 8-pin PIC processor
Capacitive sensor for fluid level
Inductive coil for RF ID activation & power
CPU and reading coil in the table. Reports the level of fluid in the glass, alerts servers when close to empty
Contactless transmission of power and readings

14 Nov 2003 Embedded Computer Architecture 12

No Upgrades Possible

Once a product ships…
…it often cannot be serviced
  No download ability
  No writable persistent storage
  No disks
  No loader
Software is write-once
(There are exceptions)

14 Nov 2003 Embedded Computer Architecture 13

Consumer Electronics

Heterogeneous multiprocessor
  8-bit Atmel AVR for UI, games, …
  16-bit fixed-point TI C54 DSP for GSM coding, radio interface, …
  32-bit ARM7 in Bluetooth module
  + maybe ARM7 in IRDA interface
All in custom chips
Software is large:
  16 MB of code in control part
  Plus signal processing code

14 Nov 2003 Embedded Computer Architecture 14

Automotive

Multiple networks
  CAN for body electronics: 30+ nodes
  CAN for engine control: few nodes
  LIN for instruments
Many processors
  Up to 100
Large diversity in processor types:
  8-bit CPUs (PIC, HC08) for door locks, lights, etc.
  16-bit CPUs (C167, HC11, HC12) for most functions
  32-bit CPUs (PPC, V850) for engine control, airbags
Total amount of code: 40-50 MB

14 Nov 2003 Embedded Computer Architecture 15

Automotive

Form follows function
  Processing where the action is
  Architecture given by application
  Sensors and actuators distributed
Heterogeneous systems
  Many different makes of CPUs
  Standardized at the network/bus

14 Nov 2003 Embedded Computer Architecture 16

Timing Aspects

Interrupt latency
  Important criterion for embedded
  A few clock cycles at most
  Measure of RTOS performance
Real-Time = predictability
  In-order pipelines
  SRAM instead of caches
  Lockable caches
  Several small CPUs instead of one big
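
Why "SRAM instead of caches" helps predictability can be illustrated with a small timing model. All cycle counts and the hit rate below are hypothetical, and the function name is introduced only for this sketch: the cache wins on average, but its worst case (every access misses) is far worse than the fixed SRAM latency.

```python
def cached_access_cycles(n_accesses, hit_cycles, miss_cycles, hit_rate):
    """Return (expected, worst_case) total cycles for a cached memory."""
    expected = n_accesses * (hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles)
    worst_case = n_accesses * miss_cycles  # safe bound: assume every access misses
    return expected, worst_case

N_ACCESSES = 1000

# Hypothetical cache: 1-cycle hit, 20-cycle miss, 95% hit rate.
cache_expected, cache_worst = cached_access_cycles(
    N_ACCESSES, hit_cycles=1, miss_cycles=20, hit_rate=0.95
)

# Hypothetical on-chip SRAM: the same 2 cycles on every access.
sram_cycles = N_ACCESSES * 2

print(f"cache expected: {cache_expected:.0f} cycles")
print(f"cache worst:    {cache_worst} cycles")
print(f"SRAM (always):  {sram_cycles} cycles")
```

With these numbers the cache is slightly faster on average, yet a worst-case guarantee has to budget ten times more cycles than the SRAM system needs.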

14 Nov 2003 Embedded Computer Architecture 17

Military Shipboard

Standard multiprocessor UltraSparc servers for radar, target tracking, combat control, …

Many CPUs in missiles, gun controls, engines, …

14 Nov 2003 Embedded Computer Architecture 18

Mobile Phone Base Station

Handle signals
  Data streams to and from phones
Massively parallel system
  Thousands of DSP tasks
  Perfect parallel scalability
Custom or standard DSPs
  Up to 8 DSPs on a single chip

14 Nov 2003 Embedded Computer Architecture 19

Trends

Hardware to software
  Increase flexibility, lower cost
  Software on fast processor can equal HW
Software to hardware
  Better power consumption & performance
  Design custom hardware for application
Hardware-software codesign
  Delay division HW/SW to late in project
  Obtain “optimal” HW/SW division

14 Nov 2003 Embedded Computer Architecture 20

System-on-a-chip

Integration extreme
  Thanks to modern semiconductors
Entire product on a chip
  One or more processors, accelerators, …

Diagram: on-chip bus connecting DSP, LCD driver, CPU, Bluetooth, GSM Radio, Code memory, Data mem

14 Nov 2003 Embedded Computer Architecture 21

Embedded Processing

14 Nov 2003 Embedded Computer Architecture 22

Microcontrollers

Classic embedded hardware
Standard parts
  Quite broad application domains
  Sold in large series
  Defined by hardware vendors
  As cheap as a single dollar
Single processor + devices
Huge number of variants
Usually intended for control plane

14 Nov 2003 Embedded Computer Architecture 23

Microcontroller

A single chip:
  CPU Core
  Integrated memory
  Integrated peripherals
  Integrated services
Goal:
  System on one chip
  No external HW
  Fit application “perfectly”

Diagram: one chip with CPU Core, RAM (small), ROM (big) and peripherals (UART, A/D, Timer, LCD, …) connected to the Outside World

14 Nov 2003 Embedded Computer Architecture 24

Microcontroller

CPU Bitness: 4 to 64 bits
  Most common: 8 bit (4G units)
  32-bit growing fastest
  32/64-bit outnumbers desktop
Frequency: DC to 2 GHz
Memory on-chip: 0.5 kB to 5 MB
Power: mW (and up)
1/30 to 10 instructions per cycle

14 Nov 2003 Embedded Computer Architecture 25

Example: PIC 12CE674

Memory arch:     Harvard
Program memory:  2048 x 14 (OTP/Flash)
EEPROM:          16 bytes
RAM:             128 bytes
ADC channels:    4 (8 bits)
I/O ports:       6
Timers:          One 8-bit, One WDT
Clock:           on-chip crystal, 10 MHz
Package:         8 pins (Pentium 4: 700 pins)
Cost:            <$1.00 (Pentium 4: >$200.00)

14 Nov 2003 Embedded Computer Architecture 26

Example: AT91M42800A

ARM7TDMI 32-bit core
  Static design: 0 to 33 MHz
Memory
  8 kB SRAM on chip
  External memory interface, 8/16 bit interface
Devices
  6 timers
  2 serial ports
JTAG debug interface
About 0.5 W power
About 18 USD
144 Pin package
One of 13 AT91 variants

14 Nov 2003 Embedded Computer Architecture 27

Devices on the Chip

Interface with the world
  Digital I/O
  Analog/Digital conversion
  Digital/Analog conversion
Communications
  CAN networks
  Ethernet networks
  Radio
  Serial ports (UART, USART)
  USB, FireWire, ...

14 Nov 2003 Embedded Computer Architecture 28

Devices on the Chip

Timers
  Trigger interrupts
  Watchdogs
Graphics
  LCD drivers
  2D/3D graphics acceleration
Buses
  On-chip: between devices: AMBA, …
  Off-chip: PCI, HyperTransport, RapidIO …

14 Nov 2003 Embedded Computer Architecture 29

ASIPs / ASSPs

Application-specific integrated/standard processor
  Targeting a particular niche market
  More targeted than microcontroller
  Domain-specific accelerators
Usually more upscale
  32-bit processors
  Multiprocessors
  Expensive peripherals
  External memory assumed
  Higher performance, includes data-plane

14 Nov 2003 Embedded Computer Architecture 30

Example: PowerQUICC III

Motorola
Target market
  Communications
Processing
  PowerPC e500
  666-1000 MHz
  256 kB L2 cache
Networking
  CPM module, RISC-based microcode
About 160 USD

Features
  4    Serial Communications Controller (SCC)
  3    Fast Communications Controller (FCC)
  2    Multi-Channel Controller (MCC2)
  2    Serial Management Controller (SMC)
  1    Serial Peripheral Interface (SPI)
  1    I2C controller
  1    DDR Memory controller
  1    PCI-X/PCI controller
  1    RapidIO controller
  2    Ethernet 10/100/1000 controller

Capabilities
  4    Ethernet, 10 (from SCC)
  3    Ethernet, 10/100 (from FCC)
  2    Ethernet 10/100/1000
  2    Utopia II ATM (from FCC)
  256  Multichannel HDLC (from MCC2)

14 Nov 2003 Embedded Computer Architecture 31

Example: C167CS

Infineon
Target Market
  Automotive control
Processing
  16-bit C16x core
  4-stage simple pipeline
  40 MHz operation
  16 MB memory space, including ROM, RAM, devices
144 pin package
Tolerates -40 C to +125 C
About 25 USD

Memory
  32 kB   ROM
  3 kB    Fast General Internal RAM (IRAM)
  8 kB    Extension Internal RAM (XRAM)

External Ports
  1       16-bit ports from devices
  8       8-bit ports from devices
  2       CAN interfaces

Devices
  2       CAN 2.0b controllers
  5       General-Purpose Timers (GPT)
  1       Watch-Dog Timer (WDT)
  1       Pulse-Width Modulator (PWM)
  24+8    Analog-Digital Converter Channels
  1       USART
  2x16    Capture/Compare Channels
  1       Synchronous Serial Comms (SSC)

14 Nov 2003 Embedded Computer Architecture 32

Example: Cisco Toaster3

8 clusters of 2 processors each
Each TMC is a VLIW machine with 74 bit instructions, 2k instructions in local memory
Two 32-bit ALUs and three control/data movement units per TMC
Total capacity: about 5 GOps, at around 160 MHz

Image from Microprocessor Report, Oct 2002

14 Nov 2003 Embedded Computer Architecture 33

Example: Cisco Toaster3

Massive multiprocessing
  16 cores on a chip
  4 chips in serial
Routing:
  10 Gbps @ 20 Mpackets/s
  1000 ops per packet passing through
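
The routing figures can be cross-checked against the capacity quoted on the previous Toaster3 slide. The sketch below assumes that "about 5 GOps" is per chip and that it counts ALU operations (two ALUs per processor, one operation per ALU per cycle); both are interpretations, not statements from the slides.

```python
line_rate_bps = 10_000_000_000   # 10 Gbps
packet_rate_pps = 20_000_000     # 20 Mpackets/s
ops_per_packet = 1000
chips_in_series = 4
cores_per_chip = 16
alus_per_core = 2
clock_hz = 160_000_000           # "around 160 MHz"

# Average packet size implied by the line rate and packet rate.
bits_per_packet = line_rate_bps / packet_rate_pps
bytes_per_packet = bits_per_packet / 8

# Work required by the traffic versus work the four chips can deliver.
required_ops_per_s = packet_rate_pps * ops_per_packet
ops_per_chip = cores_per_chip * alus_per_core * clock_hz
available_ops_per_s = chips_in_series * ops_per_chip

print(f"average packet: {bytes_per_packet} bytes")
print(f"required:  {required_ops_per_s / 1e9:.2f} GOps")
print(f"per chip:  {ops_per_chip / 1e9:.2f} GOps")
print(f"available: {available_ops_per_s / 1e9:.2f} GOps")
```

Under these assumptions one chip delivers 5.12 GOps, matching "about 5 GOps", and four chips in series give roughly the 20 GOps that 20 Mpackets/s at 1000 ops per packet requires.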

14 Nov 2003 Embedded Computer Architecture 34

FPGA

Field Programmable Gate Array
Reconfigurable hardware: “soft logic”
  “Program” is circuit layout
  Can be changed after initial load
  Kilos to Megs of “gates” available
Competitor to ASICs
  More expensive per unit, but no start-up cost for manufacturing
  Less flexible, slightly slower
  Perfect for low-volume products
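
The trade-off between per-unit price and manufacturing start-up cost is a simple break-even calculation. The prices below are purely hypothetical placeholders chosen for round numbers, and the function names are introduced only for this sketch.

```python
def total_cost(units, unit_cost, startup_cost=0):
    """Total cost of a production run: one-time start-up cost plus per-unit cost."""
    return startup_cost + units * unit_cost

def break_even_units(asic_startup_cost, asic_unit_cost, fpga_unit_cost):
    """Volume at which ASIC and FPGA total costs are equal."""
    return asic_startup_cost / (fpga_unit_cost - asic_unit_cost)

# Hypothetical prices in USD, for illustration only.
ASIC_STARTUP = 2_000_000
ASIC_UNIT = 5
FPGA_UNIT = 25

for units in (1_000, 50_000, 1_000_000):
    fpga = total_cost(units, FPGA_UNIT)
    asic = total_cost(units, ASIC_UNIT, ASIC_STARTUP)
    cheaper = "FPGA" if fpga < asic else "ASIC"
    print(f"{units:>9} units: FPGA ${fpga:,}  ASIC ${asic:,}  -> {cheaper}")

print("break-even volume:", break_even_units(ASIC_STARTUP, ASIC_UNIT, FPGA_UNIT))
```

Below the break-even volume the FPGA is cheaper despite its higher unit price, which is the sense in which it suits low-volume products.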

14 Nov 2003 Embedded Computer Architecture 35

FPGA Architecture

Computation cells
  Programmable function
    Adder, Logic funcs, ...
    Memory, Registers, ...
Input/Output cells
Interconnect
  Reconfigurable
  Programmable

14 Nov 2003 Embedded Computer Architecture 36

FPGA Architecture

Computation cells
  Look-Up Table
    Arbitrary 4-input, 1-output function
  Coarse-grained
    Lots of functionality
    Several LUTs
    Plus flip-flops etc.
  Fine-grained
    Little functionality

Diagram: Config RAM feeding a LUT
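
A 4-input, 1-output look-up table can be modelled as 16 configuration bits, one per input combination. The following is a software sketch of that idea, not a description of any vendor's cell; the function names and the bit ordering are choices made for this example.

```python
def make_lut4(func):
    """Build the 16-bit configuration word for a 4-input boolean function."""
    config = 0
    for index in range(16):
        a, b, c, d = (index >> 3) & 1, (index >> 2) & 1, (index >> 1) & 1, index & 1
        if func(a, b, c, d):
            config |= 1 << index
    return config

def lut4_eval(config, a, b, c, d):
    """Evaluate the LUT: the four inputs form an address into the config bits."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (config >> index) & 1

# "Programming" the cell means choosing its 16 configuration bits.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)

print(f"xor4 config: {xor4:#06x}")
print(f"and4 config: {and4:#06x}")
print("xor4(1,0,1,1) =", lut4_eval(xor4, 1, 0, 1, 1))
```

The same evaluation logic computes any 4-input function; only the configuration word changes, which is what makes the cell reprogrammable.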

14 Nov 2003 Embedded Computer Architecture 37

FPGA with CPU Cores

CPU on-board FPGA
  HW accelerate critical tasks in FPGA fabric
  Data pumps in FPGA
  Control in CPU
Cool new possibilities
  Reconfigure FPGA online
  Adapt to workloads

14 Nov 2003 Embedded Computer Architecture 38

Soft CPUs in FPGAs

Processor in the FPGA fabric
  “Soft” processor
  Special design considerations
Examples
  Altera Nios
  Xilinx Microblaze
  Research projects
    Västerås ARM clone
    Leon processor also prototyped

14 Nov 2003 Embedded Computer Architecture 39

Examples

Altera Apex 20kC: “Volume”
  30k to 1.5M gates
Xilinx Virtex II: “High-end”
  1-4 PPC405 cores (optional)
  10M gates
  Price at about $1000
Altera Stratix: “Advanced”
  10 Mbit RAM
  28 DSP elements
  100000 LE
  1300 user I/O pins
  Optimized for Nios
ATMEL FPSLIC: “Low-end”
  AVR 8-bit CPU
  50k gates

14 Nov 2003 Embedded Computer Architecture 40

Case Study: ARM 1026EJ-S

14 Nov 2003 Embedded Computer Architecture 41

Overview

14 Nov 2003 Embedded Computer Architecture 42

The Basics: ARM1026EJ-S

Not a stand-alone processor
  For integration in your own chips
Processor package:
  CPU core
  Caches, configurable in size
  Tightly-coupled memories, configurable in size
  Bus interface
  MMU (supports WinCE, Symbian, etc.)

14 Nov 2003 Embedded Computer Architecture 43

Business Model

Sold as an IP Core
  IP = “Intellectual Property”
  Not a physical chip, just a design
  “Source code component”
  Similar in scope to classic processor
For integration in ASICs
  ASIC = Application-specific integrated circuit

14 Nov 2003 Embedded Computer Architecture 44

ASICs

Fully custom chips
  Custom for your application
  As small or large as necessary
Characteristics
  Expensive to develop
    10s of engineers, often 100s
  Large series necessary to pay off
    At least 100 000 units necessary on average
  Mostly for large companies
To streamline: build from IP blocks

14 Nov 2003 Embedded Computer Architecture 45

IP Blocks

IP
  Hardware components
  Integrated on chip by customer
Examples:
  CPU Cores
  Memory
  Buses
  Network interfaces
  Accelerator circuits

Diagram: on-chip bus connecting DSP, LCD driver, CPU, Bluetooth, GSM Radio, Code memory, Data mem

14 Nov 2003 Embedded Computer Architecture 46

CPU Cores

The biggest “IP” business
“Fabless” chip companies
Biggest players:
  ARM (best-selling 32-bit architecture)
  MIPS (and its licensees)
Crowded field
  New companies appear monthly
  Niched components can find a market

14 Nov 2003 Embedded Computer Architecture 47

Component Styles

Hard IP:
  Tied to a particular fab process
    Like IBM 0.13u Cu, TSMC 0.18, etc.
  Black box to customer
Synthesizable IP:
  Source code for compilation by customer
  Offers configuration options like cache sizes, TCMs
  MIPS 24k, ARM 9S, 1026S, 1136S
Soft IP:
  Get full source code for the component
  Purpose is to customize heavily
  ARC ARCtangent 5, Tensilica Xtensa V

14 Nov 2003 Embedded Computer Architecture 48

Synthesizable Vs Hard IP

Synthesizable
  + Use any process
  + Use any fab
  + Customize details
  + Customize chips
  + Add instructions
  - Slower memories
  - Higher power
  - Lower performance
Hard IP
  + Optimized layout
  + Small area
  + Low power
  + Best performance
  - No flexibility

For best results, cores need to be redesigned to be synthesizable

14 Nov 2003 Embedded Computer Architecture 49

1026EJ-S Core

6-stage pipeline:
  Max clock, best case: 475 MHz
    Depends on process, synthesis used
  Optimized for synthesis of core
  Integer-only
Power:
  Depends on process & configuration
  Quoted numbers: 0.5 mW/MHz
    With 16kB+16kB L1 caches
    130 nm process at TSMC
    (Pentium 4: >35 mW/MHz)
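
To put the mW/MHz figures in perspective, the quoted numbers can be combined into a rough power estimate. This sketch assumes power scales linearly with clock frequency and that the 0.5 mW/MHz figure still applies at the best-case 475 MHz clock; neither is stated on the slide.

```python
arm_mw_per_mhz = 0.5     # quoted for the 1026EJ-S with 16kB+16kB L1 caches
arm_max_clock_mhz = 475  # best-case maximum clock
p4_mw_per_mhz = 35       # slide quotes ">35", so this is a lower bound

# Estimated core power at full clock, under the linear-scaling assumption.
arm_power_mw = arm_mw_per_mhz * arm_max_clock_mhz

# How many times more power per MHz the Pentium 4 uses (at least).
power_ratio = p4_mw_per_mhz / arm_mw_per_mhz

print(f"ARM1026EJ-S at {arm_max_clock_mhz} MHz: about {arm_power_mw} mW")
print(f"Pentium 4 uses at least {power_ratio:.0f}x more power per MHz")
```

That is roughly a quarter of a watt for the ARM core at full speed, and a per-MHz gap of at least 70x to the desktop processor.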

14 Nov 2003 Embedded Computer Architecture 50

ARM1026EJ-S Pipeline

Fetch Issue Decode

Shift/ALU Sat

Write

MAC1 MAC2

LS1 LS2 LS write

Static branch prediction (75% accurate): uses less power than dynamic

Return stack (single entry). Simple but effective

14 Nov 2003 Embedded Computer Architecture 51

ARM1026EJ-S Pipeline

Fetch Issue Decode

Shift/ALU Sat

Write

MAC1 MAC2

LS1 LS2 LS write

ARM/Thumb/Java decode

Access to coprocessors

14 Nov 2003 Embedded Computer Architecture 52

ARM1026EJ-S Pipeline

Fetch Issue Decode

Shift/ALU Sat

Write

MAC1 MAC2

LS1 LS2 LS write

Register read, initialize memory accesses

Evaluate immediates

14 Nov 2003 Embedded Computer Architecture 53

ARM1026EJ-S Pipeline

Fetch Issue Decode

Shift/ALU Sat

Write

MAC1 MAC2

LS1 LS2 LS write

Execution pipeline for most integer instructions

Handle saturated arithmetic

14 Nov 2003 Embedded Computer Architecture 54

ARM1026EJ-S Pipeline

Fetch Issue Decode

Shift/ALU Sat

Write

MAC1 MAC2

LS1 LS2 LS write

Execution pipeline for multiply-accumulate instructions

14 Nov 2003 Embedded Computer Architecture 55

ARM1026EJ-S Pipeline

Fetch Issue Decode

Shift/ALU Sat

Write

MAC1 MAC2

LS1 LS2 LS write

Decoupled pipeline for loads and stores

2 stage memory access to support slow synthesized memory

14 Nov 2003 Embedded Computer Architecture 56

Rounding Out

Configurable caches
  Typically 16kB/16kB
Optional TCMs
Memory interface
  2 x 64 bit AMBA AHB links
Optional vector FP coprocessor
Optional vector interrupt controller

14 Nov 2003 Embedded Computer Architecture 57

ARM1026EJ-S System

Diagram: ARM1026EJ-S Core with I$ and D$, I-TCM, D-TCM, VFP10 FP coprocessor, VIC10 interrupt coprocessor, ETM10RV trace/debug with debug port connection, BIU, 64-bit AMBA/AHB data bus for D, 64-bit AMBA/AHB data bus for I, RAM, FLASH

14 Nov 2003 Embedded Computer Architecture 58

TCM

Tightly-Coupled Memories
Alternative to caches
  As fast as caches
  Programmer-controlled
  No automatic management
  Cheaper to implement
  More predictable in behavior
Programming:
  In memory map
  Tagged like caches

14 Nov 2003 Embedded Computer Architecture 59

Instruction Sets for ARM

Base: ARM v5
  32-bit integer-only instruction set
T: Thumb instruction set
  16-bit, for smaller core size
J: Jazelle extensions
  Java support in hardware
  Implements 140 out of 228 JVM byte codes
E: DSP extensions
  Done in regular registers
  Saturation, some more MACs
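
The "saturation" mentioned for the E extensions means that results are clamped to the representable range instead of wrapping around. A minimal software model of a signed 32-bit saturating add, next to an ordinary wrapping add for comparison (the function names are illustrative and are not ARM mnemonics):

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def wrapping_add32(a, b):
    """Ordinary two's-complement 32-bit add: overflow wraps around."""
    result = (a + b) & 0xFFFFFFFF
    return result - 2**32 if result >= 2**31 else result

def saturating_add32(a, b):
    """Saturating 32-bit add: overflow clamps to the nearest limit."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

print("wrapping:  ", wrapping_add32(INT32_MAX, 1))
print("saturating:", saturating_add32(INT32_MAX, 1))
```

For signal processing the clamped result is usually the less harmful one: an overflowing sample stays at full scale instead of flipping sign.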

14 Nov 2003 Embedded Computer Architecture 60

The ARM Instruction Set

Continuous evolution
  Add features required by market
  RISC? Not anymore, if ever
Now at v6, in the ARM11 family
  v5, v5E in ARM9 and ARM10
  v4 in old ARM7
  Backwards compatibility!

14 Nov 2003 Embedded Computer Architecture 61

T: Thumb

Compressed instruction set
  16-bit encoding of (parts of) 32-bit instruction set
Limitations in ARM/Thumb:
  Only access to 8 registers (16 in ARM mode)
  No system operations

Effect:
  More but smaller instructions
    30% more, at half size
  Usually some performance loss
    (Perform better on narrow buses)
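The "30% more, at half size" effect can be checked with a line of arithmetic. The sketch below is only an illustration: it assumes every ARM instruction takes 4 bytes and every Thumb instruction 2 bytes, and the function name and parameters are made up for this example.

```python
def thumb_relative_size(extra_instructions=0.30, arm_bytes=4, thumb_bytes=2):
    """Size of a Thumb program relative to the same program in ARM code."""
    arm_count = 1.0                                      # normalized ARM instruction count
    thumb_count = arm_count * (1.0 + extra_instructions)  # Thumb needs more instructions
    return (thumb_count * thumb_bytes) / (arm_count * arm_bytes)


print(round(thumb_relative_size(), 2))
```

Under these assumptions the Thumb program comes out at roughly two thirds of the ARM size, so the extra instructions eat only part of the saving from the shorter encoding.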


T: Thumb

Thumb shrinks the code:

(upper row: code size; lower row: size relative to ARM = 1.00)

Benchmark   Thumb    ARM      386      8088     68020    SPARC
eqntott     10608    16768    17640    19106    20542    22256
            0.63     1.00     1.05     1.14     1.23     1.33
xlisp       26388    40768    28097    29401    46746    44648
            0.65     1.00     0.69     0.72     1.15     1.10
espresso    72596    109923   125686   137194   131854   142752
            0.66     1.00     1.14     1.25     1.20     1.30

Source: Microprocessor Report, March 1995
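The relative figures in the table are each code size divided by the ARM size for the same benchmark. A small sketch that recomputes them from the sizes on this slide (the dictionary layout and function name are illustrative choices, not part of the original material):

```python
# Code sizes copied from the slide, keyed by benchmark and instruction set.
SIZES = {
    "eqntott": {"Thumb": 10608, "ARM": 16768, "386": 17640,
                "8088": 19106, "68020": 20542, "SPARC": 22256},
    "xlisp": {"Thumb": 26388, "ARM": 40768, "386": 28097,
              "8088": 29401, "68020": 46746, "SPARC": 44648},
    "espresso": {"Thumb": 72596, "ARM": 109923, "386": 125686,
                 "8088": 137194, "68020": 131854, "SPARC": 142752},
}


def relative_to_arm(sizes):
    """Return each size divided by the ARM size, rounded to two decimals."""
    arm = sizes["ARM"]
    return {isa: round(size / arm, 2) for isa, size in sizes.items()}


for benchmark, sizes in SIZES.items():
    print(benchmark, relative_to_arm(sizes))
```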


T2: Doing a Better Thumb

ARM Thumb: fixed 16-bit size
  Saves 28% space compared to 32-bit ARM
  Runs 20% slower than 32-bit ARM

ARM Thumb 2: mixed 16/32
  Brand new, arrives with ARM1156
  Saves 26% space compared to 32-bit ARM
  Runs 2% slower than 32-bit ARM
  (Introduces some new instructions)

Conclusion: mixed length good!
Source: Microprocessor Report, June 2003


Why T?

Pushed by mobile phones
  More memory = more expensive
  More memory = bigger package
  More memory = higher power

More features in same memory!
Performance is not critical


T: Competitors

Compressed instruction sets
  MIPS16e, shrunk MIPS32 ISA
  ARC
  Tensilica

All-small instruction sets
  SH family

Compressed code
  IBM PowerPC 405 GX
  Decompress when loaded into cache


J: Jazelle

Hardware Java acceleration
  Pushed by mobile phones

Why?
  To fix Java performance problems

SW JVM problems:
  Minimal clock frequency = low interpreter performance
  JIT requires more memory


E: DSP Extensions

A few new instructions
  Saturated arithmetic
    Add, Sub
  Signed multiply, MAC
    2 16-bit values in one register
    16x16
    32x16
  Count leading zeroes
  Load/store pairs of registers
Fairly typical "DSP" additions
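To make the listed operations concrete, here is a behavioral sketch in Python of saturated add/subtract, a 16x16 multiply-accumulate on values packed two per register, and count leading zeroes. All function names are invented for illustration, and the sketch does not claim to match the exact semantics (flags, which steps saturate) of any particular ARM instruction.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def saturate32(value):
    """Clamp to the signed 32-bit range instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, value))


def sat_add(a, b):
    return saturate32(a + b)


def sat_sub(a, b):
    return saturate32(a - b)


def signed16(bits):
    """Interpret the low 16 bits as a two's-complement value."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def unpack_halves(register):
    """Split a 32-bit register into (low, high) signed 16-bit values."""
    return signed16(register), signed16(register >> 16)


def mac16(acc, reg_a, reg_b):
    """16x16 MAC on the low halves; saturating the sum is an assumption."""
    a_low, _ = unpack_halves(reg_a)
    b_low, _ = unpack_halves(reg_b)
    return sat_add(acc, a_low * b_low)


def count_leading_zeros(value):
    """Number of zero bits above the highest set bit of a 32-bit word."""
    return 32 - (value & 0xFFFFFFFF).bit_length()


print(sat_add(INT32_MAX, 1) == INT32_MAX)
print(unpack_halves(0xFFFF0002))
print(count_leading_zeros(1))
```

Saturation matters for signal processing because an overflow that clips to the maximum value distorts a signal far less than one that wraps around to a large negative number.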

Why E?

Enhance DSP performance
  Of stand-alone ARM core
  Avoid multiprocessor solution

Hard disk controllers, for example


E: Competition

DSP-in-processor
  "MAC=DSP"
  Almost all embedded processors have it
  No revolution in performance

DSP/processor hybrids
  Infineon TriCore
  Microchip dsPIC
  Hard to get it right, not a big success so far

SIMD extensions
  More extensive additions than v5E
  Requires new functional units
  Major performance gain possible


SIMD Extensions

Heavy-weight addition
  New functional units, registers
  Small vector computers

Examples:
  ARM SIMD extensions (in v6)
  Motorola AltiVec
  MIPS
  x86 MMX-SSE-SSE2-3DNow!
  SPARC VIS
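The "small vector computer" idea is that one register holds several narrow values and one instruction operates on all of them at once, with no carry crossing from one lane into the next. A minimal behavioral sketch follows, assuming a 32-bit register and wrap-around lanes; the function names are hypothetical and do not correspond to any specific instruction in the extensions listed above.

```python
def split_lanes(word, lane_bits=16, width=32):
    """Split a register into unsigned lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, width, lane_bits)]


def join_lanes(lanes, lane_bits=16):
    """Pack lanes back into one register, lowest lane first."""
    mask = (1 << lane_bits) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & mask) << (index * lane_bits)
    return word


def simd_add(a, b, lane_bits=16, width=32):
    """Add lane by lane; each lane wraps on its own, carries never cross."""
    sums = [x + y for x, y in zip(split_lanes(a, lane_bits, width),
                                  split_lanes(b, lane_bits, width))]
    return join_lanes(sums, lane_bits)


# Low lane 0xFFFF + 1 wraps to 0 without disturbing the high lane.
print(hex(simd_add(0x0001FFFF, 0x00010001)))
# The same bits added as one ordinary 32-bit number carry into the high half.
print(hex((0x0001FFFF + 0x00010001) & 0xFFFFFFFF))
```

Blocking the carry between lanes is what requires the new functional units: an ordinary adder would let the low lane overflow into its neighbour.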


SIMD Extensions

Target
  Motorola PPC 7455 (G4+)
  1 GHz

EEMBC Telemark suite
Networking suite

OOTB:
  Out-of-the-box

OPT:
  Manually tuned to use AltiVec

Overall/Average:
  3-4 times speed up can be expected

[Bar chart: OOTB vs. OPT for Autocorr 1, Convolution 1, Bit alloc 1, FFT 1, Viterbi 1, OSPF 1, Route 1, Packet 512; vertical axis 0 to 10, with one bar labeled 35,1]


ARM vs DSP

Despite "E" and "SIMD"...
Standard solution:
  Dual-core setup
    ARM core
    DSP core

Control vs data


Control vs Data

Control plane:
  Standard processor tasks
  Decision-making
  "Integer applications"
  UI of a phone, packet routing, ...

Data plane:
  Move or process data
  Performance is key
  Signal processing, multimedia, ...
  Floating/fixed point


ARM-DSP: TI OMAP 5910

Texas Instruments
Target market
  Data-intense real-time
  Audio, biometrics, etc.

Processing
  Dual-core chip
  ARM925T 150 MHz
  TI C55 DSP 150 MHz

Power 230 mW
Price 32 USD

[Block diagram]
  C55x DSP Core: 24k I$, 64k data SRAM, 96k instr SRAM
  ARM925 CPU Core: 16k I$, 8k D$, MMU
  192k Shared SRAM
  Mem Ctrl
  75 MHz
  LCD Ctrl
  ARM shared devices
  ARM private devices
  System devices
  DSP shared devices
  DSP private devices

Devices:
  USB 1.1
  LCD controller
  MMC/SD card interface
  Camera interface
  Keyboard interface
  RTC
  I2C
  8 serial ports
  3 UARTs
  14 GPIO pins


ARM Family: ARM Cores

[Chart: performance over time]
  ARM7 (1994): 3-stage pipe, unified cache, low power
  ARM9 (1998): 5-stage pipe, I/D caches
  ARM9E (2000): 5-stage pipe, I/D caches, Java, DSP
  ARM10 (2000): 6-stage pipe, static BP, 64-bit BIU, FP
  ARM11 (2002): 8-stage pipe, dynamic BP, OOO-completion, 550 MHz


ARM Family: Intel Chips

[Chart: performance over time, with ARM7, ARM9, ARM9E, ARM10 and ARM11 for reference]
  StrongARM (1995): 5-stage pipe, legendary performer
    Intel got this from Digital in 1998. A single variant, big in PDAs.
  XScale (2001): 7-10-stage pipe, dynamic BP, 800 MHz
    Intel makes chips based on the XScale; does not license the core to 3rd parties


Configurable Instruction Sets


Instruction Sets: Configure

Configurable instruction sets
  Adapt to needs of application
  User can specialize the processor
  Less waste on generality
  Fast evolution of instruction sets

Traditionally:
  Chip manufacturers determine instruction sets aimed at some niche
  Slow evolution of instruction sets


Instruction Sets: Configure

Subsetting
  There is a limited and predefined set of instructions available
  Easy to compile for: restrict code gen
  Remove instructions to simplify core

Addition
  Freedom to invent instructions
  Tool chain: assembly, C compilers
  Genuine development of ISAs


Configurable Instruction Sets

Tight integration:
  Add to regular pipeline
  Additional functional units
  Adding fine-grained instructions

Loose integration:
  Coprocessor interface
  Slower communication
  Offloading of macro-scale tasks
  Method to invoke accelerator circuits


Configurability Trend

Pioneers
  Tensilica Xtensa
  Arc Arctangent
  Configurability as key selling point

Added to general architectures
  MIPS: "CorExtend"
  PowerPC: "BookE ASU"
  Usually less tight integration


Benefit of Configurability

Target
  Xtensa III
  200 MHz

EEMBC Telemark suite
Networking suite

OOTB:
  Out-of-the-box
  25k gate core

OPT:
  Tuned code
  25k base core gates
  18k extra instr gates
  100k DSP coproc
  37k config gates

Speedups

Benchmark          OOTB   OPT
Telemark overall   1      37
Autocorr           1      9
Convolution        1      1249
Bit alloc          1      34
FFT                1      24
Viterbi GSM        1      14
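One way to read these figures is to set the speedup against the extra hardware. The sketch below assumes that the four OPT gate counts on this slide are additive, which the slide does not state explicitly; the names are illustrative only.

```python
# Gate counts in thousands, copied from the slide.
OOTB_GATES_K = 25
OPT_GATES_K = {"base core": 25, "extra instr": 18, "DSP coproc": 100, "config": 37}

# OPT speedups relative to OOTB = 1, copied from the slide.
SPEEDUPS = {"Telemark overall": 37, "Autocorr": 9, "Convolution": 1249,
            "Bit alloc": 34, "FFT": 24, "Viterbi GSM": 14}


def area_ratio():
    """OPT gate count divided by OOTB gate count (assumes the parts add up)."""
    return sum(OPT_GATES_K.values()) / OOTB_GATES_K


def speedup_per_area(speedup):
    """Speedup obtained per unit of relative gate-count growth."""
    return speedup / area_ratio()


print(sum(OPT_GATES_K.values()), "k gates,", area_ratio(), "x the base core")
print(round(speedup_per_area(SPEEDUPS["Telemark overall"]), 2))
```

Under that assumption the tuned configuration uses about seven times the gates of the base core, so the overall speedup is well above the growth in area.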


Configuration Tools

instruction set choices

Gate and memory size

counters