Silicon Architectures for Wireless Systems – Part 1
Tutorial, Hot Chips 01
Jan M. Rabaey
BWRC
University of California @ Berkeley
http://www.eecs.berkeley.edu/~jan
The Fibonacci Law on Wireless Growth
Source: Goldman-Sachs
“The number of world-wide wireless
subscribers (in tens of millions)
grows as a Fibonacci series”
[Figure: bar chart, x-axis 1999 to 2003; values shown: 46, 57, 67, ?]
From Handsets to Mobile Devices
Internet access the most important driver
(text, graphics, multimedia)
Berkeley Infopad (1990-1996): one of the first wireless Internet appliances
A Smorgasbord of Choices
Trends: narrowband to wideband, circuit to packet, voice to data
• 2nd generation (9.6 Kbps): GSM, IS136, IS95, PCS1900, DCS1800, IS54B
• Generation 2.5 (64-384 Kbps): HCSD, IS95-B, GPRS, IS136+
• 3rd generation (384-2000 Kbps): WCDMA, Edge, cdma2000, HDR, W-TDMA, ARIB-WCDMA, WCDMA-FDD
And the Alternatives
• Metropolitan Wireless Networks
– Various proprietary solutions
• Ricochet (up to 138 Kb/sec)
• Flarion Flash-OFDM (> 384 kBits/sec)
• Wireless LANs
– 802.11 (b,a); Hyperlan
– from 1 to 56 Mbit/sec
– Restricted to the 50 meter range (at present)
(Projected) Growth in 802.11 WLAN
[Figure: projected 802.11 WLAN shipments in thousands of units (axis 0 to 18000), 2001 to 2005, by data rate: 1-2 Mbps, 10-11 Mbps, 20 Mbps+, 54 Mbps, and total]
Source: Cahners In-Stat 2001
The New Internet
Clusters
Massive Cluster
Gigabit Ethernet
Pre-1990:
Client-Server Systems
The 2000s:
Extending toward the Small
Enabled by integration
and wireless connectivity
The 1990s:
Conquering the World
The Network revolution
The Post-PC Era: The Distributed Approach to Information Processing
The emergence of ad-hoc wireless networks – the wire replacement
• Bluetooth, HomeRF
– up to 800 kBit/sec
• Sensor networks
– Low data-rates
The Evolving Wireless Scene
[Figure: data rate (1 Kb to 100 Mb) versus range (1 m to 10 km), locating sensor networks, Bluetooth (PAN), 802.11 (LAN), 802.11a, metropolitan networks, and cellular (WAN) including 2.5G and 3G; the trend arrow reads “More bit/sec”]
Compelling Issues in Wireless (1)
Ubiquitous services put wireless spectrum at a premium
• Effective use of aether hampered by standardization and fragmentation
• Current spectral efficiency far below theoretical limits
• Emerging Solutions
– Adoption of better spectrum utilization techniques (interference
cancellation, multi-path fading mitigation and exploitation)
– multi-functional, adaptive systems
• But … huge appetite for computations
Evolution of MOPS Requirements in Cellular
Single 384 kbps UTRA W-CDMA Channel

Function                  MOPS
Digital RRC Channel       3600
Searcher                  2100
RAKE                      2050
Maximal Ratio Combiner      24
Channel Estimator           12
AGC, AFC                    10
Deinterleaver               15
Turbo Coder                 90
TOTAL                     7901

Source: IEEE Comm Theory Workshop, May 1999
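As a quick arithmetic check on the MOPS table above, the following Python sketch re-adds the per-function figures and reports each function's share of the total. The numbers come from the slide; the names `mops`, `total` and `share` are illustrative only.

```python
# MOPS per function for a single 384 kbps UTRA W-CDMA channel (values from the slide).
mops = {
    "Digital RRC Channel": 3600,
    "Searcher": 2100,
    "RAKE": 2050,
    "Maximal Ratio Combiner": 24,
    "Channel Estimator": 12,
    "AGC, AFC": 10,
    "Deinterleaver": 15,
    "Turbo Coder": 90,
}

total = sum(mops.values())


def share(name):
    """Fraction of the total MOPS budget taken by one function."""
    return mops[name] / total


print("total MOPS:", total)
for name in mops:
    print(f"{name:24s} {share(name):6.1%}")
```

Under these figures the RRC channel filter, the searcher and the RAKE receiver together take roughly 98% of the budget, which is why they are natural candidates for dedicated or reconfigurable hardware.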
[Figure: DSP MIPS (log scale, 1 to 10000) for GSM, HSCSD, GPRS and 3G, split into communication and application processing]
The Cost of Approaching Shannon’s Bound
[Figure: relative complexity (log scale, 1 to 100000) versus SNR (dB), 0 to 12]
Courtesy Engling Yeo, UCB
The Bliss and Challenge of Error Coding
8/9 Turbo, =4, N=4k
2/3 Turbo, =4, N=64k 1,2, and 3 iterations
1/2 Turbo, =4, N=64k 1,2,
and 3 iterations
8/9 LDPC, N=4k
1,3, and 5 iterations
8/9 Conv. Code,=3, N=4k
1/2 Conv. Code,=4, N=64k
2/3 Conv. Code,=4, N=64k
8/9 Capacity Bound
2/3 Capacity Bound
1/2 Capacity Bound
1/2 LDPC, N=10^7, 1100 iterations
for BER of 10^-5
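The capacity bounds in the legend above set the floor that the turbo, LDPC and convolutional codes are measured against. A minimal sketch of where such a bound comes from, assuming a real-valued AWGN channel with unconstrained (Gaussian) input: a rate-R code can only be reliable if Eb/N0 >= (2^(2R) - 1) / (2R). The function name is illustrative, and the slide's own bounds may rest on a different channel model (for example binary input) or be expressed as SNR rather than Eb/N0.

```python
import math


def min_ebn0_db(rate):
    """Minimum Eb/N0 in dB for reliable communication at `rate` bits per
    real channel use (assumption: unconstrained-input real AWGN channel)."""
    ebn0 = (2 ** (2 * rate) - 1) / (2 * rate)
    return 10 * math.log10(ebn0)


for label, rate in (("1/2", 1 / 2), ("2/3", 2 / 3), ("8/9", 8 / 9)):
    print(f"rate {label}: Eb/N0 >= {min_ebn0_db(rate):.2f} dB")
```

Higher-rate codes carry less redundancy, so their bound sits at a higher Eb/N0; the cost of closing the last fraction of a dB to that bound is what the complexity axis captures.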
Dealing with Non-ideal Channels (e.g., fading)
• Multi-antenna approach exploits multi-path fading by sending data along good channels
• Results in large theoretical improvements in bandwidth efficiency for fading channels
• But … computationally hungry
[Figure: array processing of a signal x(t) arriving over two paths (1st path coefficient = 1, 2nd path coefficient = 0.6)]
[Figure: capacity (bits/s/Hz, 0 to 30) versus SNR (dB, 0 to 25) for (4,4) with feedback, (4,4) no feedback, (4,1) orthogonal design, and (1,1) baseline]
The Cost of Dealing with Non-ideal Channels

Scheme                           Data rate per user   Spectral efficiency
Matched Filter                   0.8 Mb/s             0.9 b/s/Hz
Multi-User Detector              1.6 Mb/s             1.8 b/s/Hz
OFDM + Coding                    1.9 Mb/s             2.1 b/s/Hz
OFDM + Multi-Antenna             3.8 Mb/s             4.2 b/s/Hz
OFDM + Multi-Antenna + Coding    5.6 Mb/s             6.3 b/s/Hz

* Assume 25 MHz bandwidth and 28 users
[Figure: bar chart of required performance in MIPS (axis 0 to 7000) for each scheme]
Source: Ning Zhang, UCB
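The per-user data rates and spectral efficiencies on this slide are tied together by the footnote (25 MHz bandwidth, 28 users). A small Python sketch, assuming spectral efficiency is simply aggregate rate divided by bandwidth, reproduces the printed values to within rounding. Pairing each scheme name with a rate follows the order on the slide, which is an assumption; the function name is illustrative.

```python
BANDWIDTH_HZ = 25e6  # from the slide footnote
USERS = 28           # from the slide footnote

# (scheme, data rate per user in Mb/s, spectral efficiency in b/s/Hz as printed)
# The scheme-to-value pairing assumes the slide lists both in the same order.
schemes = [
    ("Matched Filter", 0.8, 0.9),
    ("Multi-User Detector", 1.6, 1.8),
    ("OFDM + Coding", 1.9, 2.1),
    ("OFDM + Multi-Antenna", 3.8, 4.2),
    ("OFDM + Multi-Antenna + Coding", 5.6, 6.3),
]


def spectral_efficiency(rate_mbps, users=USERS, bandwidth_hz=BANDWIDTH_HZ):
    """Aggregate throughput of all users divided by the shared bandwidth."""
    return rate_mbps * 1e6 * users / bandwidth_hz


for name, rate, printed in schemes:
    print(f"{name:32s} computed {spectral_efficiency(rate):.2f} b/s/Hz, slide {printed}")
```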
Shannon beats Moore’s law
[Figure: log-scale growth (1 to 10000000) from 1980 to 2020, comparing algorithmic complexity (Shannon’s Law) across 1G, 2G and 3G with processor performance (~Moore’s Law)]
Courtesy: Ravi Subramanian (Morphics)
Single-Chip DSPs are Lagging ...
[Figure: DSP performance in Megamacs (log scale, 1 to 10000) versus year, 1980 to 2000]
• DSP trend: x 1.4/year
• Moore’s law: x 1.58/year
Source: TI
While algorithms are beating Moore’s law!
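To put the two growth rates on this slide side by side, the sketch below converts each annual factor into a doubling time and computes how far the DSP trend falls behind over a given span. The growth factors are from the slide; the function names and the 20-year horizon are illustrative choices.

```python
import math

DSP_GROWTH = 1.4     # per year, from the slide
MOORE_GROWTH = 1.58  # per year, from the slide


def doubling_time_years(growth_per_year):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(growth_per_year)


def gap_after(years):
    """Ratio between the Moore's-law curve and the DSP trend after `years`."""
    return (MOORE_GROWTH / DSP_GROWTH) ** years


print(f"Moore's law doubles every {doubling_time_years(MOORE_GROWTH):.2f} years")
print(f"DSP trend doubles every {doubling_time_years(DSP_GROWTH):.2f} years")
print(f"gap after 20 years: {gap_after(20):.1f}x")
```

A factor of 1.58 per year corresponds to doubling roughly every 18 months, while 1.4 per year doubles only about every two years, so the gap compounds to about an order of magnitude over two decades.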
Digital Processor Performance
[Figure: transistors/chip (log scale, 1E+00 to 1E+12) for memory and processors, 1960 to 2010, overlaid with normalized processor speed (0.001 to 100), microprocessor/DSP mA/MIP, and computational efficiency]
Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld
Courtesy of Ravi Subramanian (Morphics)
The Law of Diminishing Returns
• More transistors are being thrown at improving general-purpose CPU and DSP performance
• Fundamental bounds are being pushed
– limits on instruction-level parallelism
– limits on memory system performance
• Returns per transistor are diminishing
– new architectures realizing only 2-3 instructions/clock
– increasingly large caches to hide DRAM latency
Some observations
• Von Neumann style instruction set architectures were conceived when switching devices and interconnections were extraordinarily expensive, and multiplexing-in-time provided the most economical solution
– Intel 4004: 2000 transistors, 1 MHz clock frequency, 1 metal layer
• This led to the “clock-speed” fixation, which in fact is only a secondary measure of performance
• Power is rapidly becoming a limiting factor
– Newest processors are including thermal sensors and automatic slow-down (throttling) using pipeline bubbles and nop’s to combat overheating and meltdown
Compelling Issues in Wireless (2)
“The Last Meter Problem”
Ubiquitous wireless networking requires steep reduction in cost and energy dissipation
• To be acceptable, radio cost has to be below $1
• Frequent battery replacement on 100’s of devices unacceptable
• Technology not likely to be of major help
Energy to Play a Major Role
[Figure: log-scale growth (1 to 10000000) from 1980 to 2020, with curves for algorithmic complexity (Shannon’s Law, 1G to 3G), processor performance (~Moore’s Law), and battery capacity]
Courtesy: Ravi Subramanian (Morphics)
Energy Trends in DSPs
[Figure: DSP power in mW/MIPS (log scale, 0.001 to 1000) versus year, 1982 to 2008. Data points: 32010 @ 5V, C15 @ 5V, C25 @ 5V, C15 @ 3V, C52 @ 5V, C52 @ 3V, ATT16xx @ 2.7V, C5x @ 2V, 2V DSP, 1V DSP, 0.5V DSP. Trend line: Gene’s Law]
Factor 1.6 reduction per year
Source: TI
Energy Trends in DSPs: Gene (Frantz)’s Law
[Figure: Gene’s Law, DSP power in mW/MIPS (log scale, 0.00001 to 1,000) versus year, 1982 to 2008]
Source: Gene Frantz (TI)
Energy to Play a Major Role: A holistic perspective
Energy = upper bound on the amount of available computation
– Total energy of Milky Way Galaxy: 10^59 J
– Minimum switching energy for digital gate (1 electron @ 100 mV): 1.6 × 10^-20 J (limited by thermal noise)
– Upper bound on number of digital operations: 6 × 10^78
– Operations/year performed by 1 billion 100 MOPS computers: 3 × 10^24
– Energy consumed in 180 years assuming a doubling of computational requirements every year.
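The chain of numbers on this slide can be retraced in a few lines. The sketch below takes the slide's inputs (galaxy energy, one electron at 100 mV, one billion 100 MOPS computers) and assumes that the workload doubles each year starting from today's rate, so the cumulative operation count after n years is the first year's count times (2^n - 1). Constant and variable names are illustrative.

```python
import math

GALAXY_ENERGY_J = 1e59       # from the slide
ELECTRON_CHARGE_C = 1.6e-19  # charge of one electron
SWING_V = 0.1                # 100 mV, from the slide
SECONDS_PER_YEAR = 365 * 24 * 3600

switch_energy_j = ELECTRON_CHARGE_C * SWING_V  # about 1.6e-20 J per operation
max_ops = GALAXY_ENERGY_J / switch_energy_j    # about 6e78 operations
ops_per_year = 1e9 * 100e6 * SECONDS_PER_YEAR  # about 3e24 operations/year

# Year k (k = 0, 1, ...) performs ops_per_year * 2**k operations, so the
# cumulative total after n years is ops_per_year * (2**n - 1).
years_to_exhaust = math.log2(max_ops / ops_per_year + 1)

print(f"switching energy : {switch_energy_j:.2e} J")
print(f"max operations   : {max_ops:.2e}")
print(f"ops per year     : {ops_per_year:.2e}")
print(f"years to exhaust : {years_to_exhaust:.1f}")
```

The result lands at roughly 180 years, matching the last bullet: exponential growth in demand exhausts even an astronomically large energy budget within a couple of centuries.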
Putting energy in perspective
• Energy cost of digital computation
– 1999 (0.25µm): 1pJ/op (custom) … 1nJ/op (µproc)
– 2004 (0.1µm): 0.1pJ/op (custom) … 100pJ/op (µproc)
• Factor 1.6 per year; Factor 10 over 5 years
• Assuming reconfigurable implementation: 1 pJ/op
• Energy cost of communication
– 1999 Bluetooth
• 1 nJ/bit transmission energy (thermal limit 30 pJ/bit)
– 2.4 GHz band, 10 m distance
• Overall energy: 170 nJ/bit reception / 150 nJ/bit transmission (!)
• Standby power: 300 µW
– 2004 Radio (10 m)
• Only minor reduction in transmission energy
• Reduce transceiver energy by at least a factor of 10-50
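The computation and communication figures above can be related with simple arithmetic. This sketch, with illustrative function names, checks that a factor of 1.6 per year compounds to roughly 10 over 5 years, projects the 1999 energy-per-operation figures to 2004, and expresses the overall Bluetooth transmission energy per bit in units of 1 pJ operations.

```python
ANNUAL_FACTOR = 1.6  # energy reduction per year, from the slide


def projected_energy(e0_joules, years, factor=ANNUAL_FACTOR):
    """Energy per operation after `years` of steady reduction."""
    return e0_joules / factor ** years


def ops_per_bit(bit_energy_joules, op_energy_joules):
    """Number of operations that cost as much energy as one bit on the air."""
    return bit_energy_joules / op_energy_joules


five_year_factor = ANNUAL_FACTOR ** 5
print(f"5-year factor: {five_year_factor:.2f}")
print(f"µproc 2004   : {projected_energy(1e-9, 5) * 1e12:.0f} pJ/op")
print(f"custom 2004  : {projected_energy(1e-12, 5) * 1e12:.3f} pJ/op")
print(f"ops per transmitted bit at 1 pJ/op: {ops_per_bit(150e-9, 1e-12):,.0f}")
```

At 1 pJ/op, sending one bit at 150 nJ costs about as much as 150,000 operations, which is the reason the slide calls for reducing transceiver energy rather than transmission energy alone.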
The Changing Metrics
• Power and/or Energy have become dominant drivers
– Limiting factor for performance and reliability in wall-plugged applications
– Enabler for wide-spread use of distributed computing and data access
• Energy reduction requires joint optimization process between application and implementation
The Changing Metrics
• Design complexity and “context complexity” are sufficiently high that design verification is a major limitation on time-to-market
• Cost of fabrication facilities and mask making has increased significantly
– NRE cost of new design has increased significantly
• Physical effects (parasitics, reliability issues, power management) are increasingly significant in the design process
– These must now be considered explicitly at the circuit level
The Changing Metrics
Flexibility
Power
Cost
Performance as a Functionality Constraint (“Just-in-Time Computing”)
Challenges in Single-Chip Radio Design
[Figure: radio functions arranged by protocol layer (Physical + RF, MAC/Data Link, Network, Application) and by data and time granularity (nsec, µsec, msec, sec; bits, packets, streams, source data). Data path: Data Acquisition, Data Encoding, Data Formatting, Mod/Demod. Control path: UI, Call Setup, Slot Allocation, Synchronization]
The Software Radio
[Figure: A/D converter and D/A converter connected to a DSP]
• Idea: Digitize (wideband) signal at antenna and use signal processing to extract desired signal
• Leverages advances in technology, circuit design, and signal processing
• Software solution enables flexibility and adaptivity, but at huge price in power and cost
• 16 bit A/D converter at 2.2 GHz dissipates 1 to 10W
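The Mostly Digital Radio
[Figure: direct-conversion receiver. RF input (fc = 2 GHz), RF filter, LNA, mixers driven by cos[2π(2GHz)t] and sin[2π(2GHz)t], two A/D converters producing I and Q at 50 MS/s, and a digital baseband receiver; the diagram marks the chip boundary and the analog/digital split]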
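The receiver on this slide mixes the RF input with cosine and sine local oscillators at the carrier frequency to obtain I and Q. A minimal numerical sketch of that quadrature mixing follows. Several things here are assumptions rather than details from the slide: the simulation sample rate (16 samples per carrier cycle), the sign convention on the Q branch, and the use of a plain average over whole carrier cycles as a stand-in for the low-pass filter ahead of the 50 MS/s converters. Function names are illustrative.

```python
import math

FC_HZ = 2e9        # carrier frequency, from the slide
FS_HZ = 16 * FC_HZ  # simulation sample rate (assumption, not from the slide)


def make_rf(i_amp, q_amp, n, fc=FC_HZ, fs=FS_HZ):
    """Real passband signal I*cos(2*pi*fc*t) - Q*sin(2*pi*fc*t), n samples."""
    out = []
    for k in range(n):
        phase = 2 * math.pi * fc * k / fs
        out.append(i_amp * math.cos(phase) - q_amp * math.sin(phase))
    return out


def downconvert(samples, fc=FC_HZ, fs=FS_HZ):
    """Mix with quadrature oscillators, then average (a crude low-pass)."""
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        phase = 2 * math.pi * fc * k / fs
        i_sum += 2 * x * math.cos(phase)
        q_sum += -2 * x * math.sin(phase)
    n = len(samples)
    return i_sum / n, q_sum / n


# 100 whole carrier cycles, so the double-frequency mixing products average out.
rf = make_rf(0.6, -0.3, n=1600)
i_est, q_est = downconvert(rf)
print(f"I = {i_est:.6f}, Q = {q_est:.6f}")
```

Because only the slowly varying I and Q survive the low-pass step, the converters can run at the baseband rate instead of the multi-GHz rate a pure software radio would need.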
The Opportunities …
• Scaling of technology, of course
– Performance doubles every 18 months
– Energy reduced by a factor of 10 every 5 years
• The system-on-a-chip
– Integration of heterogeneous functions on a single die leads to new solutions
• Novel architectures
– Processor architectures exploiting concurrency (e.g. VLIW)
– Reconfigurable hardware
– Network-on-a-chip
• Novel circuit solutions
– Voltage as a design variable
• Novel design methodologies
– Communication-based Design
– Platform-based Design
The Ideal “Radio-on-a-Chip” Platform
• Heterogeneous
• Supports massive concurrency
• Matches the computational model
• Operates at minimum supply voltage and clock frequency
• Provides flexibility only where needed and desirable and at the right granularity
Combines performance, flexibility and energy-efficiency
[Figure: platform blocks: Reconfigurable DataPath, Reconfigurable State Machines, Embedded uP + DSPs, FPGA, Dedicated DSP]
The System-on-a-Chip Nightmare
[Image: “Femme se coiffant”, Pablo Ruiz Picasso, 1940]
The System-on-a-Chip Nightmare
[Figure: the “Board-on-a-Chip” approach: CPU, DSP, DMA, MPEG, memory controller and I/O blocks tied together by a system bus, a bridge to a peripheral bus, control wires, and custom interfaces]
Courtesy: Sonics, Inc
Current Integrated Wireless Transceivers
[Figure: a “Board-on-a-Chip” approach to the System-on-a-Chip. Functions spanning the analog/digital boundary (A/D, timing recovery, filters, adaptive antenna algorithms, equalizers, MUD, bit-level accelerators, ARQ, control, keypad and display, phone book, Java VM) are mapped onto DSP, µproc, ASIC, and logic]
A Renaissance in Design
Convergence of:
• Applications: Multimedia, Consumer, Communications
• Implementation Fabrics: Silicon reuse, Silicon networks
• Design Methodology: Hard+Soft
Aart De Geus, DAC’99
A Central Theme: Raising the Reuse Factor
Reuse comes in generations

Generation   Reuse element    Status
1st          Standard cells   Well established
2nd          IP blocks        Being introduced
3rd          Architecture     Emerging
4th          IC               Early research

Source: Theo Claasen (Philips) – DAC 00
Platform-Based Design
• A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
• New platforms will be defined at the architecture-micro-architecture boundary
• They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
• Key to such approaches is the representation of communication in the platform model
“Only the consumer gets freedom of choice; designers need freedom from choice” (Orfali, et al, 1996, p.522)
Source: R. Newton
Hardware Platforms
Hardware Platform: not only a fully specified SoC but also a family of architectures that share some common feature:
A Hardware Platform is a family of architectures that satisfy a set of architectural constraints imposed to allow the re-use of hardware and software components.
The stronger the constraints, the more component re-use, but stronger constraints imply fewer architectures to choose from!
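The trade-off stated above, that stronger constraints shrink the family of admissible architectures, can be made concrete with a toy model. Everything in this sketch is hypothetical: the design-space dimensions, the constraint predicates, and the names `architectures`, `platform`, `loose` and `strict` are illustrations, not part of any real platform definition.

```python
from itertools import product

# Hypothetical design space: every combination is one candidate architecture.
CPU_CORES = ("RISC", "VLIW", "DSP")
BUS_WIDTHS = (32, 64, 128)
MEMORIES = ("SRAM", "SDRAM")

architectures = [
    {"cpu": cpu, "bus": bus, "mem": mem}
    for cpu, bus, mem in product(CPU_CORES, BUS_WIDTHS, MEMORIES)
]


def platform(constraints):
    """Family of architectures that satisfy every architectural constraint."""
    return [a for a in architectures if all(check(a) for check in constraints)]


# A weak platform definition: only the memory type is fixed.
loose = [lambda a: a["mem"] == "SDRAM"]
# A stronger one: also fixes a minimum bus width and excludes one core type.
strict = loose + [lambda a: a["bus"] >= 64, lambda a: a["cpu"] != "DSP"]

print("unconstrained:", len(architectures))
print("loose platform:", len(platform(loose)))
print("strict platform:", len(platform(strict)))
```

Each added constraint makes more hardware and software components interchangeable across the family, at the price of a smaller set of architectures left to choose from.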
Hardware Platforms Not Enough!
• Hardware platform has to be abstracted
• Interface to the application software is API
• Software layer performs abstraction:
– Programmable cores and memory subsystem with RTOS
– I/O subsystem via Device Drivers
Software Platforms
[Figure: layered stack. Application Software sits on the Platform API; the Software Platform below it comprises RTOS, BIOS, Device Drivers, and Network Communication; the Hardware Platform at the bottom connects input devices, output devices (I/O), and the network]
The Platform Approach
[Figure: an Application Instance from the Application Space and a Platform Instance from the Architectural Space meet at the System Platform, linked by Platform Specification and Platform Design Space Exploration]
Source: Alberto Sangiovanni-Vincentelli
Example: Philips Nexperia™ DVP
Flexible architecture for digital video applications
• TriMedia™ CPU (TM-xxxx, with D$ and I$): VLIW media processor
– 100 to 300+ MHz
– 32-bit or 64-bit
• MIPS™ CPU (PRxxxx, with D$ and I$): general-purpose RISC processor
– 50 to 300+ MHz
– 32-bit or 64-bit
• Nexperia system busses
– PI bus
– Memory bus
– 32-128 bit
• Library of device blocks
– Image coprocessors
– DSPs
– UART
– 1394
– USB
– …and more
[Figure: DVP system silicon: both CPUs and their Device I/P Blocks attach to PI buses and, through the MMI, to the DVP memory bus and SDRAM]
Source: Theo Claasen (Philips) – DAC 00
Nexperia™ Scalability
• Single architecture, multiple product configurations
– Processor core options - TM32, TM64, MIPS32, MIPS64 ...
– Device block options
• Highly programmable to weakly programmable
Example configurations:
– TriMedia CPU + Device blocks when control functions are minimal (VLIW, SDRAM, MMI)
– MIPS CPU + TriMedia CPU replacing some Device blocks (RISC, VLIW, SDRAM, MMI)
– MIPS CPU + Device blocks + Software (RISC, SDRAM, MMI)
Source: Theo Claasen (Philips) – DAC 00
An Example of an Instantiation
Philips Nexperia NX-2700
A programmable HDTV media processor
Combines TriMedia VLIW with configurable media co-processors
Pleiades: Digital Wireless Platform
[Figure: transceiver functions (A/D, timing recovery, filters, adaptive antenna algorithms, equalizers, MUD, bit-level accelerators, ARQ, control, keypad and display, phone book, Java VM) spanning the analog/digital boundary and mapped onto a DSP core, a uC core (ARM), logic, and dedicated logic and memory]
Source: Berkeley Wireless Research Center
The Architectural Choices
[Figure: flexibility versus 1/efficiency. Options shown: general-purpose µP running software (µP with program memory), programmable DSP (µP with program memory, MAC unit, address generator), hardware-reconfigurable processor (µP with program memory plus satellite processors), and direct-mapped hardware (dedicated logic)]
An Attractive Option: Multi-Processor System-on-a-Chip
Courtesy: Chris Rowen, Tensilica
Copyright Tensilica, Inc 2001
“The New NAND gate”
The Energy-Flexibility Gap
[Figure: energy efficiency in MOPS/mW (or MIPS/mW), log scale 0.1 to 1000, versus flexibility (coverage)]
• Embedded processors: SA110, 0.4 MIPS/mW
• ASIPs, DSPs: 2 V DSP, 3 MOPS/mW
• Reconfigurable processor/logic: Pleiades, 10-80 MOPS/mW
• Dedicated HW
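The efficiency figures quoted on this slide are easier to compare with the pJ/op numbers used elsewhere in the tutorial once converted to energy per operation: 1 MOPS/mW is 10^6 operations per second per 10^-3 W, that is 10^9 operations per joule, or 1 nJ per operation. A small conversion sketch, with an illustrative function name:

```python
def energy_per_op_joules(mops_per_mw):
    """Convert an efficiency in MOPS/mW (or MIPS/mW) to joules per operation."""
    ops_per_joule = mops_per_mw * 1e6 / 1e-3
    return 1.0 / ops_per_joule


# Efficiency figures as quoted on the slide.
for label, eff in (
    ("embedded processor (0.4 MIPS/mW)", 0.4),
    ("2 V DSP (3 MOPS/mW)", 3),
    ("Pleiades, low end (10 MOPS/mW)", 10),
    ("Pleiades, high end (80 MOPS/mW)", 80),
):
    print(f"{label:34s} {energy_per_op_joules(eff) * 1e12:8.1f} pJ/op")
```

Read this way, the gap between an embedded processor (about 2.5 nJ/op) and the reconfigurable fabric (about 12.5 to 100 pJ/op) spans more than one order of magnitude.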
Orthogonalizing Communication from Behavior
• Historically lots of work on Behavior
– hierarchy well established
– several descriptions available (with variable levels of precision)
– synthesis available
• Communication less well investigated
– hard to separate from behavior, usually intertwined
– telecomm protocols are the best existing example
• Need to understand Formalism, Abstraction, and Decomposition for communication
DEFINED BY MODELS OF COMPUTATION!
An Integrated Radio Processor (TCI)
[Figure: Embedded Processor, Memory Sub-system, Baseband Processing, Fixed Protocol Stack, Programmable Protocol Stack]
The “Network-on-a-Chip”
[Figure: on-chip interconnect options: Multi-Bus (N inputs, B buses, M outputs), Mesh of modules, and Hierarchical Mesh of clusters]
Platform Exploration: The Y-Chart Approach
[Figure: Y-chart. Applications and an Architecture Instance feed Mapping, followed by Performance Analysis, which yields Performance Numbers]
Source: B. Kienhuis
Summary and Perspective
• Technology scaling is redefining the term “complexity”
• System-on-a-Chip fosters renaissance in processor architecture, opening the door for new models and combinations thereof: Platform and Communication Based Design
• SOC for wireless driven by new set of metrics: how to simultaneously optimize flexibility, cost, energy, and performance?
• Application-Architecture Exploration is Focal Part of Implementation Methodology