STATE-OF-THE-ART INTERCONNECT FABRICS AND COMMUNICATION PROTOCOLS
AHB: critical overview
- Protocol
  - Lacks parallelism
    - In-order completion
    - The address of the next transaction is merely anticipated on the bus
    - No multiple outstanding transactions: cannot hide slave wait states effectively
  - High arbitration overhead (min. 2 cycles on single transfers)
  - Bus-centric vs. transaction-centric
    - Initiators and targets are exposed to the bus architecture (e.g. the arbiter)
    - No decoupling; instance-specific bus components
- Topology
  - Scalability limitation of the shared-bus solution!
Bus evolution
- Protocol: toward improved utilization of the topology (throughput, latency)
- Topology: toward enhanced parallelism
Topology evolution
- Shared bus with unidirectional request and response lanes
- Crossbar with unidirectional request and response lanes
- Partial crossbar with unidirectional request and response lanes
[Figure: multi-layer bus architecture - several shared-bus segments (processors P0-P9, traffic generators T1-T4, memories M0-M9) connected through a crossbar to slaves S0-S4]
The communication bottleneck
- Today: multi-layer topology
[Figure: system interconnect - many IP blocks and traffic generators (IPTG) connected through a multi-layer interconnect to an off-chip memory controller]
- Jeopardizing design predictability, feasibility and cost!
Topology evolution

                  4-ary 2-mesh   2-ary 4-mesh   2-ary 2-mesh
Switches               16             16              4
Bis. Band.              4              8              2
Tiles x Switch          1              1              4
Switch Arity            6              6             10
Max. Hops               6              4              2 (low latency)
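As a rough cross-check, the figures in the table above follow from standard k-ary n-mesh formulas. The sketch below is illustrative only; the function name and the port-counting convention (one input plus one output port per attached tile) are my assumptions, not from the slides.

```python
# Recompute the k-ary n-mesh metrics from the table above.
# Assumptions: bisection bandwidth is counted in links, switch arity is
# counted at an interior switch, and max hops is the mesh diameter n*(k-1).

def mesh_metrics(k, n, tiles_per_switch=1):
    switches = k ** n
    bisection_bw = k ** (n - 1)            # links cut when halving one dimension
    neighbors = n * (2 if k > 2 else 1)    # interior switch: 2 per dim (1 if k == 2)
    arity = neighbors + 2 * tiles_per_switch
    max_hops = n * (k - 1)                 # mesh diameter
    return switches, bisection_bw, arity, max_hops

print(mesh_metrics(4, 2))     # 4-ary 2-mesh -> (16, 4, 6, 6)
print(mesh_metrics(2, 4))     # 2-ary 4-mesh -> (16, 8, 6, 4)
print(mesh_metrics(2, 2, 4))  # 2-ary 2-mesh -> (4, 2, 10, 2)
```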
Split transactions
A split-transaction bus is a bus where the request and response phases are split and independent, to improve bus utilization.
- The master must arbitrate for the request phase
- The slave must arbitrate for the response phase
[Timing diagram: the bus is busy during the request, released while the slave prepares its response, and busy again during the response]
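A toy occupancy count makes the benefit concrete. The cycle numbers below are illustrative assumptions, not taken from the slides:

```python
# Bus cycles occupied by one read from a slave with long internal latency.
REQUEST, RESPONSE, SLAVE_LATENCY = 1, 1, 10  # cycles (illustrative)

# Non-split bus: the bus stays busy while the slave prepares its response.
locked_bus_busy = REQUEST + SLAVE_LATENCY + RESPONSE

# Split-transaction bus: the bus is released between the two phases,
# so only the request and response transfers occupy it.
split_bus_busy = REQUEST + RESPONSE

print(locked_bus_busy, split_bus_busy)  # 12 vs 2 occupied cycles
```

The freed cycles are what other masters can use, which is exactly the utilization gain the definition above aims at.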
Multiple outstanding transactions
The master keeps a queue of pending requests; the slave keeps a queue of pending responses.
- The master needs to associate each response with one of its pending requests
- The initiator should support multiple outstanding transactions too
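The bookkeeping the master interface needs can be sketched as a tag-indexed table of outstanding requests. The names below (issue, complete, the tag values) are hypothetical, chosen only for illustration:

```python
# Each issued request gets a tag; a returning response carrying that tag
# is matched back to its pending request and removed from the table.
from collections import OrderedDict

pending = OrderedDict()           # tag -> request address

def issue(tag, addr):
    pending[tag] = addr           # request leaves, stays outstanding

def complete(tag, data):
    addr = pending.pop(tag)       # associate response with its request
    return addr, data

issue(0, 0x1000)
issue(1, 0x2000)                  # two outstanding transactions at once
print(complete(1, "d2"))          # responses may come back in any order
print(complete(0, "d1"))
```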
Out-of-order completion
- Association between requests and responses is more challenging
- The typical case for out-of-order completion is when a fast slave is addressed after a slow slave: the fast slave returns its response earlier
[Timing diagram: the master issues a request to the slow slave S1, then to the fast slave S2; the response from S2 arrives before the response from S1]
Out-of-order completion
- Out-of-order completion can occur even when multiple outstanding transactions are addressed to the same complex slave
- A complex slave may use local optimizations and change the processing order of incoming requests (e.g., serve accesses to an open row first in an SDRAM device)
[Timing diagram: the master issues requests S11 then S12 to slave S1; the response to S12 returns before the response to S11]
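The open-row optimization mentioned above can be sketched as a reordering of the slave's pending queue. Everything here is hypothetical (function name, 1 KiB row size, the addresses), meant only to show the idea:

```python
# A complex slave reorders pending requests so that accesses hitting the
# currently open SDRAM row are served first; other requests keep their order.
def reorder(pending, open_row, row_bits=10):
    hits = [a for a in pending if a >> row_bits == open_row]
    misses = [a for a in pending if a >> row_bits != open_row]
    return hits + misses          # row hits first -> out-of-order completion

pending = [0x0400, 0x1404, 0x0408]     # rows 1, 5, 1 (assuming 1 KiB rows)
print(reorder(pending, open_row=1))    # [0x400, 0x408, 0x1404]
```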
Bus-centric architecture
- Internal bus components are directly exposed to the connected master and slave interfaces
- The bus architecture is instance-specific and lacks modularity
Transaction-centric architecture
Master and slave interfaces talk to the bus through a point-to-point communication protocol; the components behind the interfaces stay hidden.
- Internal bus components are hidden behind bus interfaces
- Modular architecture
- Orthogonalization of concerns
  - The internal bus architecture can freely evolve without impacting the interfaces
  - The only objective of the interfaces: specifying communication transactions! (communication abstraction)
But what is there on the market?
AMBA Multi-layer AHB
- Enables parallel access paths between multiple masters and slaves
- Fully compatible with AHB wrappers
- It is a topology (not protocol) evolution
- Pure combinational matrix (scales poorly with the number of I/Os)
[Figure: Master1 and Master2 connect through AHB layers to an interconnect matrix serving several slaves]
Multi-layer AHB implementation
- The matrix is completely flexible and can be adapted
- MUXes are point arbitration stages
- An AHB layer can be AHB-Lite: single master, no req/grant, no split/retry
- A layer losing arbitration is stalled by means of HREADY
- When a layer is stalled, the input stage samples the pipelined address and control signals
Hierarchical systems
• Slaves accessed only by masters on a given layer can be made local to the layer
Multiple slaves
Multiple slaves can appear as a single slave to the matrix:
• combine low-bandwidth slaves
• group slaves accessed only by one master (e.g. a DMA controller)
Alternatively, a slave can be an AHB-to-APB bridge, thus allowing connection to multiple low-bandwidth slaves.
Multiple masters per layer
Combine masters that have low bandwidth requirements.
Putting it all together…
The interconnect matrix and Slave4 are used for cross-layer communication (e.g., HW semaphores).
Dual-port slaves
Common for off-chip SDRAM controllers:
• Layer1: bandwidth-limited, high-priority traffic with low latency requirements (e.g., processor cores)
• Layer2: bandwidth-critical traffic (e.g., hardware accelerators)
The dual-port slave may even be connected to the matrix.
AMBA 3.0 (AMBA AXI)
• High-bandwidth, low-latency designs
• High-frequency operation
• Flexibility in the implementation
• Backward compatible with AHB and APB
This is an evolution of the communication protocol.

Novel features with respect to AHB:
• Burst-based transactions with only the first address issued
• Address information can be issued before/after the actual write data transfer
• Multiple outstanding addresses
• Out-of-order transaction completion
• Easy addition of register stages for timing closure
Design paradigm change
The initiator and target connect to the communication architecture through AXI point-to-point interfaces.
- Point-to-point interface specification
- Independent of the implementation of the communication architecture
- The communication architecture can freely evolve (be customized)
- Transaction-based specification of the interface
- Open Core Protocol (OCP) is another example of this paradigm
AXI: transaction-centric bus
AXI can be used to interconnect:
- an initiator to the bus
- a target to the bus
- an initiator with a target directly
The interface definition allows a variety of different interconnect implementations.
Interconnect approaches
Most systems use one of three interconnect approaches:
- shared address and data buses
- shared address buses and multiple data buses
- multi-layer, with multiple address and data buses (most common)
[Figure: masters and slaves connected through an AXI shared bus vs. an AXI crossbar]
Channel-based Architecture
- Five groups of signals:
  - Read Address: "AR" signal name prefix
  - Read Data: "R" signal name prefix
  - Write Address: "AW" signal name prefix
  - Write Data: "W" signal name prefix
  - Write Response: "B" signal name prefix
- Channels are independent and asynchronous with respect to each other

Read transaction: a single address for burst transfers
Write transaction: a single response for an entire burst
Channels - One way flow
- Write address channel: AWVALID, AWADDR, AWLEN, AWSIZE, AWBURST, AWLOCK, AWPROT, AWCACHE, AWID, AWREADY
- Write data channel: WVALID, WLAST, WDATA, WSTRB, WID, WREADY
- Read data channel: RVALID, RLAST, RDATA, RRESP, RID, RREADY
- Write response channel: BVALID, BRESP, BID, BREADY
- Channel: a set of unidirectional information signals
- Valid/Ready handshake mechanism
  - READY is the only return signal
  - Valid: the source interface has valid data/control signals
  - Ready: the destination interface is ready to accept data
- Last: indicates the last word of a burst transaction
Valid-ready handshake
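The handshake rule (a transfer happens only in cycles where both VALID and READY are high) can be sketched cycle by cycle. The signal traces below are made up purely for illustration:

```python
# Minimal VALID/READY handshake model: a word moves from source to
# destination only in cycles where both signals are asserted.
valid = [1, 1, 1, 0, 1, 1]   # source has data (held high until accepted)
ready = [0, 0, 1, 1, 1, 1]   # destination can accept

transfers = [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]
print(transfers)             # data moves in cycles 2, 4 and 5
```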
AMBA 2.0 AHB Burst
[Timing diagram: address beats A11-A14, A21-A23, A31 each issued on the address bus, pipelined one stage ahead of data beats D11-D14, D21-D23, D31]
- AHB Burst
  - Address and data are locked together
  - Two pipeline stages
  - HREADY controls pipeline operation
AXI - One Address for Burst
[Timing diagram: a single address per burst (A11, A21, A31) on the address channel; data beats D11-D14, D21-D23, D31 follow on the data channel]
- AXI Burst
  - One address for the entire burst
AXI - Outstanding Transactions
[Timing diagram: burst addresses A11, A21, A31 issued back-to-back while earlier data beats are still returning]
- AXI Burst
  - One address for the entire burst
  - Allows multiple outstanding addresses
Problem: Slow slave
[Timing diagram: addresses A11, A21, A31 issued, but data stalls after D11, D12]
- If one slave is very slow, all data is held up.
Out-of-Order Completion
[Timing diagram: the data for burst A21 returns before the data for A11 completes]
- Out-of-order completion is allowed
- Fast slaves may return data ahead of slow slaves
- Complex slaves may serve requests out of order
- Each transaction has an ID attached (given by the master interface)
- Channels have ID signals (ARID, RID, etc.)
- Transactions with the same ID must complete in order
- In a multi-master system, the interconnect must append another tag to the ID to make each master's ID unique
Ordering restrictions
Simple rules:
- A simple master can issue all its transactions with the same ID (implicitly forcing in-order delivery)
- A simple slave can serve requests in the order they arrive, regardless of the ID tag
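The interconnect's ID-extension rule can be sketched in a few lines. The field widths and function name below are illustrative assumptions; real interconnects size the tag from the number of master ports:

```python
# The interconnect prepends a per-master tag so transaction IDs stay unique
# system-wide; only transactions carrying the same extended ID must stay ordered.
def extend_id(master_idx, txn_id, id_bits=4):
    return (master_idx << id_bits) | txn_id   # master tag in the upper bits

# Two masters both issue transaction ID 3 - no clash after tagging:
a = extend_id(0, 3)
b = extend_id(1, 3)
print(a, b, a != b)          # 3 19 True
```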
AXI - Data Interleaving
[Timing diagram: data beats from different bursts interleaved on the read data channel]
- Returned data can even be interleaved
- Gives maximum use of the data bus
- Note: data within a burst is always in order
Burst read
- VALID stays high until READY goes high
- The valid-ready handshake regulates data transfer
- This is clearly a split-transaction bus!
Overlapping burst read
- The address of the second burst is issued before the first completes: true outstanding transactions
Burst write
[Timing diagram: write address, write data beats, and the single write response]

Register slices for max frequency
- Channels are asynchronous
- Register slices can be applied across any channel
- Allows maximum frequency of operation by changing delay into latency
[Figure: a register slice inserted on the write data channel (WVALID, WID, WDATA, WSTRB, WLAST, WREADY)]
Other AXI features
- No early burst termination, but fine-grained specification of the number of burst beats (1-16)
- Burst types:
  - Fixed (FIFO-like)
  - Incremental
  - Wrapping
- Support for system caches
  - Bufferable vs. cacheable transactions
- Support for
  - Privileged vs. normal transactions
  - Secure vs. non-secure transactions
- Support for exclusive accesses
  - Read exclusive, followed by write exclusive
- Support for locked accesses
  - Terminated by an unlocked access
- Write data interleaving (of transactions with different IDs)
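Since only the first address of a burst is issued, the slave derives every beat address from the burst type. The sketch below is a rough reading of that rule (aligned start address assumed, size given in bytes per beat; the function name is mine):

```python
# Beat addresses generated from the single issued address, per burst type.
def beat_addresses(addr, beats, size, burst):
    if burst == "FIXED":                      # FIFO-like: same address each beat
        return [addr] * beats
    if burst == "INCR":                       # incrementing addresses
        return [addr + i * size for i in range(beats)]
    if burst == "WRAP":                       # wraps at the burst-length boundary
        span = beats * size
        base = (addr // span) * span
        return [base + (addr - base + i * size) % span for i in range(beats)]

print(beat_addresses(0x10, 4, 4, "INCR"))   # [16, 20, 24, 28]
print(beat_addresses(0x18, 4, 4, "WRAP"))   # [24, 28, 16, 20]
```

The wrapping case is the cache-line-fill pattern: the critical word comes first, then the addresses wrap around within the burst-sized window.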
Comparison
[Setup: initiators Init1-Init3 and memories Mem1-Mem3 (2 wait states) on a single interconnect; AHB vs. STBus (low/high buffering) vs. AXI]
- AHB: it is impossible to hide slave response latency
- STBus (low buffering): while the previous response phase is in progress, a new request can be processed by the next addressed slave
- STBus (high buffering): more data is pre-accessed while the previous response phase is in progress
- AXI: interleaving support in interfaces and interconnect allows better interconnect exploitation
Scalability
- Highly parallel benchmark (no slave bottlenecks)
- 1 memory wait state
[Charts: relative execution time of AHB, AXI, STBus and STBus (B) for 2, 4, 6 and 8 cores, with a 1 kB cache (low bus traffic) and a 256 B cache (high bus traffic)]
Scalability
[Charts: interconnect busy time and interconnect usage efficiency of AHB, AXI, STBus and STBus (B) for 2, 4, 6 and 8 cores]
- With increasing contention, AXI and STBus show 80%+ efficiency, while AHB stays below 50%
- Saturation of shared bus architectures
Networks-on-Chip (NoCs)
Same paradigm as Wide Area Networks and large-scale multiprocessors.
[Figure: master and slave IP cores attach through network interfaces (NI) to switches forming the NoC; packets (header, payload, tail) are split into flits]
- Clean separation at the session layer: the core issues end-to-end transactions (through AXI, OCP, ...), while the network deals with lower-level issues
- Modularity at the HW level: only 2 building blocks, the network interface and the switch
- Physical-design aware: path segmentation, regular routing
Shared buses vs NoCs
(Each pair: shared bus first, NoC second; + marks an advantage, - a drawback)

NoC pros:
- Each integrated IP core adds bus load capacitance
+ Only point-to-point one-way links are used
- Bus timing problems in deep sub-micron designs
+ Better suited for the GALS paradigm
- Arbiter delay grows with the number of masters; instance-specific arbiter
+ Distributed routing decisions; re-instantiable switches
- Bus bandwidth is shared among all masters
+ Bus bandwidth scales with network dimension

NoC cons:
+ After the bus is granted, bus access latency is null
- Unpredictable latency due to network congestion
+ Very low silicon cost
- High area cost
+ Simple bus-IP core interface
- The network-IP core interface can be very complex (e.g. packetization, ...)
+ Design guidelines are well known
- Design guidelines are only starting to consolidate