Basic Low Level Concepts s Router micro-architectures...
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated)
Basic Low Level Concepts
ECE 8813a (2)
Course Outline
• Operation of a single link: switching and flow control
• Operation through a single switch: Router micro-architectures
  - Buffering, arbitration, scheduling, datapath
• Operation through multiple switches: Topologies & Routing
  - Direct, indirect, regular, irregular
• Formal models and analysis for deadlock and livelock freedom
Optimization: technology, congestion, reliability
Case Studies
ECE 8813a (3)
Overview
• Main architectural issues for communication over a single link
  - Message units
  - Flow control (lossless links)
  - Switching (next)
  - Buffer management (later)
  - Arbitration & Scheduling (later)
• System Goals: high levels of link utilization and minimal impact on end-to-end latency
ECE 8813a (4)
Sources
• Chapters 1 & 2
  - “Interconnection Networks: An Engineering Approach”, J. Duato, S. Yalamanchili and L. Ni, Morgan Kaufmann (pubs.)
• Papers
  - Virtual Channel Flow Control
  - Optimistic Flow Control – Illinois Fast Messages
ECE 8813a (5)
Message Passing
• Communication Protocol
  - Typical steps followed by the sender:
    1. System call by application
       - Copies the data into OS and/or network interface memory
       - Packetizes the message (if needed)
       - Prepares headers and trailers of packets
    2. Checksum is computed and added to header/trailer
    3. Timer is started and the network interface sends the packets
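The sender steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header layout (destination, sequence number, packet count), the use of CRC-32 as the checksum, and the name build_packets are all hypothetical and not taken from the slides. The timer and the actual hand-off to the network interface are left out.

```python
import struct
import zlib

# Hypothetical header layout: destination, sequence number, total packet count.
HEADER = struct.Struct("!HHH")


def build_packets(message: bytes, dest: int, max_payload: int) -> list:
    """Packetize a message and add a header and a checksum trailer to each packet."""
    # Step 1: packetize the message (if needed).
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    packets = []
    for seq, chunk in enumerate(chunks):
        # Step 1 (cont.): prepare the header of the packet.
        header = HEADER.pack(dest, seq, len(chunks))
        # Step 2: compute the checksum and append it as a trailer
        # (CRC-32 is an assumption; the slides do not name a checksum).
        trailer = struct.pack("!I", zlib.crc32(header + chunk))
        packets.append(header + chunk + trailer)
    return packets
```

For example, build_packets(b"x" * 10, dest=7, max_payload=4) produces three packets carrying 4, 4 and 2 payload bytes.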
[Figure: sending and receiving nodes, each with a processor (register file), memory (user space, system space), and a network interface (ni) with a FIFO, attached via the IO bus or proc/mem bus to the interconnection network. Labels: user writes data in memory; system call sends 1 copy; pipelined transfer, e.g., DMA; packet.]
© T.M. Pinkston, J. Duato, with major contributions by J. Flich
ECE 8813a (6)
Message Passing
• Communication Protocol
  - Typical steps followed by the receiver:
    1. NI allocates received packets into its memory or OS memory
    2. Checksum is computed and compared for each packet
       - If checksum matches, NI sends back an ACK packet
    3. Once all packets are correctly received
       - The message is reassembled and copied to user's address space
       - The corresponding application is signalled (via polling or interrupt)
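A minimal sketch of the receiver steps above, in Python. The packet layout (a 6-byte header with destination, sequence number and packet count, followed by the payload and a CRC-32 trailer) and the names check_packet and Reassembler are assumptions made for this example; the slides do not specify a format or a checksum.

```python
import struct
import zlib

# Hypothetical header layout: destination, sequence number, total packet count.
HEADER = struct.Struct("!HHH")


def check_packet(packet: bytes):
    """Step 2: recompute the checksum and compare it with the trailer."""
    body, (crc,) = packet[:-4], struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) != crc:
        return None  # checksum mismatch: no ACK is generated
    _, seq, total = HEADER.unpack(body[:HEADER.size])
    return seq, total, body[HEADER.size:]


class Reassembler:
    """Steps 1 and 3: store packets, then reassemble the message."""

    def __init__(self):
        self.parts = {}    # sequence number -> payload
        self.total = None  # packet count announced in the headers

    def receive(self, packet: bytes):
        """Return the sequence number to ACK, or None if the packet is bad."""
        checked = check_packet(packet)
        if checked is None:
            return None
        seq, total, payload = checked
        self.parts[seq] = payload
        self.total = total
        return seq

    def message(self):
        """Return the reassembled message once all packets have arrived."""
        if self.total is None or len(self.parts) < self.total:
            return None
        return b"".join(self.parts[i] for i in range(self.total))
```

Signalling the application (polling or interrupt) would follow once message() returns a value; that part is omitted here.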
[Figure: the same sending and receiving nodes as on the previous slide (processor with register file, memory with user space and system space, network interface (ni) with FIFO, IO bus or proc/mem bus, interconnection network). Labels: packet; pipelined reception, e.g., DMA; interrupt, data ready; 2 copy.]
ECE 8813a (7)
Shared Memory
• L2 miss
  - Miss Status Handling Register (MSHR) allocation, address mapping, packet construction
  - Message injection, flow control set-up, rate control
• Message reception
  - Packet ejection, update message status (return), and control processing (end-to-end flow control)
  - Packet servicing, message injection
ECE 8813a (9)
The Network Model
message path
Metrics (for now): latency and bandwidth
Routing, switching, flow control, error control
ECE 8813a (10)
Basic Switch Microarchitecture
[Figure: switch datapath with physical channels, link control, DEMUX/MUX input and output buffers, and a crossbar. Pipeline stages shown: Route Computation, Switch & VC Allocation, Switch Traversal, Link Traversal. Parameters: D hops, L bit message, W bit wide channels.]
ECE 8813a (11)
Off-Chip vs. On-Chip
On-Chip
• Wide links
• Shallow pipelines
• Not pin limited
• Low flow control latency
• Smaller buffers
Off-Chip
• Narrow links
• Deeper pipelines
• Pin limited
• Larger flow control latency
• Deeper buffers
ECE 8813a (12)
The Hardware Message Stack
Routing Layer: Where? Destination decisions, i.e., which output port
Switching Layer: When? When is data forwarded
Physical Layer: How? Synchronization of data transfer
• Switching is tightly coupled with flow control & buffer management
• Relative timing is key to performance
Largely responsible for deadlock and livelock properties
Largely responsible for latency, bandwidth and energy properties
ECE 8813a (13)
Messaging Units
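• Data is transmitted based on a hierarchical data structuring mechanism
  - Messages → packets → flits → phits
  - While flits and phits are fixed size, packets and data may be variable sized
Flits: flow control digits
Phits: physical digits (the unit transferred across a physical channel in one cycle)
[Figure: a data/message is split into packets; each packet is split into flits from head flit to tail flit. The head flit carries Dest Info, Seq # and misc fields; each flit carries a type field.]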
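The hierarchy above can be made concrete with a small Python sketch. The sizes, the header contents and the function names (packetize, to_flits, to_phits) are illustrative assumptions, not values from the slides; the point is only that packets vary in size while flits and phits are fixed size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "head", "body", or "tail"
    payload: bytes  # always exactly flit_bytes long


def packetize(message: bytes, max_payload: int) -> List[bytes]:
    """Message -> variable-sized packets (the last one may be shorter)."""
    return [message[i:i + max_payload]
            for i in range(0, len(message), max_payload)] or [b""]


def to_flits(packet: bytes, flit_bytes: int, header: bytes) -> List[Flit]:
    """Packet -> fixed-size flits; the head flit carries the header fields."""
    assert len(header) <= flit_bytes
    flits = [Flit("head", header.ljust(flit_bytes, b"\0"))]
    for i in range(0, len(packet), flit_bytes):
        chunk = packet[i:i + flit_bytes].ljust(flit_bytes, b"\0")  # pad last flit
        flits.append(Flit("body", chunk))
    if len(flits) > 1:
        flits[-1].kind = "tail"
    return flits


def to_phits(flit: Flit, phit_bytes: int) -> List[bytes]:
    """Flit -> fixed-size phits, one per physical channel transfer."""
    assert len(flit.payload) % phit_bytes == 0
    return [flit.payload[i:i + phit_bytes]
            for i in range(0, len(flit.payload), phit_bytes)]
```

With a 100-byte message, 32-byte packets, 8-byte flits and 2-byte phits, the first packet becomes one head flit plus four data flits (the last marked as tail), and each flit crosses the link as four phits.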
Link Level Flow Control
ECE 8813a (15)
Flow Control
• A synchronization protocol for the lossless transmission of bits
• Determines how network resources are allocated
  - Buffers
• Determines how conflicts are resolved
  - How (e.g., priorities) and when resources are assigned
ECE 8813a (16)
For Synchronized Transfers
• Unit of synchronized communication
  - Smallest unit whose transfer is requested by the sender and acknowledged by the receiver
  - No restriction on the relative timing of control vs. data transfers
  - Is a form of backpressure
[Figure label: Acknowledge Receipt]
ECE 8813a (17)
For Buffer Management
• Flow control occurs at two levels
  - Level of buffer management (flits/packets)
  - Level of physical transfers (phits)
  - Relationship between flits and phits is machine & technology specific
• What if there are no buffers?
  - Bufferless switching/flow control (later)
[Figure label: Buffer availability information]
ECE 8813a (18)
Physical Channel Flow Control
Synchronous Flow Control
• How is buffer space availability indicated?
Asynchronous Flow Control
• What is the limiting factor on link throughput?
ECE 8813a (19)
Flow Control Mechanisms
• Credit Based flow control
• On/off flow control
• Optimistic/Reliable Flow control
• Virtual Channel Flow Control
ECE 8813a (20)
Credit-Based Flow Control
• Basic Network Structure and Functions
  - Credit-based flow control
[Figure: sender and receiver connected by a pipelined transfer link. The sender's credit counter counts down from 10 to 0 while the receiver's queue is not serviced (X).]
Sender sends packets whenever credit counter is not zero
ECE 8813a (21)
Credit-Based Flow Control
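• Basic Network Structure and Functions
  - Credit-based flow control
[Figure: sender and receiver connected by a pipelined transfer link. The credit counter has counted down from 10 to 0 while the receiver's queue is not serviced (X); the receiver then returns +5 credits and the counter continues from 5 (5, 4, 3, 2).]
Receiver sends credits after they become available
Sender resumes injection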
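The two credit-based slides can be replayed with a minimal Python sketch. It assumes a zero-latency link and a receiver buffer of 10 entries to match the counter in the figure; the class and function names are hypothetical.

```python
from collections import deque


class CreditSender:
    def __init__(self, credits: int):
        self.credits = credits  # one credit per free receiver buffer slot

    def can_send(self) -> bool:
        return self.credits > 0  # send whenever the counter is not zero

    def send(self) -> None:
        assert self.credits > 0
        self.credits -= 1

    def receive_credits(self, n: int) -> None:
        self.credits += n


class CreditReceiver:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def accept(self, flit) -> None:
        # With a correct credit count this can never overflow (lossless link).
        assert len(self.queue) < self.capacity, "credit protocol violated"
        self.queue.append(flit)

    def service(self, n: int) -> int:
        """Drain up to n entries and return the number of credits freed."""
        freed = min(n, len(self.queue))
        for _ in range(freed):
            self.queue.popleft()
        return freed


def demo():
    sender, receiver = CreditSender(10), CreditReceiver(10)
    sent = 0
    while sender.can_send():  # queue is not serviced: counter runs 10 -> 0
        sender.send()
        receiver.accept(sent)
        sent += 1
    stalled_at = sent
    sender.receive_credits(receiver.service(5))  # receiver returns +5
    while sender.can_send():  # sender resumes injection
        sender.send()
        receiver.accept(sent)
        sent += 1
    return stalled_at, sent, len(receiver.queue)
```

In this run the sender stalls after 10 packets, resumes after the +5 credit return, and stalls again after 15 with the receiver queue full.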
ECE 8813a (22)
Timeline*
[Figure: timeline of credit and flit messages exchanged between Node 1 and Node 2, with a "process credit" delay after each credit arrives.]
Round trip credit time, equivalently expressed in number of flow control buffer units: t_rt
*From W. J. Dally & B. Towles, “Principles and Practices of Interconnection Networks,” Morgan Kaufmann, 2004
ECE 8813a (23)
Performance of Credit Based Schemes
• The control bandwidth can be reduced by sending block credits
• Buffers must be sized to maximize link utilization
  - Large enough to host packets in transit
$$F \ge \frac{t_{rt} \times b}{L_f}$$
where F = #flit buffers, t_rt = round trip credit time, b = link bandwidth, L_f = flit size
*From W. J. Dally & B. Towles, “Principles and Practices of Interconnection Networks,” Morgan Kaufmann, 2004
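As a worked instance of the sizing rule F ≥ t_rt × b / L_f, the sketch below computes the smallest integer number of flit buffers. The function name and the example numbers are assumptions chosen for illustration; integer units (nanoseconds and bits per nanosecond) are used to avoid floating-point rounding.

```python
def min_flit_buffers(t_rt_ns: int, link_bw_bits_per_ns: int, flit_bits: int) -> int:
    """Smallest integer F with F >= t_rt * b / L_f."""
    bits_in_flight = t_rt_ns * link_bw_bits_per_ns  # bits sent during one round trip
    return -(-bits_in_flight // flit_bits)          # integer ceiling division


# Hypothetical link: 16 bits/ns (16 Gb/s) with 64-bit flits.
print(min_flit_buffers(20, 16, 64))  # 20 ns round trip: 320 bits in flight
print(min_flit_buffers(25, 16, 64))  # 25 ns round trip: 400 bits in flight
```

With fewer buffers than this, the sender runs out of credits before the first credit returns and the link idles.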
ECE 8813a (24)
On/Off Flow Control
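• Basic Network Structure and Functions
  - Xon/Xoff flow control
[Figure: sender and receiver connected by a pipelined transfer link. The receiver queue has Xon and Xoff thresholds; the sender holds a control bit that is either Xon or Xoff.]
A packet is injected if control bit is in Xon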
ECE 8813a (25)
On-Off Flow Control
• Basic Network Structure and Functions
  - Xon/Xoff flow control
[Figure: sender and receiver connected by a pipelined transfer link. The receiver queue is not serviced (X) and fills to the Xoff threshold; the sender's control bit changes from Xon to Xoff.]
When Xoff threshold is reached, an Xoff notification is sent
When in Xoff, sender cannot inject packets
ECE 8813a (26)
On-Off Flow Control
• Basic Network Structure and Functions
  - Xon/Xoff flow control
[Figure: sender and receiver connected by a pipelined transfer link. The receiver queue drains to the Xon threshold; the sender's control bit changes from Xoff back to Xon.]
When Xon threshold is reached, an Xon notification is sent
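The Xon/Xoff behaviour on these slides can be sketched as follows. The sketch assumes notifications reach the sender instantly, so the control bit is modelled as a field on the receiver; the class name and the threshold values are hypothetical. On a real link the space above the Xoff threshold must also absorb the packets already in flight when Xoff is sent.

```python
class OnOffReceiver:
    def __init__(self, capacity: int, xoff_threshold: int, xon_threshold: int):
        assert xon_threshold < xoff_threshold <= capacity
        self.capacity = capacity
        self.xoff = xoff_threshold
        self.xon = xon_threshold
        self.occupancy = 0
        self.sender_on = True  # control bit as seen by the sender

    def accept(self) -> None:
        assert self.occupancy < self.capacity, "buffer overflow"
        self.occupancy += 1
        if self.occupancy >= self.xoff:
            self.sender_on = False  # Xoff notification

    def service(self) -> None:
        if self.occupancy > 0:
            self.occupancy -= 1
        if self.occupancy <= self.xon:
            self.sender_on = True   # Xon notification


def demo():
    rx = OnOffReceiver(capacity=8, xoff_threshold=6, xon_threshold=2)
    injected = 0
    while rx.sender_on:      # queue is not serviced, sender keeps injecting
        rx.accept()
        injected += 1
    serviced = 0
    while not rx.sender_on:  # sender is blocked until the queue drains to Xon
        rx.service()
        serviced += 1
    return injected, serviced, rx.occupancy
```

Here the sender is stopped after 6 packets and is only re-enabled once 4 of them have been serviced, which illustrates the hysteresis between the two thresholds.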
ECE 8813a (27)
Timeline*
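[Figure: timeline between Node 1 and Node 2. Flits are sent until the receiver hits the high water mark (stop) and returns "off"; after the sender processes the flow control message (process FC) it stops. When the receiver hits the low water mark (go) it returns "on" and flits resume. Go and Stop mark the buffer watermarks.]
*From W. J. Dally & B. Towles, “Principles and Practices of Interconnection Networks,” Morgan Kaufmann, 2004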
ECE 8813a (28)
Performance of On-Off Schemes
• Buffer sizing and position of Stop and Go watermarks
• To operate at full speed, buffer size must be at least 2F, where
$$F \ge \frac{t_{rt} \times b}{L_f}$$
[Figure: receiver buffer with Go and Stop watermarks.]
*From W. J. Dally & B. Towles, “Principles and Practices of Interconnection Networks,” Morgan Kaufmann, 2004
ECE 8813a (29)
Comparison of Flow Control Schemes
• Basic Network Structure and Functions
  - Comparison of Xon/Xoff vs credit-based flow control
[Figure: receiver buffer occupancy over time for the two schemes, with Stop and Go thresholds marked.
Stop & Go timeline: Stop signal returned by receiver; sender stops transmission; last packet reaches receiver buffer; packets in buffer get processed; Go signal returned to sender; sender resumes transmission; first packet reaches buffer.
Credit based timeline: sender uses last credit; last packet reaches receiver buffer; packets get processed and credits returned; # credits returned to sender; sender transmits packets; first packet reaches buffer.
The gap between the last packet and the first new packet is the flow control latency observed by the receiver buffer.]
ECE 8813a (30)
Comparing Credit-Based & On/Off Flow Control
• Both schemes can fully utilize buffers
• Restart latency is lower for credit-based schemes and therefore
  - Credit-based flow control has higher average buffer occupancy at high loads
  - Credit-based flow control leads to higher throughput at high loads
  - Smaller inter-packet gap
ECE 8813a (31)
Comparing Credit-Based & On/Off Flow Control (cont.)
• Control traffic is higher for credit schemes
  - Block credits can be used to tune link behavior
• Buffer sizes are independent of round trip latency for credit schemes (at the expense of performance)
  - Not true for On/Off without dropping packets
• Credit schemes have higher information content → useful for QoS schemes
• On-off schemes better suited for many to one relationships
ECE 8813a (32)
Optimistic Flow Control
• Optimistically send messages
  - Allocate space for returned messages
  - Deallocate on reception of Ack
  - Retransmit on reception of Nack
• Buffer sizes are proportional to the number of packets rather than the number of senders
[Figure: sending endpoint (bus, network interface with a Reject Queue) sends DATA over the network to the receiving endpoint (network interface, bus), which returns ACK/NACK.]
ECE 8813a (33)
Reliable Flow Control
• Transmit packets when available
  - De-allocate when reception is acknowledged
  - Re-transmit if packet is dropped (and negative ACK is received)
• Derived from traditional telecom networks
  - Employed over long and error prone links
  - Extended to operate over the network → end-to-end
[Figure: sending endpoint (bus, network interface) sends DATA over the network to the receiving endpoint (network interface, bus), which returns ACK/NACK.]
ECE 8813a (34)
Reliable Flow Control
[Figure: sending endpoint (bus, network interface) sends DATA over the network to the receiving endpoint (network interface, bus), which returns ACK. Packets 1 to 8 are shown at each endpoint, marked with the last ack’d packet, the last tx’d packet, the last rcv’d packet, and the retransmission interval.]
• Packets are tagged with sequence numbers
• Need to recycle sequence numbers
• Receiver acknowledges (Ack) received packets
  - Detect out of sequence reception
  - Time-outs to detect lost packets
ECE 8813a (35)
Reliable Flow Control
• Data structures to hold transmitted packets and the order in which they were transmitted
  - Utilize the send buffers
    - Go-back-N strategy
  - Maintain a separate data structure
    - Maintain original order
  - Minimize redundant transmissions
  - Block acknowledgements to minimize flow control bandwidth used
[Figure: sending endpoint (bus) sends DATA over the network to the receiving endpoint (bus), which returns ACK.]
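One way to picture the "utilize the send buffers" option is a Go-back-N sender that keeps unacknowledged packets in transmit order. The Python sketch below is a simplification under stated assumptions: sequence numbers are unbounded integers (recycling is not modelled), acknowledgements are cumulative, which is one form of block acknowledgement, and the class name is hypothetical.

```python
from collections import deque


class GoBackNSender:
    def __init__(self, window: int):
        self.window = window   # number of send buffers
        self.base = 0          # oldest unacknowledged sequence number
        self.next_seq = 0      # next sequence number to transmit
        self.buffer = deque()  # sent-but-unacked packets, in transmit order

    def can_send(self) -> bool:
        return self.next_seq - self.base < self.window

    def send(self, payload):
        assert self.can_send()
        packet = (self.next_seq, payload)
        self.buffer.append(packet)  # hold until acknowledged
        self.next_seq += 1
        return packet

    def on_ack(self, seq: int) -> None:
        """Cumulative ACK: everything up to and including seq was received."""
        while self.buffer and self.buffer[0][0] <= seq:
            self.buffer.popleft()   # de-allocate acknowledged packets
        self.base = max(self.base, seq + 1)

    def on_timeout_or_nack(self):
        """Go back N: retransmit every unacked packet in original order."""
        return list(self.buffer)
```

For example, with a window of 4, sending packets 0 to 3 fills the send buffers; an ACK for packet 1 frees two of them, and a subsequent time-out retransmits packets 2 and 3 in order.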
ECE 8813a (36)
Buffering
• Used for long and error prone links
• Also known as Ack/Nack flow control
$$F \ge \frac{t_{rt} \times b}{L_f}$$
where F = #flit buffers, t_rt = round trip time, b = link bandwidth, L_f = flit size
ECE 8813a (37)
Optimistic/Reliable Flow Control
• Optimism
  - Inefficient/increased buffer usage
    - Messages held at source
    - Re-ordering may be required due to out of order reception
• Reliable
  - Must deal with out-of-order reception
  - Need sophisticated buffer management schemes for multi-source control
• Generally give way to credit schemes or stop-and-go schemes
  - Small buffers → credit-based
  - Large buffers → stop-and-go
ECE 8813a (38)
Virtual Channel Flow Control
[Figure: two messages, A and B, sharing network channels.]
• Channels and buffers are dynamically allocated network resources
• Physical channels are idle when messages block
ECE 8813a (39)
Virtual Channels
• Each virtual channel is a unidirectional channel
  - Independently managed buffers multiplexed over the physical channel
• Each channel is independently flow controlled
• Improves performance through reduction of blocking delay
• Important in realizing deadlock freedom (later)
[Figure: a unidirectional physical channel between output buffers and input buffers, with link control and DEMUX/MUX stages on each side. Per VC state: output status, credits, vc state.]
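The benefit of independently flow-controlled virtual channels can be seen in a small Python sketch. It assumes per-VC credit counters and a round-robin arbiter over the physical channel; the arbitration policy and all names are illustrative choices, not something the slides prescribe.

```python
from collections import deque


class VirtualChannel:
    def __init__(self, credits: int, flits):
        self.credits = credits     # per-VC credit counter
        self.queue = deque(flits)  # per-VC buffer

    def ready(self) -> bool:
        return bool(self.queue) and self.credits > 0


def multiplex(vcs, cycles: int):
    """Round-robin over VCs; at most one flit uses the physical channel per cycle."""
    trace = []
    start = 0
    for _ in range(cycles):
        for k in range(len(vcs)):
            i = (start + k) % len(vcs)
            if vcs[i].ready():
                vcs[i].credits -= 1
                trace.append((i, vcs[i].queue.popleft()))
                start = i + 1      # next cycle starts after the winner
                break
        else:
            trace.append(None)     # no VC could send: the channel idles
    return trace


blocked = VirtualChannel(credits=0, flits=["A0", "A1"])  # message A is blocked
flowing = VirtualChannel(credits=3, flits=["B0", "B1"])  # message B has credits
print(multiplex([blocked, flowing], cycles=3))
```

Message B's flits cross the physical channel even though message A is blocked, so the channel only idles once B has nothing left to send.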
ECE 8813a (40)
Virtual Channel Flow Control
• As the number of virtual channels increases, the increased channel multiplexing has multiple effects (more later)
  - Overall performance
  - Router complexity and critical path
• Flits/phits must now record VC information
  - Or send VC information out of band
[Figure: packet format showing the virtual channel (VC) and type fields of a flit.]
ECE 8813a (41)
Intel Single Chip Cloud Computer (SCC)
[Figure: mesh of 24 tiles, each with a router (R), with four memory controllers attached; labels V1 and V2.]
• 24 dual core tiles
• 8 voltage and 28 frequency islands
• X-Y routed mesh: 144 bit physical channels
ECE 8813a (42)
Intel SCC Message Format
J. Howard et al., “A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling,” IEEE Journal of Solid-State Circuits, vol. 46, no. 1, January 2011.
Flit Types: Null Credit Body/tail control
ECE 8813a (43)
Flow Control: Global View
• Flow control parameters are tuned based on link length, link width and processing overhead at the end-points
• Effective FC and buffer management is necessary for high link utilization → network throughput
  - In-band vs. out-of-band flow control
• Links may be non-uniform, e.g., lengths/widths on chips
  - Buffer sizing for long links
ECE 8813a (44)
Flow Control: Global View
• Latency: overlapping FC, buffer management and switching → impacts end-to-end latency
• In-band vs. out-of-band flow control
  - Use link bandwidth vs. additional side-band signals
ECE 8813a (45)
Commercial Examples
• AMD HyperTransport – credit based
• Intel QuickPath – credit based
• Infiniband – credit based
• Ethernet – On/Off
• Myrinet – Stop-and-Go
• PCI Express – credit based
• IBM Blue Gene – token flow control
• Cray T3E – credit based
ECE 8813a (46)
Some Research Questions
• Reliable Flow Control
  - PVT effects for high speed links
  - Encoding schemes, e.g., for power efficiency
• Adaptive flow control
  - Buffer and congestion management
  - Quality of Service (QoS)
• End-to-End
  - Flow control for multicast
  - Multisource flow control (networks)
• Low power designs
  - Error rate vs. voltage scaling
  - Link and buffer widths and depths
  - On-off schemes
ECE 8813a (47)
Summary
• Flow control, buffer management and switching are closely related and generally co-designed
  - Closest to the physical layer and directly impact utilization and latency
  - Object of significant tuning
• How are these schemes impacted by and integrated with switch designs?