© 2006 Cisco Systems, Inc. All rights reserved. Cisco ConfidentialCRS-1 Technion 1
CRS-1 overview
TAU – Mar 07
Rami Zemach
Agenda
Cisco’s high end router: CRS-1
CRS-1’s Fabric
CRS-1’s Line Card
CRS-1’s NP Metro (SPP)
Future directions
What drove the CRS?
OC768
Multi chassis
Improved BW/Watt & BW/Space
New OS (IOS-XR)
Scalable control plane
A sample taxonomy
Multiple router flavours
Core
  OC-12 (622 Mbps) and up (to OC-768 ≈ 40 Gbps)
  Big, fat, fast, expensive
  E.g. Cisco HFR, Juniper T-640
  HFR: 1.2 Tbps each, interconnect up to 72 giving 92 Tbps, start at $450k
Transit/Peering-facing
  OC-3 and up, good GigE density
  ACLs, full-on BGP, uRPF, accounting
Customer-facing
  FR/ATM/…
  Feature set as above, plus fancy queues, etc.
Broadband aggregator
  High scalability: sessions, ports, reconnections
  Feature set as above
Customer-premises (CPE)
  100 Mbps
  NAT, DHCP, firewall, wireless, VoIP, …
  Low cost, low-end, perhaps just software on a PC
A sample taxonomy
Routers are pushed to the edge
Over time routers are pushed to the edge as:
  BW requirements grow
  # of interfaces scales
Different routers have different offerings
  Interface types (core is mostly Ethernet)
  Features. Sometimes the same feature is implemented differently
  User interface
  Redundancy models
  Operating system
Customers look for:
  Investment protection
  Stable network topology
  Feature parity
  Transparent scale
What does scaling mean …
Interfaces (BW, number, variance)
BW
Packet rate
Features (e.g. Support link BW in a flexible manner)
More Routes
Wider ecosystem
Effective Management (e.g. capability to support more BGP peers and more events)
Fast Control (e.g. distribute routing information)
Availability
Serviceability
Scaling is both up and down (logical routers)
Low BW, feature rich – centralized
[Figure: three line interfaces (each with a MAC) and a CPU with route table and buffer memory, all attached to a shared bus; off-chip buffer. Typically <0.5 Gb/s aggregate capacity.]
High BW – distributed
[Figure: line cards (each with MAC, local buffer memory and forwarding table) and a CPU card (CPU, memory, routing table) connected through a “crossbar” switched backplane. Typically <50 Gb/s aggregate capacity.]
Distributed architecture challenges (examples)
HW wise
  Switching fabric
  High BW switching
  QOS
  Traffic loss
  Speedup
Data plane (SW)
  High BW / packet rate
  Limited resources (CPU, memory)
Control plane (SW)
  High event rate
  Routing information distribution (e.g. forwarding tables)
CRS-1 System View
[Figure: fabric shelves and line card shelves, each with shelf controllers and a system controller (distance marked: 100 m).]
Fabric shelves: contain fabric cards, system controllers
Line card shelves: contain route processors, line cards, system controllers
NMS (full system view)
Out-of-band GE control bus to all shelf controllers
CRS-1 System Architecture
FORWARDING PLANE
  Up to 1152 x 40G
  40G throughput per LC
MULTISTAGE SWITCH FABRIC
  1296x1296 non-blocking buffered fabric
  Roots of fabric architecture from Jon Turner’s early work
DISTRIBUTED CONTROL PLANE
  Control SW distributed across multiple control processors
[Figure: line card chassis and fabric chassis. Each line card is an interface module plus a Modular Service Card (Cisco SPP, 8K queues) joined across the mid-plane; line cards and route processors connect to the three-stage fabric (S1, S2, S3), shown as planes 1 of 8 through 8 of 8.]
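The headline capacity figures quoted in this deck fit together with simple arithmetic. The sketch below is only a consistency check: the slot count per shelf is derived from the quoted 1152 line cards and 72 shelves, and the idea that the 92 Tbps total counts ingress and egress separately is an assumption, not something the slides state.

```python
# Back-of-the-envelope check of the quoted CRS-1 capacity figures.
LINE_CARDS = 1152      # "Up to 1152 x 40G"
LC_SHELVES = 72        # "interconnect up to 72"
GBPS_PER_LC = 40       # "40G throughput per LC"

slots_per_shelf = LINE_CARDS // LC_SHELVES
one_way_tbps = LINE_CARDS * GBPS_PER_LC / 1000

# Assumption: the 92 Tbps headline counts both directions of every line card.
two_way_tbps = 2 * one_way_tbps

print(slots_per_shelf, one_way_tbps, two_way_tbps)
```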
Switch Fabric challenges
Scale - many ports
Fast
Distributed arbitration
Minimum disruption with QOS model
Minimum blocking
Balancing
Redundancy
Previous solution: GSR – Cell based XBAR with centralized scheduling
Each LC has variable width links to and from the XBAR, depending on its bandwidth requirement
Central scheduling, iSLIP based
  Two request-grant-accept rounds
  Each arbitration round lasts one cell time
Per destination LC virtual output queues
Supports
  H/L priority
  Unicast/multicast
[Figure: linecard fabric interface, XBAR switching matrix and fabric scheduler, showing connections for just one linecard. The linecard sends cell availability information as requests; the scheduler returns grants (cell transmit control) and drives XBAR control. Ingress data passes through virtual output queues to the to-fabric lanes; egress data arrives on the from-fabric lanes into reassembly queues.]
1 to 16 transmit and receive lanes; # of lanes varies per linecard type based on bandwidth
One output queue per destination linecard
One reassembly queue per source linecard (and per unicast/multicast)
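The request-grant-accept arbitration over virtual output queues can be sketched in a few lines. This is a simplified model of the published iSLIP algorithm, not the GSR scheduler itself: the function name `islip_match`, the boolean `voq` matrix and the pointer lists are hypothetical, and priorities and multicast are left out.

```python
def islip_match(voq, grant_ptr, accept_ptr, iterations=2):
    """Return {input: output} for one cell time.

    voq[i][j] is True when input i holds a cell for output j.
    grant_ptr / accept_ptr are round-robin pointers, updated in place.
    """
    n = len(voq)
    match = {}     # input -> output
    taken = set()  # outputs already matched
    for it in range(iterations):
        # Request + grant: each free output grants the first requesting
        # free input at or after its grant pointer.
        grants = {}  # input -> outputs that granted it
        for j in range(n):
            if j in taken:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if i not in match and voq[i][j]:
                    grants.setdefault(i, []).append(j)
                    break
        # Accept: each input accepts the grant closest to its accept pointer.
        for i, outs in grants.items():
            j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            match[i] = j
            taken.add(j)
            if it == 0:  # pointers move only on first-iteration accepts
                grant_ptr[j] = (i + 1) % n
                accept_ptr[i] = (j + 1) % n
    return match


# Three inputs all want output 0; input 1 also wants output 1.
voq = [[True, False, False],
       [True, True, False],
       [True, False, False]]
grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
first = islip_match(voq, grant_ptr, accept_ptr)
second = islip_match(voq, grant_ptr, accept_ptr)
third = islip_match(voq, grant_ptr, accept_ptr)
```

With unchanged demand, output 0 is granted to inputs 0, 1 and 2 in successive cell times, which is the round-robin fairness the pointer update is meant to give.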
CRS Cell based Multi-Stage Benes
• Multiple paths to a destination
• For a given input to output port, the number of paths is equal to the number of center stage elements
• Distribution between S1 and S2 stages. Routing at S2 and S3
• Cell routing
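The path-count claim is easy to see by enumeration: the ingress S1 element and the egress S3 element are fixed by the ports, so the only free choice is the center-stage S2 element. The sketch below also shows one way to distribute cells across S2; round-robin is an assumption chosen for illustration, since the slides only say that distribution happens between S1 and S2. The helper names `paths` and `spray` are hypothetical.

```python
from itertools import cycle


def paths(src_s1, dst_s3, num_s2):
    """All S1 -> S2 -> S3 routes between one ingress and one egress element."""
    return [(f"S1[{src_s1}]", f"S2[{m}]", f"S3[{dst_s3}]") for m in range(num_s2)]


def spray(cells, num_s2):
    """Assign consecutive cells to center-stage elements round-robin (assumed policy)."""
    return [(cell, m) for cell, m in zip(cells, cycle(range(num_s2)))]


routes = paths(0, 5, 8)
sprayed = spray(["c0", "c1", "c2", "c3", "c4"], 3)
```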
Fabric speedup
Q-fabric tries to approximate an output buffered switch to minimize sub-port blocking
Buffering at output allows better scheduling
In single stage fabrics a 2X speedup very closely approximates an output buffered fabric *
For multi-stage fabrics the speedup factor needed to approximate output buffered behavior is not known
CRS-1 fabric’s ~5X speedup, constrained by available technology
* Balaji Prabhakar and Nick McKeown, Computer Systems Technical Report CSL-TR-97-738, November 1997.
Fabric Flow Control Overview
Discard - time constant in the 10’s of ms range
  Originates from ‘from fab’ and is directed at ‘to fab’.
  Operates at a very fine granularity: discard to the level of individual destination raw queues.
Back Pressure - time constant in the 10’s of µs range
  Originates from the Fabric and is directed at ‘to fab’.
  Operates per priority at increasingly coarse granularity:
    Fabric Destination (one of 4608)
    Fabric Group (one of 48 in phase one and 96 in phase two)
    Fabric (stop all traffic into the fabric per priority)
Reassembly Window
Cells transiting the Fabric take different paths between Sprayer and Sponge.
Cells of the same packet may therefore arrive out of order.
The Reassembly Window for a given Source is defined as the worst-case differential delay two cells from a packet encounter as they traverse the Fabric.
The Fabric limits the Reassembly Window.
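A minimal sketch of what reassembly at the Sponge has to do when cells arrive out of order. The cell fields used here (source, packet id, sequence number, last-cell flag) are hypothetical, since the slides do not describe the cell format, and a real device would also bound how long it waits using the reassembly window; that timeout is omitted.

```python
from collections import defaultdict


class Reassembler:
    """Collects cells per (source, packet) and emits the packet when complete."""

    def __init__(self):
        self.parts = defaultdict(dict)  # (src, pkt_id) -> {seq: payload}
        self.total = {}                 # (src, pkt_id) -> number of cells

    def receive(self, src, pkt_id, seq, last, payload):
        key = (src, pkt_id)
        self.parts[key][seq] = payload
        if last:
            self.total[key] = seq + 1
        # Complete only once the last cell is known and every cell is present.
        if len(self.parts[key]) == self.total.get(key):
            cells = self.parts.pop(key)
            del self.total[key]
            return b"".join(cells[s] for s in range(len(cells)))
        return None


r = Reassembler()
out1 = r.receive(3, 7, 2, True, b"C")   # last cell arrives first
out2 = r.receive(3, 7, 0, False, b"A")
out3 = r.receive(3, 7, 1, False, b"B")  # packet is now complete
```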
Linecard challenges
Power
COGS
Multiple interfaces
Intermediate buffering
Speed up
CPU subsystem
Cisco CRS-1 Line Card
[Figure: line card block diagram. The PLIM (OC192 framers and optics, interface module ASIC) and the Modular Services Card are joined across the midplane. Ingress path: RX Metro, ingress queuing. Egress packet flow from fabric: from-fabric ASIC, egress queuing, TX Metro. A CPU and Squid GW sit on the Modular Services Card. Labels 1 to 8 number the stages of the packet path.]
[Figure: line card board with callouts: Line Card CPU, Egress Metro, Ingress Metro, Ingress Queuing, Power Regulators, Fabric Serdes, From Fabric, Egress Queuing.]
Metro Subsystem
What is it ?
Massively Parallel NP
Codename Metro
Marketing name SPP (Silicon Packet Processor)
What were the Goals ?
Programmability
Scalability
Who designed & programmed it ?
Cisco internal (Israel/San Jose)
IBM and Tensilica partners
Metro Subsystem
Metro: 2500 balls, 250 MHz, 35 W
TCAM: 125 MSPS, 128k x 144-bit entries, 2 channels
FCRAM: 166 MHz DDR, 9 channels
  Lookups and table memory
QDR2 SRAM: 250 MHz DDR, 5 channels
  Policing state, classification results, queue length state
Metro Top Level
Packet Out: 96 Gb/s BW
Packet In: 96 Gb/s BW
18 mm x 18 mm, IBM 0.13 µm
18M gates
8 Mbit SRAM and RAs
Control Processor Interface: proprietary, 2 Gb/s
Gee-whiz numbers
188 32-bit embedded RISC cores
~50 BIPS
175 Gb/s memory BW
78 Mpps peak performance
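These numbers can be related to each other with a little arithmetic. The sketch assumes one instruction per cycle per core at the 250 MHz clock given on the Metro Subsystem slide, and compares the peak packet rate against a single 40G direction; both are simplifying assumptions for illustration.

```python
CORES = 188
CLOCK_HZ = 250e6       # Metro clock from the Metro Subsystem slide
PEAK_PPS = 78e6        # peak packet rate
LINE_RATE_BPS = 40e9   # one 40G line card direction

# Assumption: each core retires one instruction per cycle.
peak_ips = CORES * CLOCK_HZ
instr_per_packet = peak_ips / PEAK_PPS
bytes_per_packet_at_40g = LINE_RATE_BPS / 8 / PEAK_PPS

print(f"{peak_ips / 1e9:.0f} BIPS, "
      f"{instr_per_packet:.0f} instructions per packet, "
      f"{bytes_per_packet_at_40g:.0f} bytes per packet at 40G")
```

Under these assumptions the cores supply about 47 BIPS, which matches the ~50 BIPS on the slide, and the peak packet rate corresponds to roughly 64-byte packets at 40 Gb/s with a budget of about 600 instructions per packet.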
Why Programmability?
Simple forwarding – not so simple
Example FEATURES:
• MPLS – 3 Labels
• Link Bundling (v4)
• Load Balancing L3 (v4)
• 1 Policer Check
• Marking
• TE/FRR
• Sampled Netflow
• WRED
• ACL
• IPv4 Multicast
• IPv6 Unicast
• Per prefix accounting
• GRE/L2TPv3 Tunneling
• RPF check (loose/strict) v4
• Load Balancing L3 (v6)
• Link Bundling (v6)
• Congestion Control
• IPv4 Unicast
[Figure: IPv4 unicast lookup algorithm. Lookup (millions of routes) → leaf (L3 info) → L3 load balance entry (hundreds of load balancing entries per …) → L2 adjacency (100k+ adjacencies, L2 info, pointer to statistics counters). Policy based routing uses a TCAM table with 1:1 associative data in SRAM/DRAM. Load balancing entries and adjacencies are held in SRAM/DRAM.]
Increasing pressure to add 1-2 levels of indirection for High Availability and increased update rates
Programmability also means
  Ability to juggle feature ordering
  Support for heterogeneous mixes of feature chains
  Rapid introduction of new features (Feature Velocity)
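The chain of indirection in the lookup figure (leaf, load balance entry, adjacency, statistics counter) can be sketched as plain data structures. The class names, the CRC32 flow hash and the string rewrite field are all hypothetical choices made for illustration; the slides do not say how Metro selects a load balancing entry.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Adjacency:
    l2_rewrite: str
    packets: int = 0   # stands in for the statistics counters


@dataclass
class Leaf:
    lb_entries: list   # indices into the adjacency table


def forward(leaf, adjacencies, flow_key):
    """Pick one adjacency for a flow: leaf -> load balance entry -> adjacency."""
    bucket = zlib.crc32(flow_key) % len(leaf.lb_entries)  # assumed hash
    adj = adjacencies[leaf.lb_entries[bucket]]
    adj.packets += 1
    return adj


adjs = [Adjacency("eth0"), Adjacency("eth1")]
leaf = Leaf([0, 1])
key = b"10.0.0.1->10.0.0.2"
a1 = forward(leaf, adjs, key)
a2 = forward(leaf, adjs, key)       # same flow, same adjacency

leaf.lb_entries = [1, 1]            # reroute by touching only the leaf
a3 = forward(leaf, adjs, key)
```

The last two lines show why extra indirection helps update rates: a reroute rewrites one small table instead of every route that shares the path.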
Metro Architecture Basics
[Figure: Metro block diagram: 96G packet input and output, 188 PPEs, on-chip packet buffer, and resources attached through the resource fabric, with the packet distribution stage called out.]
Packet Distribution
Packet tails stored on-chip
~100 bytes of packet context sent to PPEs
Run-to-completion (RTC)
  Simple SW model
  Efficient heterogeneous feature processing
RTC and non-flow based packet distribution means scalable architecture
Costs
  High instruction BW supply
  Need RMW and flow ordering solutions
Metro Architecture Basics
[Figure: Metro block diagram (PPEs, on-chip packet buffer, resource fabric, resources) with the packet gather stage called out.]
Packet Gather
Gather of packets involves:
  Assembly of final packets (at 100 Gb/s)
  Packet ordering after variable length processing
  Gathering without new packet distribution
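Because run-to-completion processing takes a variable time per packet, packets finish in a different order than they arrived, and the gather stage has to restore that order. A minimal sketch, assuming each packet is tagged with a sequence number at distribution and that ordering is global rather than per flow (both are assumptions; the function name is hypothetical):

```python
import heapq


def gather_in_order(completions):
    """Yield packets in arrival order given (seq, packet) completions in any order."""
    pending = []   # min-heap of finished packets waiting for their turn
    next_seq = 0
    for seq, pkt in completions:
        heapq.heappush(pending, (seq, pkt))
        # Release every packet whose predecessors have all been released.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


# Packet 2 finishes first, but is held until packets 0 and 1 are done.
ordered = list(gather_in_order([(2, "c"), (0, "a"), (1, "b"), (3, "d")]))
```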
Metro Architecture Basics
[Figure: Metro block diagram (PPEs, on-chip packet buffer, resource fabric, resources) with the resource fabric called out.]
Packet buffer accessible as a resource
Resource fabric is parallel wide multi-drop busses
Resources consist of
  Memories
  Read-modify-write operations
  Performance heavy mechanisms
Metro Resources
Statistics: 512k
TCAM
Interface Tables
Policing: 100k+
Lookup Engine: 2M prefixes
Table DRAM (10’s of MB)
Queue Depth State
[Figure: prefixes P1 to P9 in a binary trie mapped onto a multibit tree: a root node whose child pointers lead to child arrays at levels 1, 2 and 3.]
Lookup Engine uses the Tree Bitmap algorithm
  FCRAM and on-chip memory
  High update rates
  Configurable performance vs density
Reference: Will Eatherton et al., “Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates”, CCR April 2004 (vol. 34 no. 2) pp 97-123.
Packet Processing Element (PPE)
16 PPE clusters
Each cluster has 12 PPEs
0.5 sq mm per PPE
Packet Processing Element (PPE)
[Figure: PPE block diagram. A 32-bit RISC processor core with ICACHE, data memory, Cisco DMA and memory mapped registers (distribution header, packet header, scratch pad). Each cluster has a cluster instruction memory, fed from a global instruction memory over an instruction bus, and a cluster data mux unit connecting its 12 PPEs to packet distribution, packet gather, and the to/from-resources paths.]
Tensilica Xtensa core with Cisco enhancements
32-bit, 5-stage pipeline
Code density: 16/24 bit instructions
Small instruction cache and data memory
Cisco DMA engine – allows 3 outstanding descriptor DMAs
10’s of Kbytes of fast instruction memory
Programming Model and Efficiency
Metro Programming Model
  Run to completion programming model
  Queued descriptor interface to resources
  Industry leveraged tool flow
Efficiency Data Points
  1 microcoder for 6 months: IPv4 with common features (ACL, PBR, QoS, etc.)
  CRS-1 initial shipping datapath code was done by ~3 people
Challenges
Constant power battle
Memory and IO
Die Size Allocation
PPEs Vs HW acceleration
Scalability
On-chip BW vs off-chip capacity
Procket NPU 100MPPS - limited scaling
Performance
Future directions
POP convergence
Edge and core differences blur
Smartness in the network
More integrated services into the routing platforms
Feature sets needing acceleration expanding
Must leverage feature code across platforms/markets
Scalability (# of processors, amount of memory, BW)
Summary
Router business is diverse
Network growth pushes routers to the edge
Customers expect scale on the one hand … and a smart network on the other
Routers become massively parallel processing machines
Questions ?
Thank You
CRS-1 Positioning
Core router (overall BW, interfaces types)
1.2 Tbps, OC-768c Interface
Distributed architecture
Scalability/Performance
Scalable control plane
High Availability
Logical Routers
Multi-Chassis Support
Network planes
Networks are considered to have three planes / operating timescales
  Data: packet forwarding [μs, ns]
  Control: flows/connections [ms, secs]
  Management: aggregates, networks [secs, hours]
Plane coupling is in descending order (control-data more, management-control less)
Exact Matches in Ethernet Switches: Trees and Tries
Binary Search Tree
[Figure: binary tree with < and > branches; depth log2 N for N entries.]
Lookup time dependent on table size, but independent of address length; storage is O(N)
Binary Search Trie
[Figure: binary trie with 0 and 1 branches; example keys 010 and 111.]
Lookup time bounded and independent of table size; storage is O(NW)
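The two trade-offs can be seen side by side in a small sketch. The trie below always takes W steps for a W-bit key regardless of how many entries are stored, while the sorted-array search stands in for the binary search tree and takes about log2 N comparisons. Class and function names are hypothetical, and nested dicts are used only for brevity.

```python
from bisect import bisect_left


class BinaryTrie:
    """Exact-match trie over fixed-width keys; one node per bit consumed."""

    def __init__(self, width):
        self.width = width
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for bit in range(self.width - 1, -1, -1):
            node = node.setdefault((key >> bit) & 1, {})
        node["value"] = value

    def lookup(self, key):
        node = self.root
        for bit in range(self.width - 1, -1, -1):  # at most `width` steps
            node = node.get((key >> bit) & 1)
            if node is None:
                return None
        return node.get("value")


def sorted_lookup(keys, values, key):
    """Binary search over a sorted key array: O(log N) comparisons."""
    i = bisect_left(keys, key)
    return values[i] if i < len(keys) and keys[i] == key else None


trie = BinaryTrie(width=3)
trie.insert(0b010, "port1")
trie.insert(0b111, "port2")
```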
Exact Matches in Ethernet Switches: Multiway tries
16-ary Search Trie
[Figure: 16-ary trie. Each node holds entries of the form (4-bit value, pointer), e.g. “0000, ptr” and “1111, ptr”; example keys 000011110000 and 111111111111.]
Ptr=0 means no children
Q: Why can’t we just make it a 2^48-ary trie?
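A sketch of a 16-ary trie for 48-bit MAC addresses helps frame the question. Each level consumes 4 bits, so a lookup visits 12 nodes; a single 2^48-way node would collapse that to one access but would need one slot per possible address. The list-based node layout and function names are hypothetical.

```python
STRIDE = 4                     # 16-ary: 4 bits consumed per level
KEY_BITS = 48                  # Ethernet MAC address width
LEVELS = KEY_BITS // STRIDE    # node visits per lookup
MASK = (1 << STRIDE) - 1


def new_node():
    return [None] * (1 << STRIDE)   # None plays the role of "Ptr=0"


def _index(mac, level):
    """4-bit slice of the address used at the given level (0 = top)."""
    return (mac >> (KEY_BITS - STRIDE * (level + 1))) & MASK


def insert(root, mac, port):
    node = root
    for level in range(LEVELS - 1):
        idx = _index(mac, level)
        if node[idx] is None:
            node[idx] = new_node()
        node = node[idx]
    node[_index(mac, LEVELS - 1)] = port


def lookup(root, mac):
    node = root
    for level in range(LEVELS - 1):
        node = node[_index(mac, level)]
        if node is None:
            return None
    return node[_index(mac, LEVELS - 1)]


root = new_node()
insert(root, 0x001122334455, 7)

# Slots needed by one flat 2**48-way node, versus 16 slots per trie node.
flat_slots = 1 << KEY_BITS
```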