A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as...

31
A New So(ware Architecture for Core Internet Routers Robert Broberg September 16, 2011

Transcript of A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as...

Page 1: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

A New So(ware Architecture for Core Internet Routers 

Robert Broberg 

September 16, 2011 

Page 2: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Disclaimers and Credits 

•  This is research and no product plans are implied by any of this work. 

•  r3.cis.upenn.edu •  Early and conInued support from www.vu.nl 

•  A large team has generated this work and I am just one of many spokespersons for them. – any mistakes in this talk are mine. 

Page 3: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Agenda  

Overview of the evoluIon of Core router design 

A sampling of SW problems encountered during evoluIon 

An approach to resolving SW problems and conInued evoluIon 

Page 4: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Core Router EvoluIon 

•  WAN interconnects of Mainframes over telecommunicaIon infrastructure 

•  LAN/WAN interconnects – CORE routers(1+1 architectures) – Leased telco lines for customers 

– Dialup aggregaIon   

•  As CORE routers evolved the old migrated to support edge connects 

•  Telco becomes a client of the IP network 

Page 5: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Moore’s law x2/18m

DRAM access rate x1.1/18m

Silicon speed x1.5/18m

Router Capacity x2.9/18m

The demand for increased network system performance/scale is 

relentless...  

0

200

400

600

800

1000

1200

2004 2006 2008 2010 2012 2014

Internet traffic “2x/year” 

10 

100 

1000 

10000 

Growth driven by increased user demand 

Page 6: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

System Scaling Problems 

Page 7: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally
Page 8: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Some of the reasons SW problems were encountered 

•  Routers started as Ightly coupled embedded systems –  speeds and feeds were the game with features 

•  CPUs + NPUs + very aware programmers led the game 

•  EvoluIon was very fast –  Business customers 

•  leased lines and frame relay –  Mid 1990s 64kbit dialup starts    –  Core bandwidth doubling every year 

•  As IP customer populaIons grew feature demands increased 

•  Model of SW delivery not conducive to resilience of rapid feature deployment 

Page 9: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Intent /Goals 

–   build an applicaIon unaware fault tolerant distributed system for routers 

–  always on(200msec failover of apps) 

–  allow for inserIon of new features with no impact to exisIng operaIons 

–   support +/‐ 1 versioning of key applicaIons with zero packet loss 

–  versioning to allow for live feature tesIng 

Page 10: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Fault Tolerant RouIng 

Page 11: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

MoIvaIons 

•  We must be able to do be]er than 1+1 –  Low confidence in 1+1 as only tested when actually upgrading/downgrading/crashing 

•  Want 100% confidence in new code – Despite lab Ime, rollout o(en uncovers showstoppers –  Rollback can be very disrupIve 

•  Aiming for sub‐200ms ‘outages’ – Want to be able to recover before VOIP calls noIce 

Page 12: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Core Routers are built as Clusters but act as a single virtual machine 

•  MulIple line cards with potenIally various types of interfaces use NPUs  to route/switch amongst themselves via a data‐plane ( switch fabric ) 

•  A separate control plane controls all NPUs programming switching tables and managing interface state along, rouIng protocols along with environmental condiIons –  Control plane CPUs are typically generic and ride the commodity curve 

•  The Systems are heterogeneous and large –  Current Cisco CRS3 deployments switch 128tb, have ~150 x86 CPUs for the 

control plane along with ~1terabyte of memory and scale higher   

Page 13: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

VirtualizaIon/VoIng/BGP 

•  BGP state is Ied to TCP connecIon state –  loopback interfaces 

•  Process Placement 

•  Versioning •  Leader elecIon •  HW virtualizaIon – e.g. NPU virtualizaIon??? 

Page 14: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Approach taken •  AbstracIon layers chosen to isolate applicaIons 

–  applicaIons ( e.g. protocols) isolated with wrappers   •  applicaIon transparent check poinIng!!!! 

•  FTSS used to store state •  SHIM used as wrapper   

–  model to allow for voIng •  OpImize, opImize, opImize  

–  experiment and prototype •  ORCM used for process placement •  Protocols isolated by a shim layer 

–  mulIple versions called siblings  •  2 levels of operaIon chosen 

–  no use seen for hypervisor –  user mode for apps; kernel;  abstracIon layer via SHIM + FTSS 

Page 15: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Protocol VirtualizaIon 

•  ExisIng protocol code largely untouched •  Can run N siblings 

–  Can be different versions –  the protocol being virtualized –  Allows full tesIng of new code – with seamless switchover and switch back 

•  Currently we run one virtualizaIon wrapper –  Protected by storing state into FTSS –  Can be restarted thus upgradeable –  Designed to know as li]le about protocol as possible 

•  Treats most of it as a ‘bag of bits’ 

•  ‘Run anywhere’ – no RP/LC assumpIons – We don’t care what you call the compute resources 

Page 16: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

CRS uIlisaIon 

RP RP 

LC LC  LC LC 

• The CRS contains many CPUs which we treat as compute nodes in a cluster • If a node fails the others take up its workload • No data is lost on a failure, and the so(ware adapts to re‐establish redundancy 

Page 17: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

CRS uIlisaIon 

RP RP 

LC LC  LC LC 

• External resources can be added to the system to add redundancy or compute power 

Blade server 

Page 18: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Placement of Components 

•  Each compute node runs FTSS and ORCM – both are started by ‘qn’ (system process monitor) 

•  FTSS stores rouIng data redundantly across all the systems in the router 

•  ORCM manages rouIng processes and distributes them around the router – constraints can be applied via configuraIon 

•  FTSS can run on other nodes to make use of memory if desired. 

RP 

LC 

FTSS 

ORCM 

FTSS 

ORCM 

Page 19: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

BGP VirtualisaIon 

FTSS 

ORCM 

Distributed dataplane 

RIB 

BGP VirtualisaIon service (shim) 

Reliable TCP endpoint 

BGP 

BGP BGP 

BGP BGP new BGP 

Page 20: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

VirtualisaIon Layer recovery 

FTSS 

ORCM 

Distributed dataplane 

RIB 

BGP VirtualisaIon service (shim) 

Reliable TCP endpoint 

BGP 

BGP 

BGP 

New shim 

Page 21: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

IS‐IS VirtualisaIon 

ORCM 

Distributed dataplane 

RIB 

IS‐IS VirtualisaIon service (shim) 

IS‐IS L2 receiver 

IS‐IS 

IS‐IS 

IS‐IS 

Page 22: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Fault Tolerant State Storage 

•  Distributed Hash Table with intelligent placement of data 

•  You can decide how much replicaIon – 2,3,4,N copies. 

•  More copies ‐ more memory & slower write Imes. 

•  Fewer copies – less simultaneous failures  

•  Virtual Nodes – able to balance memory usage to space on compute node 

Page 23: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

FTSS distributed storage 

FTSS RP0 

FTSS RP1 

FTSS LC0 FTSS LC1 

FTSS LC2 

Some data – stored redundantly in 2 places 

Page 24: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

FTSS: losing a node 

FTSS 

FTSS 

FTSS FTSS 

FTSS 

Data missing is replicated from predecessor 

Page 25: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

Key 

•  Binary data •  Unique in DHT 

Value 

•  Binary data 

Link 

•  Unique set of binary data items 

•  OpImizaIons for use as a list of keys 

DHT tuples 

DHT provides opImised rouInes for: • fast parallel store and deleIon of mulIple tuples • fast update of mulIple links within a tuple • OperaIons directly using the link list for storing related data • fast parallel recovery of mulIple, possibly inter‐linked, KVL tuples 

Copies of the tuples are stored on mulIple nodes for redundancy 

Page 26: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

DHT use in BGP processing 

Receive incoming BGP messages 

Acknowledge TCP 

Create minimal 

message set 

Hand to BGP siblings; routes 

produced 

Pass routes from lead 

sibling to RIB 

Unprocessed messages 

RIB prefixes A]ributes + 

NLRI 

Early redundant store to permit fast 

acknowledgement of incoming BGP messages 

Minimal set of incoming BGP 

data 

Data store for re‐syncing with RIBs on restart 

BGP Shim operaIons 

DHT 

Page 27: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

BGP data in DHT (I) 

126 

127 

128 

.. 

10.0.0.4 

192.168.22.5 

4.1.0.77 

.. 

ASPATH 1 + a]rs 

ASPATH 2 + a]rs 

ASPATH 3 + a]rs 

.. 

NLRI + peer 1 

NLRI + peer 2 

NLRI + peer 3 

.. 

Unprocessed incoming messages 

Source peers Announcements from peers, minimal set 

Data within links 

Tuples  Tuples 

Page 28: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

BGP data in DHT (II) 

.. 

10.0.0.4 

19.1.22.5 

4.1.0.77 

.. 

Siblings  RIB prefixes 

Links  Tuples 

Page 29: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

DHT use in IS‐IS processing 

Receive incoming IS‐IS 

frames 

Create minimal 

message set 

Hand to IS‐IS siblings; routes 

produced 

Pass routes from lead 

sibling to RIB 

RIB prefixes LSPs 

Minimal set of incoming IS‐IS 

frames 

Data store for resyncing with RIBs on restart 

IS‐IS Shim operaIons 

DHT 

Page 30: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally

MulIpath IGP/EGP demo 

Page 31: A New Soware Architecture for Core Internet Routers · 2015. 8. 23. · Core Routers are built as Clusters but act as a single virtual machine • Mulple line cards with potenally