Transcript of Scaling beyond 10G: When what you have is never enough…

Page 1: Scaling beyond 10G: When what you have is never enough…

Scaling beyond 10G: When what you have is never enough…

Mike Hughes <[email protected]>

CTO

London Internet Exchange

Page 2: Scaling beyond 10G: When what you have is never enough…

Brief History of LINX

Founded in 1994 by 5 ISPs
– Pipex (the original “Pipex”, now MCI/Uunet)
– Demon Internet
– BTnet
– UKERNA
– EUnet GB (later PSInet, now Telstra UK)
A switch (well, a 10Mb hub!) in Telehouse
– Volunteer staff

Page 3: Scaling beyond 10G: When what you have is never enough…

Architecture Development - 1996

An FDDI ring-based architecture
– Cisco and Plaintree switches
– FDDI, 100Mb TX and 10Mb connections
Full-time staff

Page 4: Scaling beyond 10G: When what you have is never enough…

Architecture Development - 1998

Gigabit Ethernet switches
– First Metro GigE deployment in EU
Multiple site IX
Multiple vendor
– Packet Engines
– Extreme
Broke the 1G mark in Nov 1999

Page 5: Scaling beyond 10G: When what you have is never enough…

Cathartic Events in 2000

There was an attempt to take LINX commercial in the wake of the boom

Orchestrated by a number of LINX directors with external backing/funding

Member reaction – “LINX is not for sale!”
– Concerns about LINX becoming open to capture

Reaffirmed the mutual, not-for-profit model

Page 6: Scaling beyond 10G: When what you have is never enough…

LINX Today

211 members from around 30 different countries
– Still strong UK contingent (about 50%)
– Most continents represented
7 co-locations in London Docklands
Dual LAN, Dual Vendor nx10G network
– Foundry and Extreme platforms
– Not interconnected
– Both platforms/networks in each location

Page 7: Scaling beyond 10G: When what you have is never enough…

Meeting the 10G Challenge

LINX was a very early adopter of 10G
– Foundry network first, in late 2001
• It just worked!
– Removed the need to buy WDM equipment
• Costly at the time
That’s been upgraded to nx10G in the backbone as traffic has grown
But networks are now attaching to LINX at 10G
– Presenting challenges for the backbone

Page 8: Scaling beyond 10G: When what you have is never enough…

10G Switches

Page 9: Scaling beyond 10G: When what you have is never enough…

Upgrade Process

We started upgrading our Foundry platform in 2004
– BigIron MG8 switches
– Not a trouble-free experience
– Now have 13 members connected via 10GE
Now upgrading the Extreme platform to an equivalent spec
– And then upgrade Foundry again!

Page 10: Scaling beyond 10G: When what you have is never enough…

We love pain!

Two networks give us lots of extra redundancy and flexibility
– Does mean we get to do things twice, though!
This year, LINX will upgrade the Extreme platform to be of an equivalent spec
– Both networks need to be roughly equal
Test as much as possible, then test it again!
– Can you be too thorough?
Agreed acceptance criteria with vendor
– Especially for the first system

Page 11: Scaling beyond 10G: When what you have is never enough…

Interesting packet size datapoint

[Chart: Packet Size Distribution at LINX – share of traffic by frame-size bin (0-64, 65-127, 128-255, 256-511, 512-1023 and 1024-1518 bytes), y-axis from 0% to 100%.]

Page 12: Scaling beyond 10G: When what you have is never enough…

Vendor Selection: What Matters?

10G port density
1G port density
Uniform, predictable packet performance
– Especially at smaller frame sizes! (see the sketch below)
Important features
– Particularly trunking/LACP
High Availability
– Hitless failover/upgrade, redundancy model
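
The small-frame point deserves numbers. Below is a minimal back-of-envelope sketch (Python; it assumes only the standard 8-byte preamble and 12-byte inter-frame gap per Ethernet frame) of the packet rate a 10GE port must sustain at line rate for the frame-size bins shown on the previous slide.

    # Maximum frame rate on a 10GE port by frame size.
    # Assumes standard Ethernet overhead of 8 bytes preamble + 12 bytes
    # inter-frame gap per frame (frame sizes below exclude that overhead).
    LINE_RATE_BPS = 10e9
    OVERHEAD_BYTES = 8 + 12  # preamble + inter-frame gap

    for frame_bytes in (64, 128, 256, 512, 1024, 1518):
        wire_bits = (frame_bytes + OVERHEAD_BYTES) * 8
        pps = LINE_RATE_BPS / wire_bits
        print(f"{frame_bytes:>5}B frames: {pps / 1e6:6.2f} Mpps at line rate")

At 64 bytes a single 10GE port is roughly 14.9 Mpps, about 18 times the rate at 1518 bytes – which is why forwarding performance has to be tested across the whole size range, not just at large frames.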

Page 13: Scaling beyond 10G: When what you have is never enough…

Challenges to come

Scaling the network for multiple 10G connections from members

Little sign of active development in the 40G/100G arena
– Meaning nx10G is the best we can expect for now

Being able to provide uniform service in multiple locations

Potential for massive traffic growth…

Page 14: Scaling beyond 10G: When what you have is never enough…

Scary Doom Curve

Page 15: Scaling beyond 10G: When what you have is never enough…

Scarier – 3 months later

Page 16: Scaling beyond 10G: When what you have is never enough…

Where’s it all coming from?

Increased access speeds
– ADSL2, WiMAX, FTTx, buzzword, buzzword…
More applications
– VoIP is a traffic red-herring – just watch the pps though! (see the sketch below)
Industry consolidation
– Fewer people needing more & faster pipes
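
To put the VoIP point in numbers, here is an illustrative sketch (Python). The codec, packetization interval and header sizes are assumptions (G.711 at 20 ms over RTP/UDP/IPv4/Ethernet), not LINX measurements.

    # Why VoIP is a bandwidth red-herring but a pps concern: one G.711
    # call direction with 20 ms packetization (illustrative assumptions).
    PAYLOAD = 160                   # bytes of audio per packet (G.711, 20 ms)
    HEADERS = 12 + 8 + 20 + 18      # RTP + UDP + IPv4 + Ethernet header/FCS
    PPS_PER_DIRECTION = 1000 // 20  # one packet every 20 ms = 50 pps

    bps = (PAYLOAD + HEADERS) * 8 * PPS_PER_DIRECTION
    print(f"per call direction: {PPS_PER_DIRECTION} pps, {bps / 1e3:.0f} kbps")

    # 100,000 concurrent call directions fit inside a single 10GE port by
    # bandwidth, yet generate millions of packets per second.
    calls = 100_000
    print(f"{calls:,} directions: {calls * bps / 1e9:.1f} Gbps, "
          f"{calls * PPS_PER_DIRECTION / 1e6:.1f} Mpps")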

Page 17: Scaling beyond 10G: When what you have is never enough…

Technologies

The sky isn’t the Ethernet limit
– nx10G seems to be, for the time being
– 40G or 100G are some way off (3 years)

• According to most vendors

CWDM prices are falling
Dark fibre is still relatively cheap
May also be new technologies or ideas on the horizon

Page 18: Scaling beyond 10G: When what you have is never enough…

What we can do today

[Diagram: the Foundry network today – a single ring giving 20G across all nodes, with one blocked link.]

Page 19: Scaling beyond 10G: When what you have is never enough…

…and tomorrow

[Diagram: Foundry network evolution 1 – two 20G rings, each with a blocked link.]

Page 20: Scaling beyond 10G: When what you have is never enough…

…and next week

[Diagram: Foundry network evolution 2 – one 40G ring and one 20G ring, each with a blocked link.]

Page 21: Scaling beyond 10G: When what you have is never enough…

…and next month

[Diagram: the same 1x40G + 1x20G ring topology with blocked links, annotated “Install bigger switches!”]

Page 22: Scaling beyond 10G: When what you have is never enough…

Bigger Box: Foundry RX16

Double the density of the MG8
Up to 64 line-rate 10G ports per chassis
– Biggest on the market today
– Keeps traffic inside a single large box

We’ve just finished lab testing

Page 23: Scaling beyond 10G: When what you have is never enough…

Shorter Term

Bigger switches and fatter interswitch trunks can meet most needs
– 10G connections have to be “concentrated”
– But about 50% of a switch could easily be consumed by backbone connectivity (see the port-budget sketch below)
• With a consequent push to a hierarchical model?
Need some protocol enhancements from vendors
– e.g. EAPSv2 and MRP phase 2, which add multiple ring support
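
A rough port-budget sketch of that 50% figure (Python; the chassis sizes and trunk widths are illustrative assumptions, not the exact LINX build):

    # How much of a chassis goes to backbone trunks as interswitch links
    # grow. All numbers are assumptions for illustration.
    def port_budget(chassis_10g_ports, neighbours, lag_width):
        backbone = neighbours * lag_width        # 10G ports used for ISLs
        members = chassis_10g_ports - backbone   # 10G ports left for members
        return backbone, members, backbone / chassis_10g_ports

    for ports, width in ((32, 4), (32, 8), (64, 8), (64, 16)):
        bb, mem, share = port_budget(ports, neighbours=2, lag_width=width)
        print(f"{ports}-port chassis, 2 x {width * 10}G trunks: "
              f"{bb} backbone ports ({share:.0%}), {mem} left for members")

With two 80G trunks on a 32-port chassis, half the box is backbone before a single member is connected – which is what pushes towards bigger chassis or a hierarchical design.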

Page 24: Scaling beyond 10G: When what you have is never enough…

Key Features at LINX

Moving to a “dual ring” topology
– MRP Phase 2 on Foundry
– EAPSv2 on Extreme
Allows different ring sizing
– 40G ring on larger sites
Increases effective ISL bandwidth
– Less “transit” flows

Low(ish) dark fibre cost – no WDM here

Page 25: Scaling beyond 10G: When what you have is never enough…

Foundry Network Plans

[Diagram: Foundry network plan – switches THN (RX16), THE (RX16), RBS (MG8), RBX (B8k), TCX (B8k), TCM (B8k) plus a second THN box (B8k), arranged as two MRP rings with shared nodes: MRP Ring 1 is a 40G ring, MRP Ring 2 a 20G ring over TCX-TCM-THN-RBS-RBX. Each ring has its own master and one blocked link; 100M aggregation and 10G/1G member connections hang off the nodes.]

Page 26: Scaling beyond 10G: When what you have is never enough…

Extreme Network Plans

[Diagram: Extreme network plan (interim) – sites THN, THE, RBS, RBX, TCX and TCM arranged as two EAPS rings with shared nodes: EAPS Ring 1 is a 40G ring, EAPS Ring 2 a 20G ring over TCX-TCM-THN-RBS-RBX. Each ring has its own master and one blocked link.]

Page 27: Scaling beyond 10G: When what you have is never enough…

Fibre Network Expansion (1)

[Diagram: fibre expansion for the Foundry network across THN, THE, RBS, RBX, TCX and TCM – 4x new 9/125 singlemode fibre pairs routed diversely from the Prestons Road route (2x immediate), 2x new 9/125 singlemode pairs from Telehouse, and 2x new 9/125 singlemode pairs routed via the Prestons Road route. The legend distinguishes Fibernet, Telehouse, Thus and new (unnamed supplier) fibre, and new vs existing unlit vs existing lit fibre.]

Page 28: Scaling beyond 10G: When what you have is never enough…

Fibre Network Expansion (2)

[Diagram: fibre expansion for the Extreme network across the same six sites – 4x new 9/125 singlemode fibre pairs routed diversely from the Prestons Road route (June 2006), 3x new 9/125 singlemode pairs from Telehouse (1x immediate, 2x by June 2006), 3x new 9/125 singlemode pairs routed via the Prestons Road route (1x immediate, 2x June 2006), and 1x new 9/125 singlemode fibre ring routed RBS-RBX-TCX-TCM-THN. Same fibre-supplier legend as the previous slide.]

Page 29: Scaling beyond 10G: When what you have is never enough…

So, what’s next?

At the last Seattle NANOG, a Force10 person came and asked:
– “What do you want, 40G or 100G?”
– The answer seemed to be 100G
We can do 40G now:
– Expensively @ OC768
– Cheaply @ 4x10GE
Therefore 40GE is a chocolate kettle
– It’s a waste of devel time (and cash)

Who’s watching the core?

Page 30: Scaling beyond 10G: When what you have is never enough…

Hey, but can’t we just…

Build fat 8x10G link-agg? (see the hashing sketch below)
Rate limit/transfer cap users?
Implement QoS?
Throttle p2p apps?
…well, yes, you could. But it either doesn’t scale, isn’t an option, or is costly and complex.
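
On the first of those: a big LAG is not simply a fatter pipe. Here is a minimal sketch (Python; the MAC addresses, loads and hash function are made up for illustration) of why per-flow or per-pair hashing limits what an 8x10G bundle can do.

    # Per src/dst-pair hashing pins each pair's traffic to one member link,
    # so an 8x10G LAG only behaves like 80G when traffic hashes evenly.
    import zlib

    MEMBERS = 8  # 8 x 10G members

    def member_for(src: str, dst: str) -> int:
        # stand-in for the switch's L2 hash over the MAC pair
        return zlib.crc32(f"{src}->{dst}".encode()) % MEMBERS

    load = [0.0] * MEMBERS
    # Many modest pairs spread fairly well across the bundle...
    for i in range(200):
        load[member_for(f"02:00:00:00:00:{i:02x}", "02:00:00:00:ff:01")] += 0.1

    # ...but one 12 Gbps pair still lands on a single 10G member.
    hot = member_for("02:00:00:00:aa:01", "02:00:00:00:bb:02")
    load[hot] += 12.0

    for m, gbps in enumerate(load):
        print(f"member {m}: {gbps:5.1f} / 10 Gbps"
              + ("  <-- congested" if gbps > 10 else ""))

However many members the bundle has, one hot pair of routers is still confined to a single 10G link.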

Page 31: Scaling beyond 10G: When what you have is never enough…

It’s easier to overprovide…

“For a number of years, we seriously explored various “quality of service” schemes, including having our engineers convene a Quality of Service Working Group. Our research came to the conclusion that it was far more cost effective to simply provide more bandwidth. With enough bandwidth in the network, there is no congestion and video bits do not need preferential treatment.”

- Gary Bachula, VP Internet2

Page 32: Scaling beyond 10G: When what you have is never enough…

…with the right technology

We already need something faster than 10GE (and 40GE?).

Some networks already building 8x10GE link agg bundles on a single span!

Common engineering sense says that your backbone has to be some multiple larger than your largest customer connection (see the sizing sketch below).
– A LINX member asked about ordering a 2x10G port last week!
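
A simple sizing sketch of that rule of thumb (Python; the multiple-of-largest-port factor and the headroom figure are illustrative assumptions, not LINX policy):

    # Backbone span sizing: some multiple of the largest single member
    # connection, and enough headroom over aggregate peak traffic.
    def backbone_needed(largest_member_gbps, aggregate_peak_gbps,
                        multiple=4, headroom=0.5):
        by_member = multiple * largest_member_gbps
        by_peak = aggregate_peak_gbps * (1 + headroom)
        return max(by_member, by_peak)

    for largest, peak in ((10, 60), (20, 90), (40, 150)):
        need = backbone_needed(largest, peak)
        links = -(-need // 10)  # ceiling division: 10GE links per span
        print(f"largest port {largest}G, peak {peak}G -> "
              f"span >= {need:.0f}G ({links:.0f} x 10GE)")

Once members connect at 2x10G and beyond, the “some multiple larger” rule runs straight past what nx10G trunks can sensibly provide.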

Page 33: Scaling beyond 10G: When what you have is never enough…

Looking Forward

Ethernet rings can have some problems
– All nodes have to be (roughly) equal
– Multiple rings solve most of this
– Still constrained by max link speed/trunk size
Is the Swedish model – unconnected switches – a better way?
– Backplane bandwidth is unrestricted/cheap
– Some redundancy/resiliency challenges

Page 34: Scaling beyond 10G: When what you have is never enough…

How the Swedes do it

Enabled by the fibre situation in Stockholm
– City-run fibre utility/monopoly
Therefore fibre is readily available
Two disconnected switches in different locations
– You get two pairs of fibre when you connect
– One to each switch, in a secure underground “cave”
Everything contained in the backplane

Page 35: Scaling beyond 10G: When what you have is never enough…

Traffic Management

MPLS
– The DIX-IE (Tokyo) is involved in a trial of an MPLS interconnect – using conventional routing (ISIS) to route the network and LDP to discover endpoints – “mplsASSOCIO”
– Downside is potentially complex config
TRILL (nothing to do with Star Trek)
– IETF working group to support “L2 routing”
– “rbridge”: ISIS for Layer 2, using MAC addresses
– Would solve “wasted” redundant bandwidth (see the sketch below)
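
The “wasted” bandwidth is easy to quantify: a loop-free layer 2 topology forwards over exactly N-1 links, and everything else is blocked. A small sketch (Python; the example topologies are hypothetical, not LINX’s):

    # Usable vs blocked links under a spanning tree: any spanning tree of
    # N connected bridges uses exactly N-1 links for forwarding.
    def usable_vs_blocked(nodes, links):
        usable = nodes - 1
        return usable, links - usable

    topologies = {
        "6-node ring (6 links)": (6, 6),
        "6-node ring + 2 cross links (8 links)": (6, 8),
        "6-node full mesh (15 links)": (6, 15),
    }
    for name, (n, nlinks) in topologies.items():
        used, blocked = usable_vs_blocked(n, nlinks)
        print(f"{name}: {used} forwarding, {blocked} blocked")

That blocked fraction is the bandwidth a link-state layer 2 scheme like TRILL/rbridge would put back to work.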

Page 36: Scaling beyond 10G: When what you have is never enough…

What’s going where?

The challenge with a flat L2 network
– Just big broadcast domain(s)

Is it easier to take bulk flows and give a dedicated channel?

How to identify these flows? (see the sketch below)
– The ISP can do it (NetFlow)
– The IXP/MAN can do it (sFlow)
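
A minimal sketch of the sFlow approach (Python): aggregate the sampled frames per source/destination MAC pair and scale by the sampling rate to estimate each pair’s traffic. The sample records and MAC addresses are invented; the 1-in-2048 rate is the one quoted on the next slide.

    from collections import defaultdict

    SAMPLING_RATE = 2048   # 1-in-2048 packets, as on the LINX 10G ports
    WINDOW_SECONDS = 300   # length of the measurement window (assumed)

    # (source MAC, destination MAC, sampled frame length in bytes)
    samples = [
        ("02:00:00:00:aa:01", "02:00:00:00:bb:02", 1500),
        ("02:00:00:00:aa:01", "02:00:00:00:bb:02", 1500),
        ("02:00:00:00:cc:03", "02:00:00:00:dd:04", 64),
        ("02:00:00:00:aa:01", "02:00:00:00:bb:02", 576),
        # ...millions of records per day in practice
    ]

    est_bytes = defaultdict(int)
    for src, dst, length in samples:
        est_bytes[(src, dst)] += length * SAMPLING_RATE  # scale the sample up

    for (src, dst), total in sorted(est_bytes.items(), key=lambda kv: -kv[1]):
        mbps = total * 8 / WINDOW_SECONDS / 1e6
        print(f"{src} -> {dst}: ~{mbps:.2f} Mbps estimated")

The biggest pairs that fall out of this are the natural candidates for a dedicated channel or a private interconnect.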

Page 37: Scaling beyond 10G: When what you have is never enough…

Sflow @ 10G

It’s sampled, but still a hell of a lot of data
– Sample rate @ 1 in 2048 packets
– Gives about 60GB per day
– Need an 850G disk to deal with 2 weeks of data
– If traffic doubles in the year, need 1.7TB (arithmetic sketched below)
We actually become constrained by disk I/O
But we’re still deploying it anyway…
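
For the record, the arithmetic behind those figures (Python; it simply reproduces the slide’s own numbers):

    # sFlow storage back-of-envelope from the slide: ~60 GB/day of samples
    # at 1-in-2048, kept for two weeks, and the effect of traffic doubling.
    GB_PER_DAY = 60
    RETENTION_DAYS = 14

    today = GB_PER_DAY * RETENTION_DAYS
    print(f"two weeks of samples today: ~{today} GB")                   # ~840 GB
    print(f"if traffic doubles this year: ~{today * 2 / 1000:.1f} TB")  # ~1.7 TB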

Page 38: Scaling beyond 10G: When what you have is never enough…

Other Scalers

Passive Private Interconnect
– Fibre cross-connects to shed the largest flows
– Cheap (for the IX), easy to implement
– Can run whatever protocol the peers choose
More exchanges
– Could LINX run a third platform?
– More smaller exchanges? What about critical mass?
“Transmission only”
– e.g. WDM platforms, stub-sites (no switch)

Page 39: Scaling beyond 10G: When what you have is never enough…

Move to “Stub” Nodes

Reduce core nodes down to a small number of switches
– Minimise interswitch connectivity
Stub nodes:
– Cheap switch for 100M aggregation
– CWDM terminal for GigE/10G transport
All traffic then hauled to the centre
– Pseudo-Swedish with “edges”

Page 40: Scaling beyond 10G: When what you have is never enough…

“Stub” overview

[Diagram: “stub” overview – three core Foundry BigIron MG8 switches; a stub site with an aggregation switch for 100M connections and a DWDM terminal, paired with a DWDM terminal at the core, hauling GigE connections back to the core network.]

Page 41: Scaling beyond 10G: When what you have is never enough…

Pros/Cons of Stubs

Pros
– Easy to set up
– Low commitment required
– Relatively cheap per stub
– May help break into new and “remote” locations
Cons
– Less redundancy/resiliency
– Finite capacity (size of mux/aggregation switch)
– Hauls all traffic to the core (even local 1G traffic)
– Doesn’t fit the ring topology of many fibre builds

Page 42: Scaling beyond 10G: When what you have is never enough…

Hierarchical Model

Core, Aggregation, Edge layers?
– An expansion of “stubs”, really
More interswitch connectivity needed
– Due to meshed topology
Simple ring topology no longer possible
– May work for “core”, with edge “mesh”
Probably more expensive
– More devices, increased management

Page 43: Scaling beyond 10G: When what you have is never enough…

Wrapping Up

Some vendors are saying that the next Ethernet standard is 5 years out. Too late!

While edge speed has increased, the core has stood still
– Don’t edge and core vendors talk to each other?
Massive parallel links and “carving off” traffic are tools for dealing with this
– But they add complexity

Seems that keeping things simple remains key

Page 44: Scaling beyond 10G: When what you have is never enough…

Where are we now?

Page 45: Scaling beyond 10G: When what you have is never enough…

Where are we now?

Page 46: Scaling beyond 10G: When what you have is never enough…

Where are we now?

Page 47: Scaling beyond 10G: When what you have is never enough…

Where are we now?

Page 48: Scaling beyond 10G: When what you have is never enough…

Questions?