Navaid Shamsee Network Architecture


Data Center Networking

Architecture and Design Guidelines


    Data Center Architecture Overview

    Layers of the Data Center Multi-Tier Model

Layer 2 and Layer 3 access topologies

Dual- and single-attached 1RU and blade servers

Multiple aggregation modules

Web/app/database multi-tier environments

L2 adjacency requirements

Mix of oversubscription requirements

Environmental implications

Stateful services for security and load balancing

[Diagram: multi-tier topology with DC Core, DC Aggregation, and DC Access layers; access options include blade chassis with integrated switch, blade chassis with pass-through, mainframe with OSA, L3 access, and L2 access with clustering and NIC teaming]


    Core Layer Design


    Core Layer Design

    Requirements

Is a separate DC Core layer required?

Consider: 10GigE port density

Administrative domains

Anticipate future requirements

Key core characteristics:

Distributed forwarding architecture

Low latency switching

10GE scalability

Advanced ECMP

Advanced hashing algorithms

Scalable IP multicast support

[Diagram: DC Core positioned between the Campus Core/Campus Distribution/Campus Access layers and the Aggregation/Server Farm Access layers, scaling in both directions]


    Core Layer Design

    L2/L3 Characteristics

    Layer 2/3 boundaries:

All Layer 3 links at the core; L2 is contained at/below the aggregation module

L2 extension through the core is not recommended

CEF hashing algorithm:

Default hash is on L3 IP addresses only

L3 + L4 port hashing is optional and may improve load distribution:

CORE1(config)# mls ip cef load-sharing full simple

Leverages automatic source-port randomization in the client TCP stack
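A minimal sketch of enabling the L3 + L4 hash for CEF equal-cost paths on a Catalyst 6500 core switch; the verification addresses are illustrative and exact show-command syntax varies by IOS release:

CORE1(config)# mls ip cef load-sharing full simple        ! include L4 ports, single hash pass
CORE1# show mls cef exact-route 10.20.10.50 10.10.1.10    ! which ECMP path this example flow would take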

[Diagram: CEF hash applied to packets on equal-cost routes between the Campus Core, DC Core, and Aggregation layers; L2/L3 boundary at aggregation above the web, application, and database server access]


    Core Layer Design

    Routing Protocol Design: OSPF

NSSA helps to limit LSA propagation, but permits route redistribution (RHI)

Advertise default into the NSSA, summarize routes out

OSPF default reference bandwidth is 100M; set auto-cost reference-bandwidth to a 10G value

VLANs on 10GE trunks default to an OSPF cost of a 1G link (cost 1000); adjust the bandwidth value on the inter-switch L3 VLAN to reflect 10GE

Loopback interfaces simplify troubleshooting (neighbor ID)

Use passive-interface default: open up only the links that should form adjacencies

Use authentication: more secure and avoids undesired adjacencies

Timers: spf 1/1, interface hello/dead = 1/3
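A minimal configuration sketch of these OSPF recommendations on a DC core switch acting as the NSSA ABR; the process ID, area number, VLAN, addresses, and key are illustrative assumptions, and SPF timer syntax varies by IOS release:

interface Loopback0
 ip address 10.10.1.1 255.255.255.255          ! stable router/neighbor ID
interface Vlan3                                 ! inter-switch L3 VLAN
 bandwidth 10000000                             ! reflect the 10GE trunk (kbps)
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 <key>
 ip ospf hello-interval 1
 ip ospf dead-interval 3
router ospf 10
 auto-cost reference-bandwidth 10000            ! 10G reference so 10GE links are costed correctly
 area 10 nssa default-information-originate     ! advertise default into the NSSA
 area 10 range 10.20.0.0 255.255.0.0            ! summarize DC subnets toward area 0
 passive-interface default
 no passive-interface Vlan3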

[Diagram: Area 0 spans the campus core and DC core; the aggregation and access layers sit in an NSSA; default is advertised into the NSSA and the DC subnets are summarized out; loopbacks 10.10.1.1 through 10.10.4.4 provide router IDs; an L3 VLAN carries OSPF between the aggregation switches]


    Core Layer Design

    Routing Protocol Design: EIGRP

Advertise default into the DC with an interface command on the core:

ip summary-address eigrp 20 0.0.0.0 0.0.0.0 200

The distance of 200 keeps the real default route preferred over the Null0 summary route that EIGRP installs locally

If other default routes exist (from the internet edge, for example), distribute lists may be needed to filter them out

Use passive-interface default

Summarize DC subnets toward the core with an interface command on the agg:

ip summary-address eigrp 20 10.20.0.0 255.255.0.0
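A minimal sketch of where these commands sit, assuming AS 20, a 10.20.0.0/16 DC block, and illustrative interface names:

! DC core, on the interface facing the aggregation layer
interface TenGigabitEthernet1/1
 ip summary-address eigrp 20 0.0.0.0 0.0.0.0 200      ! default toward the DC
! Aggregation switch, on the interface facing the core
interface TenGigabitEthernet1/1
 ip summary-address eigrp 20 10.20.0.0 255.255.0.0    ! summarized DC subnets toward the core
router eigrp 20
 passive-interface default
 no passive-interface TenGigabitEthernet1/1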

[Diagram: EIGRP design with the default advertised from the DC core toward the aggregation layer and the summarized DC subnets advertised up from aggregation; an L3 VLAN carries EIGRP between the aggregation switches above the web, application, and database access tiers]


Aggregation Layer Design


    Aggregation Layer Design

    Spanning Tree Design

Rapid PVST+ (802.1w) or MST (802.1s)

Choice of .1w/.1s based on the scale of logical + virtual ports required

Rapid PVST+ is recommended and is the best replacement for 802.1D

Fast converging: inherits the proprietary enhancements (UplinkFast, BackboneFast)

Access layer uplink failures: ~300 ms to 2 sec

Most flexible design options

Combined with RootGuard, BPDUGuard, LoopGuard, and UDLD, achieves the most stable STP environment

UDLD global only enables on fiber ports; it must be enabled manually on copper ports
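A minimal sketch of these STP protection features; the VLAN range and interface numbers are illustrative:

! Aggregation switch
spanning-tree mode rapid-pvst
spanning-tree loopguard default
udld enable                                   ! global UDLD acts on fiber ports only
interface TenGigabitEthernet1/1               ! downlink to an access switch
 spanning-tree guard root                     ! access switches must never become root
! Access switch, server-facing edge port
interface GigabitEthernet2/1
 spanning-tree portfast
 spanning-tree bpduguard enable
! Copper uplinks need UDLD enabled per interface
interface GigabitEthernet1/48
 udld port aggressive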

[Diagram: Agg1 holds root primary, HSRP primary, and the active context; Agg2 holds root secondary, HSRP secondary, and the standby context; RootGuard, LoopGuard, BPDU Guard, and global UDLD applied toward the access layer below the core]


    Aggregation Layer Design

    Integrated Services

    L4-L7 services integrated in Cisco Catalyst 6500

Server load balancing, firewall, and SSL services may be deployed in:

Active-standby pairs (ACE, FWSM 2.x)

Active-active pairs (ACE, FWSM 3.1)

Integrated blades optimize rack space, cabling, and management, providing flexibility and economies of scale

    Influences many aspects of overall design

[Diagram: Catalyst 6500 plus integrated service modules providing firewall, load balancing, and SSL encryption/decryption]


    Aggregation Layer Design

    Active-Standby Service Design

    Active-standby services

    Application control engine

    Firewall service module (3.x)

    SSL module

Underutilizes access layer uplinks

Underutilizes service modules and switch fabrics

Multiple service module pairs do permit an active-active design

[Diagram: below the core, Agg1 carries root primary, HSRP primary, and the active context; Agg2 carries root secondary, HSRP secondary, and the standby context]


Aggregation Layer Design

Active-Active Service Design

    Application control engine

    SLB+SSL+FW

4G/8G/16G switch fabric options

Active-standby distribution per context

Firewall service module (3.x)

Two active-standby groups permit distribution of contexts across the two FWSMs (not per context)

Permits uplink load balancing while having services applied

Increases overall service performance

[Diagram: VLAN 5 has root primary, HSRP primary, and the active context on Agg1; VLAN 6 has them on Agg2; each aggregation switch is root secondary, HSRP secondary, and standby context for the other VLAN, so both uplinks and both service module complexes are in use below the core]


    Aggregation Layer Design

    Establishing Path Preference

Use the route health injection (RHI) feature of ACE

Introduce a route map on the RHI-injected route to set the desired metric

Aligns the advertised route of the VIP with the active context on the ACE, FWSM, and SSL service modules

Avoids unnecessary use of the inter-switch link and asymmetrical flows

Establish route preference for service-enabled applications:

1. ACE probes the real servers in the VIP to determine health

2. If healthy, ACE installs a host route to the VIP on the local MSFC

3. A route map on the host route sets the preferred metric of the route:

route-map RHI permit 10
 match ip address 44
 set metric-type type-1

4. If context failover occurs, RHI and the route preference follow
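A minimal sketch of tying the RHI host route into the IGP with a preferred metric; the VIP address and OSPF process ID are illustrative, and access list 44 is the one referenced in the figure:

access-list 44 permit host 10.20.5.100        ! example VIP address
route-map RHI permit 10
 match ip address 44
 set metric-type type-1
router ospf 10
 redistribute static subnets route-map RHI    ! RHI-injected host routes appear as static routes on the MSFC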


    Core and Aggregation Layer Design

    STP, HSRP and Service Context Alignment

Align server access to the primary components in the aggregation layer:

VLAN root

Primary HSRP instance

Active service context

Provides a more predictable design

Simplifies troubleshooting

More efficient traffic flow

Decreases the chance of flows ping-ponging across the inter-switch link

[Diagram: root primary, HSRP primary, and the active context aligned on one aggregation switch; root secondary, HSRP secondary, and the standby context on the other, below the core]


    Aggregation Layer Design

    Scaling the Aggregation Layer

    Aggregation modules provide:

Spanning tree scaling

HSRP scaling

Access layer density

10GE/GEC uplinks

Application services scaling (SLB/firewall)

Fault domain sizing

Core layer provides inter-agg module transport:

Low latency distributed forwarding architecture (use DFCs)

100,000s of PPS forwarding rate

Provides the inter-agg module transport medium in the multi-tier model

[Diagram: campus core and DC core above three aggregation modules, each serving its own server farm access layer]


[Diagram: Agg1 and Agg2 carry VRF-Green, VRF-Blue, and VRF-Red, with firewall and SLB contexts for green, blue, and red; VLANs isolate the contexts on the access layer over 802.1Q trunks; primary contexts alternate between Agg1 and Agg2 to achieve an active-active design; the DC core connects to an MPLS or other core]

    Aggregation Layer Design

    Using VRFs in the DC (1)

Enables virtualization/partitioning of network resources (MSFC, ACE, FWSM)

Permits use of application services with multiple access topologies

Maps well to path isolation MAN/WAN designs such as with MPLS

Security policy management and deployment by user group/VRF


    Aggregation Layer Design

    Using VRFs in the DC (2)

Maps the virtualized DC to the MPLS MAN/WAN cloud

[Diagram: Blue, Green, and Red VRFs in Agg Module 1 and Agg Module 2 extend over 802.1Q trunks through the DC core and PE nodes into the MPLS MAN/WAN P-node core toward WAN/branch and campus sites]


Access Layer Design


    Access Layer Design

    Defining Layer 2 Access

L3 routing is first performed in the aggregation layer

L2 topologies consist of looped and loop-free models

Stateful services at the aggregation layer can be provided across the L2 access (FW, SLB, SSL, etc.)

VLANs are extended across the inter-switch link trunk for looped access

VLANs are not extended across the inter-switch link trunk for loop-free access

[Diagram: L2 access below the L3 boundary at Agg1/Agg2; Agg1 holds primary root, primary HSRP, and active services, Agg2 the secondary/standby roles; an inter-switch link connects the aggregation pair below the DC core]


    Access Layer Design

    Establish a Deterministic Model

Align the active components in the traffic path on a common agg switch:

Primary STP root:

Aggregation-1(config)# spanning-tree vlan 1-10 root primary

Primary HSRP (outbound):

standby 1 priority X

Active service modules/contexts

Path preference (inbound): use RHI and a route map

route-map preferred-path permit 10
 match ip address x.x.x.x
 set metric -30

(also see the Establishing Path Preference slide)
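A minimal sketch of the outbound alignment on the primary aggregation switch, assuming VLAN 10, HSRP group 1, and illustrative addresses, priority, and preempt delay:

Aggregation-1(config)# spanning-tree vlan 1-10 root primary
Aggregation-1(config)# interface Vlan10
Aggregation-1(config-if)# ip address 10.20.10.2 255.255.255.0
Aggregation-1(config-if)# standby 1 ip 10.20.10.1
Aggregation-1(config-if)# standby 1 priority 110
Aggregation-1(config-if)# standby 1 preempt delay minimum 180    ! let the box fully boot before taking over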


    Access Layer Design

    Balancing VLANs on Uplinks: L2 Looped Access

Distributes the traffic load across uplinks

STP blocks the uplink path for VLANs toward the secondary root switch

If active/standby service modules are used, consider inter-switch link utilization to reach the active instance:

Multiple service module pairs may be distributed across both agg switches to achieve balance

Consider the bandwidth of the inter-switch link in a failure situation

If active/active service modules are used (ACE and FWSM 3.1):

Balance contexts + HSRP across the agg switches

Consider establishing inbound path preference

[Diagram: 802.1Q trunks from the access layer to Agg1 and Agg2; Agg1 is primary root/HSRP for the blue VLANs and secondary for red, Agg2 the reverse; default gateways and an L3+L4 hash apply above the L2/L3 boundary below the DC core]


Access Layer Design

    L2 Looped Design Model


    Access Layer Design

    Looped Design Model

VLANs are extended between aggregation switches, creating the looped topology

Spanning tree is used to prevent actual loops (Rapid PVST+, MST)

A redundant path exists through a second path that is blocking

Two looped topology designs: triangle and square

VLANs may be load balanced across access layer uplinks

Inter-switch link utilization must be considered, as this may be used to reach active services

[Diagram: looped triangle and looped square access topologies; one aggregation switch is primary STP root, primary HSRP, and active services, the other secondary/standby; F/B markers show forwarding and blocking ports on the .1Q trunks, with the L2/L3 boundary and inter-switch link at the aggregation layer]


    Access Layer Design

    L2 Looped Topologies

    Looped Triangle Access

Supports VLAN extension/L2 adjacency across the access layer

Resiliency achieved with dual homing and STP

Quick convergence with 802.1w/s

Supports stateful services at the aggregation layer

Proven and widely used

Looped Square Access

Supports VLAN extension/L2 adjacency across the access layer

Resiliency achieved with dual homing and STP

Quick convergence with 802.1w/s

Supports stateful services at the aggregation layer

Active-active uplinks align well to ACE/FWSM 3.1

Achieves a higher density access layer, optimizing 10GE aggregation layer density

[Diagram: L2 looped triangle vs. L2 looped square access, each extending VLAN 10 between cabinets (Row 2, Cabinet 3 and Row 9, Cabinet 8) with VLAN 20 kept local, below the L2/L3 boundary]


    Access Layer Design

    Benefits of L2 Looped Design

Services like firewall and load balancing can easily be deployed at the aggregation layer and shared across multiple access layer switches

VLANs are primarily contained between pairs of access switches, but VLANs may be extended to different access switches to support:

NIC teaming

Clustering L2 adjacency

Administrative reasons

Geographical challenges

[Diagram: a VLAN extended between access switches in Row A, Cab 7 and Row K, Cab 3, below the L2/L3 boundary at the aggregation layer and DC core]


    Access Layer Design

    Drawbacks of Layer 2 Looped Design

Main drawback: if a loop occurs, the network may become unmanageable due to the infinite replication of frames

802.1w Rapid PVST+ combined with STP-related features and best practices improves stability and helps to prevent loop conditions:

UDLD

LoopGuard

RootGuard

BPDU Guard

Limited domain size: stay under the STP watermarks for logical and virtual ports

[Diagram: a frame destined to MAC 0000.0000.4444 replicating endlessly between Switch 1 and Switch 2 on ports 3/1 and 3/2 when no port is blocking]


Access Layer Design

L2 Loop-Free Design Model


    Access Layer Design

    Loop-Free Design

    Alternative to looped design

Two models: loop-free U and loop-free inverted U

Benefit: spanning tree is enabled but no port is blocking, so all links are forwarding

Benefit: less chance of a loop condition due to misconfigurations or other anomalies

The L2-L3 boundary varies by the loop-free model used: U or inverted U

Implications to consider with service modules, L2 adjacency, and single-attached servers

[Diagram: loop-free U and loop-free inverted-U topologies below the DC core, showing where the L2/L3 boundary sits in each model]


    Access Layer Design

    Loop-Free Topologies

    Loop-Free Inverted U Access

Supports VLAN extension

No STP blocking; all uplinks active

An access switch uplink failure black-holes single-attached servers

Supports all service module implementations

Loop-Free U Access

VLANs contained in switch pairs (no extension outside of the switch pair)

No STP blocking; all uplinks active

Autostate implications for certain service modules (CSM)

ACE supports autostate and per-context failover

[Diagram: L2 loop-free U keeps VLAN 10 and VLAN 20 local to each access pair; L2 loop-free inverted U extends VLAN 10 across access switches; L2/L3 boundary at aggregation]


    Access Layer Design

    Loop-Free U Design and Service Modules (1)

If the uplink connecting access and aggregation goes down, the VLAN interface on the MSFC goes down as well due to the way autostate works

CSM and FWSM have implications, as autostate is not conveyed (leaving a black hole)

Tracking and monitoring features may be used to allow failover of service modules based on an uplink failure, but would you want a service module failover for one access switch uplink failure?

It is not recommended to use loop-free L2 access with active-standby service module implementations (see the ACE slide next)

[Diagram: with loop-free U access, the SVI for interface VLAN 10 on the Agg1 MSFC goes down if the failed uplink is the only interface in VLAN 10; the HSRP secondary on Agg2 takes over as default gateway, but the active service modules remain on Agg1, so how is the active context reached?]


    Access Layer Design

    Loop-Free U Design and Service Modules (2)

    With ACE:

Per-context failover with autostate

If the uplink to Agg1 fails, ACE can switch over to Agg2 (under 1 sec)

Requires ACE on the access trunk side for autostate failover

May be combined with FWSM 3.1 for an active-active design

[Diagram: ACE context 1 on each aggregation switch; the HSRP secondary on Agg2 takes over as default gateway while the ACE context fails over per context; VLANs 10, 11, 20, and 21 shown at the L2/L3 boundary]


    Access Layer Design

    Drawbacks of Loop-Free Inverted-U Design

Single-attached servers are black-holed if the access switch uplink fails

Distributed EtherChannel can reduce the chance of black-holing

NIC teaming improves resiliency in this design

Inter-switch link scaling needs to be considered when using active-standby service modules


    Access Layer Design

    Comparing Looped, Loop-Free

Design comparison (+ indicates a favorable characteristic, - a drawback):

Design | Uplink VLANs on Agg Switch in Blocking or Standby State | VLAN Extension Supported Across Access Layer | Service Module Black-Holing on Uplink Failure (5) | Single-Attached Server Black-Holing on Uplink Failure | Access Switch Density per Agg Module | Must Consider Inter-Switch Link Scaling
Looped Triangle | - | + | + | + | - | +
Looped Square | + | + | + | + | + | -
Loop-Free U | + | - | - | + | + | +
Loop-Free Inverted U | + | + | + | +/- (1,2) | + | -

Notes:
1. Use of distributed EtherChannel greatly reduces the chance of a black-holing condition
2. NIC teaming can eliminate the black-holing condition
3. When service modules are used and the active service modules are aligned to Agg1
4. The ACE module permits L2 loop-free access with per-context switchover on uplink failure
5. Applies when using the CSM or FWSM in an active/standby arrangement


Access Layer Design

    L3 Design Model


    Access Layer Design

    Defining Layer 3 Access

L3 access switches connect to aggregation with a dedicated subnet

L3 routing is first performed in the access switch itself

.1Q trunks between pairs of L3 access switches support L2 adjacency requirements (limited to access switch pairs)

All uplinks are active up to the ECMP maximum of 8; no spanning tree blocking occurs

Convergence time is usually better than spanning tree (Rapid PVST+ is close)

Provides isolation/shelter for hosts affected by broadcasts

[Diagram: L3 access model with the L2/L3 boundary at the access switches, below DC aggregation and DC core]


    Access Layer Design

    Need L3 for Multicast Sources?

Multicast sources on L2 access work well when IGMP snooping is available

IGMP snooping at the access switch automatically limits the multicast flow to interfaces with registered clients in the VLAN

Use L3 access when IGMP snooping is not available or when particular L3 administrative functions are required
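A minimal sketch, assuming a Catalyst access switch and an illustrative VLAN 10: IGMP snooping is typically on by default and only needs verifying, while the L3 access alternative routes the server VLAN on the access switch itself:

Access(config)# ip igmp snooping              ! global, normally enabled by default
Access# show ip igmp snooping vlan 10         ! verify snooping on the server VLAN
! L3 access alternative for multicast sources
Access(config)# interface Vlan10
Access(config-if)# ip address 10.20.10.1 255.255.255.0
Access(config-if)# ip pim sparse-mode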

[Diagram: L3 access with multicast sources; the L2/L3 boundary moves down to the access switches, below DC aggregation and DC core]


    Access Layer Design

    Benefits of L3 Access

Minimizes broadcast domains, attaining a high level of stability

Meets server stability requirements or isolates particular application environments

Creates smaller failure domains, increasing stability

All uplinks are available paths, no blocking (up to the ECMP maximum)

Fast uplink convergence: failover and fallback, with no ARP table to rebuild on the agg switches


Access Layer Design

Drawbacks of Layer 3 Design

L2 adjacency is limited to access pairs (clustering and NIC teaming limited)

IP address space management is more difficult than with L2 access

If migrating to L3 access, IP address changes may be difficult on servers (may break apps)

Would normally require services to be deployed at each access layer pair to maintain L2 adjacency with the server and provide stateful failover


Access Layer Design

L2 or L3? What Are My Requirements?

The choice of one design versus the other has to do with:

Difficulties in managing loops

Staff skill set; time to resolution

Convergence properties

NIC teaming; adjacency

HA clustering; adjacency

Ability to extend VLANs

Specific application requirements

Broadcast domain sizing

Oversubscription requirements

Link utilization on uplinks

Service module support/placement

[Diagram: Layer 2 access (Rapid PVST+ or MST toward the aggregation layer) versus Layer 3 access (OSPF or EIGRP up to aggregation), showing where the L2/L3 boundary sits in each design]


Access Layer Design

Blade Servers


Blade Server Requirements

Connectivity Options

[Diagram: two blade server connectivity options; with pass-through modules, each server interface (Interface 1, Interface 2) cables out to external L2 switches, while with integrated Ethernet switches the blade chassis switches uplink directly to the aggregation layer]


Blade Server Requirements

Connectivity Options

A Layer 3 access tier may be used to aggregate blade server uplinks

Permits using 10GE uplinks into the agg layer

Avoid dual-tier Layer 2 access designs:

STP blocking

Oversubscription

Larger failure domain

Consider the trunk failover feature of the integrated switch

[Diagram: blade chassis integrated L2 switches uplinked either to a Layer 3 access tier with 10GE into the aggregation layer, or directly into a Layer 2 access design]


Blade Server Requirements

Trunk Failover Feature

The switch takes down server interfaces if the corresponding uplink fails, forcing NIC teaming failover

Solves NIC teaming limitations; prevents black-holing of traffic

Achieves maximum bandwidth utilization: no blocking by STP, but STP remains enabled for loop protection

Trunk failover groups can be distributed across switches

Dependent upon the NIC feature set for NIC teaming/failover
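On Cisco integrated blade switches this is typically configured as link state tracking; a minimal sketch with illustrative interface numbers (availability and syntax vary by switch model):

BladeSwitch(config)# link state track 1
BladeSwitch(config)# interface GigabitEthernet0/17        ! uplink toward the aggregation layer
BladeSwitch(config-if)# link state group 1 upstream
BladeSwitch(config-if)# interface GigabitEthernet0/1      ! internal server-facing port
BladeSwitch(config-if)# link state group 1 downstream     ! taken down if all upstream links fail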

[Diagram: blade server chassis with integrated L2 switches; server Interface 1 and Interface 2 are taken down when the corresponding switch uplink to the aggregation layer fails, forcing NIC teaming failover]


Density and Scalability Implications in the Data Center


Density and Scalability Implications

Server Farm Cabinet Layout

~30-40 1RU servers per rack, N racks per row

Considerations:

How many interfaces per server

Top-of-rack switch placement, or end-of-row/within-the-row placement

Separate switch for the OOB network

Cabling overhead vs. under floor

Patch systems

Cooling capacity

Power distribution/redundancy


Density and Scalability Implications

Cabinet Design with 1RU Switching

Minimizes the number of cables to run from each cabinet/rack

If NIC teaming is supported, two 1RU switches are required

Will two 1RU switches provide enough port density?

Cooling requirements usually do not permit a full rack of servers

Redundant switch power supplies are an option

Redundant switch CPU considerations: not an option

GEC or 10GE uplink considerations

[Diagram: single-rack and dual-rack layouts with two 1RU switches; servers connect directly to a 1RU switch and cabling remains within the cabinets]


Density and Scalability Implications

Network Topology with the 1RU Switching Model

Pro: efficient cabling
Pro: improved cooling
Con: number of devices/management
Con: spanning tree load

With ~1,000 servers / 25 cabinets = 50 access switches; 4 uplinks per cabinet = 100 uplinks total to the agg layer

GEC or 10GE uplinks?

[Diagram: 1RU access switches in Cabinet 1 through Cabinet 25 uplinked to the aggregation layer and DC core]


Density and Scalability Implications

Cabinet Design with Modular Access Switches

Cable bulk at the cabinet floor entry can be difficult to manage and can block cool air flow

Typically spaced out by placement at the ends of a row or within the row

Minimizes cabling to aggregation

Reduces the number of uplinks/aggregation ports

Redundant switch power and CPU are options

GEC or 10GE uplink considerations

NEBS considerations

[Diagram: servers connect directly to a modular switch; cables route under a raised floor or in overhead trays]


Density and Scalability Implications

Network Topology with Modular Switches in the Access

Pro: fewer devices to manage
Con: cabling challenges
Con: cooling challenges

With ~1,000 servers / 9-slot access switches = 8 switches (~8 access switches to manage)

2 uplinks per 6509 = 16 uplinks to the agg layer

GEC or 10GE uplinks?

[Diagram: modular access switches uplinked to the aggregation layer and DC core]


Density and Scalability Implications

1RU and Modular Comparison

1RU access: more uplinks to the aggregation layer, fewer I/O cables leaving each cabinet, higher STP processing load at aggregation

Modular access: fewer uplinks to the aggregation layer, more I/O cables to route, lower STP processing load


Density and Scalability Implications

Density: How Many NICs to Plan For?

Three to four NICs per server are common:

Front-end or public interface

Storage interface (GE, FC)

Backup interface

Back-end or private interface

Integrated Lights-Out (iLO) for OOB management

May require more than two 1RU switches per rack: 30 servers @ 4 ports = 120 ports required in a single cabinet (3 x 48-port 1RU switches)

May need hard limits on cabling capacity

Avoid cross-cabinet and other cabling nightmares

[Diagram: single-rack and dual-rack layouts with two switches; each server carries front-end, back-end, backup network, OOB management, and storage (HBA or GE NIC) connections; cabling remains within the cabinets]


Density and Scalability Implications

Hybrid Example with Separate OOB

1RU end-of-row approach:

Smaller failure domains

Low power consumption (kW/hr)

GEC or 10GE uplinks

Permits server distribution via patch

Eliminates the ToR port density issue

Requires a solid patch and cabling system

Separate OOB switch:

Usually 10M Ethernet (iLO)

Very low utilization

Doesn't require high performance

A separate low-end switch is OK

Isolated from the production network

Hybrid modular + 1RU:

Provides flexibility in design

Dual CPU + power for critical apps

Secondary port for NIC teaming

[Diagram: hybrid layout with modular and 1RU access switches uplinked to the aggregation layer, plus separate OOB switches per row]


Density and Scalability Implications

Oversubscription and Uplinks

What is the oversubscription ratio per uplink?

Develop an oversubscription reference model

Identify by application, tier, or other means

Considerations:

Future true server capacity (PCI-X, PCI-Express)

Server platform upgrade cycles will increase levels of outbound traffic

Uplink choices available:

Gigabit EtherChannel

10GE

10Gig EtherChannel

Flexibility in adjusting the oversubscription ratio:

Can I upgrade to 10GE easily? 10G EtherChannel?

Upgrade CPU and switch fabric? (Sup1-2-720-?)
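As an illustrative calculation (figures are examples, not from the deck): an access switch with 48 GE server ports and a 2 x 10GE uplink offers 48 Gbps of server bandwidth over 20 Gbps of uplink, an oversubscription of 48:20 = 2.4:1; the same switch on a 4-port GEC uplink would be 48:4 = 12:1.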


Density and Scalability Implications

Spanning Tree

1RU switching increases the chance of a larger spanning tree diameter

Blade server switches are logically similar to adding 1RU switches into the access layer

A higher number of trunks will increase STP logical port counts in the aggregation layer

Determine spanning tree logical and virtual interface counts before extending VLANs or adding trunks

Use aggregation modules to scale STP and 10GE density


Scaling Bandwidth and Density

    Scaling B/W with GEC and 10GE


Optimizing EtherChannel Utilization

The ideal is the graph on the top right; the bottom-left graph is more typical

Analyze the traffic flows in and out of the server farm:

IP addresses (how many?)

L4 port numbers (randomized?)

The default L3 hash may not be optimal for GEC; an L4 hash may improve distribution

10 GigE effectively gives you the full bandwidth without hash implications

agg(config)# port-channel load-balance src-dst-port
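A minimal sketch of setting and confirming the hash on an aggregation switch (this is a global command that applies to all port channels):

agg(config)# port-channel load-balance src-dst-port    ! include L4 ports in the hash
agg# show etherchannel load-balance                    ! confirm the configured method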


Using EtherChannel Min-Links

The Min-Links feature is available as of 12.2(18)SXF

Sets the minimum number of member ports that must be in the link-up state, or the port channel is declared down

Permits higher bandwidth alternate paths to be used, or lower bandwidth paths to be avoided

Locally significant configuration

Only supported on LACP EtherChannels

Router# configure terminal
Router(config)# interface port-channel 1
Router(config-if)# port-channel min-links 2
Router(config-if)# end


Migrating Access Layer Uplinks to 10GE

    How do I scale as I migrate from GEC to 10GE uplinks?

    How do I increase the 10GE port density at the agg layer?

    Is there a way to regain slots used by service modules?


    Service Layer Switch

Move certain services out of the aggregation layer

Ideal for ACE and FWSM modules

Opens slots in the agg layer for 10GE ports

May need QoS or separate links for FT paths

Extend only the necessary L2 VLANs to the service switches via .1Q trunks (GEC/TenGig)

RHI installs the route in the local MSFC only, requiring L3 peering with the aggregation layer

[Diagram: redundant Service Switch 1 and Service Switch 2 hang off the aggregation layer, below the DC core and above the access layer]


    Consolidate to ACE

Consider consolidating multiple service modules onto the ACE module:

SLB

Firewall

SSL

4/8/16G fabric connected

Active-active designs

Higher CPS and concurrent connections

Single TCP termination, lower latency

A feature gap may not permit this until a later release


Increasing HA in the Data Center


    Server High Availability

Common points of failure:

1. Server network adapter
2. Port on a multi-port server adapter
3. Network media (server access)
4. Network media (uplink)
5. Access switch port
6. Access switch module
7. Access switch

These network failure points can be addressed by deploying dual-attached servers using network adapter teaming software

[Diagram: the same server farm shown with and without the data center HA recommendations, at the L2/L3 boundary]


Common NIC Teaming Configurations

SFT (Switch Fault Tolerance): Eth0 active, Eth1 standby, with the team members typically attached to different access switches; heartbeats run between the adapters, and on failover the source MAC and IP address of Eth1 become those of Eth0

AFT (Adapter Fault Tolerance): Eth0 active, Eth1 standby, with the team members typically attached to the same switch; heartbeats run between the adapters, and on failover the source MAC and IP address of Eth1 become those of Eth0

ALB (Adaptive Load Balancing): one port receives, all ports transmit; one IP address and multiple MAC addresses; incorporates fault tolerance

[Diagram: each teaming mode shown with server IP 10.2.1.14, example MAC addresses, heartbeats between team members, and an HSRP default gateway of 10.2.1.1]

Note: NIC manufacturer drivers are changing and may operate differently. Also, server OSes have started integrating NIC teaming drivers, which may operate differently.


    Server Attachment: Multiple NICs

You can bundle multiple links (EtherChannel) to allow generating higher throughput between servers and clients

EtherChannels: all links active, load balancing

Fault-tolerant mode: only one link active

The EtherChannel hash may not permit full utilization for certain applications (backup, for example)

[Diagram: dual-attached servers below the L2/L3 boundary, one using an EtherChannel with all links active and one in fault-tolerant mode with a single active link]


    Failover: What Is the Time to Beat?

The overall failover time is the combination of convergence of the L2, L3, and L4 components

Stateful devices can replicate connection information and typically fail over within 3-5 sec

EtherChannels: < 1 sec

STP converges in ~1 sec (802.1w)

HSRP can be tuned to converge in less than its default ~3 sec (see the comparison below)


Failover Time Comparison

OSPF/EIGRP: ~1 sec (can be sub-second)
STP (802.1w): ~1 sec
ACE module with autostate: ~1 sec
HSRP: ~3 sec with 1/3 timers (may be tuned to less)
FWSM module: ~3 sec
CSM module: ~5 sec
WinXP/2003 server TCP stack tolerance: ~9 sec


    Non-Stop Forwarding/Stateful Switch-Over

NSF/SSO is a supervisor redundancy mechanism for intra-chassis supervisor failover

SSO synchronizes Layer 2 protocol state, hardware L2/L3 tables (MAC, FIB, adjacency table), and ACL and QoS tables

SSO synchronizes state for: trunks, interfaces, EtherChannels, port security, SPAN/RSPAN, STP, UDLD, VTP

NSF with EIGRP, OSPF, IS-IS, and BGP makes it possible to have no route flapping during the recovery

Aggressive RP timers may not work in an NSF/SSO environment
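A minimal sketch of enabling SSO redundancy and NSF graceful restart for the IGP on a dual-supervisor chassis; the OSPF process ID is illustrative:

Agg1(config)# redundancy
Agg1(config-red)# mode sso
Agg1(config-red)# exit
Agg1(config)# router ospf 10
Agg1(config-router)# nsf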


    NSF/SSO in the Data Center

SSO in the access layer:

Improves availability for single-attached servers

SSO in the aggregation layer:

Consider it in the primary agg layer switch

Avoids rebuild of ARP, IGP, and STP tables

Prevents a service module switchover (which can be up to ~6 sec depending on the module)

SSO switchover time is less than two seconds

Requires 12.2(18)SXD3 or higher

Possible implications:

HSRP state between the agg switches is not tracked and will show a switchover until the control plane recovers

IGP timers cannot be aggressive (tradeoff)


Best Practices: STP, HSRP, Other

Global: Rapid PVST+, UDLD global, spanning-tree pathcost method long

Rapid PVST+ scaling limits: maximum of 8,000 STP active logical ports and 1,500 virtual ports per linecard

Agg1: STP primary root, HSRP primary, HSRP preempt and delay, dual supervisors with NSF+SSO

Agg2: STP secondary root, HSRP secondary, HSRP preempt and delay, single supervisor

Access-facing links: RootGuard, LoopGuard, and PortFast + BPDU Guard on edge ports

Uplinks: LACP with L4 hash, distributed EtherChannel, Min-Links, L3+L4 CEF hash; for the blade chassis with integrated switch, LACP with L4 port hash and distributed EtherChannel for the FT and data VLANs


    Q and A
