SDN Controller Challenges


Transcript of SDN Controller Challenges

Page 1: SDN Controller Challenges

SDN Controller Challenges

Page 2: SDN Controller Challenges

The Story Thus Far

• SDN: centralize the network’s control plane
  – The controller is effectively the brain of the network
  – The controller determines what to do and tells switches how to do it

Page 3: SDN Controller Challenges

The Story Thus Far

Page 4: SDN Controller Challenges

The Story Thus Far

Something Happened!!!!

Page 5: SDN Controller Challenges

The Story Thus Far

Let’s Ask the Brain!!!!

Page 6: SDN Controller Challenges

The Story Thus Far

Think about what happened… Maybe come up with a solution

Page 7: SDN Controller Challenges

The Story Thus Far

• The controller runs a control function
• The control function creates switch state
  – Switch state = F(global network state)
  – The global network state can be a graph of the network

Tell the network what to do
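The relation Switch state = F(global network state) can be made concrete with a small example. This is a minimal sketch, assuming the global state is only the topology graph and the switch state is a per-switch destination → next-hop table computed by hop-count shortest paths (BFS); `control_function` and the example topology are hypothetical.

```python
from collections import deque

# Global network state (hypothetical encoding): undirected topology as adjacency lists.
Graph = dict[str, list[str]]
# Switch state: for each switch, destination -> next hop.
SwitchState = dict[str, dict[str, str]]


def control_function(graph: Graph) -> SwitchState:
    """F(global network state) -> switch state, using hop-count shortest paths."""
    tables: SwitchState = {sw: {} for sw in graph}
    for dst in graph:
        # BFS outward from dst: when `nbr` is first reached from `node`,
        # `node` is a next hop on a shortest path from `nbr` back to dst.
        visited = {dst}
        queue = deque([dst])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in visited:
                    visited.add(nbr)
                    tables[nbr][dst] = node
                    queue.append(nbr)
    return tables


topo = {"s1": ["s2", "s3"], "s2": ["s1", "s4"], "s3": ["s1", "s4"], "s4": ["s2", "s3"]}
tables = control_function(topo)
print(tables["s1"])  # {'s2': 's2', 's3': 's3', 's4': 's2'}
```

Rerunning F whenever the topology changes is what keeps the controller the single place that decides what every switch does.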

Page 8: SDN Controller Challenges

Challenges with Centralization

• Single point of failure
  – Fault tolerance
• Performance bottleneck
  – Scalability
  – Efficiency (switch-controller latency)
• Single point for security violations

Page 9: SDN Controller Challenges

Motivation for Distributed Controllers

• Wide-area network
  – Switches are widely distributed: from the USA to Australia
  – High latency between a single controller and all switches
• Application + network growth
  – Higher CPU load on the controller
  – More memory for storing FIB entries and calculations
• High availability

Page 10: SDN Controller Challenges

Class Outline

• Fault Tolerance
  – Google’s B4 paper
• Controller Scalability
  – Ways to scale the controller
  – Distributed controllers: mesh versus hierarchy
  – Implications of controller placement

Page 11: SDN Controller Challenges

Fault Tolerance

Page 12: SDN Controller Challenges

Google’s B4 Network

• Provides connectivity between DC sites
• Uses SDN to control edge switches
• Goal: high utilization of links
• Insight: fine-grained control over the edge and the network can lead to higher utilization
• Distributed controllers
  – One set of controllers for each data center (site)

Page 13: SDN Controller Challenges

Google’s B4 Network

• Provides connectivity between DC sites
• Uses SDN to control edge switches
• Goal: high utilization of links
• Distributed controllers
  – One set of controllers for each data center (site)

Page 14: SDN Controller Challenges

Fault Tolerance in B4

• Each site runs a set of controllers
• Paxos is run among the controllers in a site to determine the master

Page 15: SDN Controller Challenges

Quick Overview of Paxos

• Given N controllers
  – 1 acts as the leader, and N-1 as workers
  – All N controllers maintain the same state
• Switches interact with the leader
• A change isn’t committed until a majority of the group agrees
• On failure of the primary (leader)
  – The remaining N-1 work together to elect a new leader

[Figure: Network Events; Propagate State changes]
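The two rules on this slide (changes need group agreement; survivors elect a new leader) can be modeled in a few lines. This is a toy model under simplifying assumptions (the lowest live id becomes leader, acknowledgements are instantaneous, one shared log stands in for the identical state every replica keeps); it is not Paxos itself, which also handles competing proposers, message loss, and ballot numbers. `ControllerGroup` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerGroup:
    """Toy model of N replicated controllers; NOT a full Paxos implementation."""
    n: int
    leader: int = 0
    alive: set[int] = field(default_factory=set)
    log: list[str] = field(default_factory=list)  # stands in for the replicated state

    def __post_init__(self) -> None:
        self.alive = set(range(self.n))

    def quorum(self) -> int:
        # A majority of all N replicas, alive or not.
        return self.n // 2 + 1

    def propose(self, change: str) -> bool:
        """The leader commits a change only if a majority can acknowledge it."""
        if self.leader not in self.alive or len(self.alive) < self.quorum():
            return False
        self.log.append(change)
        return True

    def fail(self, cid: int) -> None:
        self.alive.discard(cid)
        if cid == self.leader:
            self.elect()

    def elect(self) -> None:
        # Simplified election rule (assumption): lowest live id wins, if a quorum survives.
        self.leader = min(self.alive) if len(self.alive) >= self.quorum() else -1


group = ControllerGroup(n=3)
print(group.propose("install flow A"))                # True: 3 of 3 alive
group.fail(0)                                         # leader fails; 1 and 2 elect 1
print(group.leader, group.propose("install flow B"))  # 1 True
group.fail(1)                                         # only 1 of 3 left: no quorum
print(group.propose("install flow C"))                # False
```

The last call shows the cost side of the trade-off: with too few survivors the group stops accepting changes rather than risk diverging state.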

Page 16: SDN Controller Challenges

Pros-Cons of Paxos

• Pros
  – Well understood and studied; gives good fault tolerance
  – Many implementations in the wild, e.g. ZooKeeper (which uses ZAB, a Paxos-like protocol)
• Cons
  – Time to recover
  – Impacts the throughput of the entire system

Page 17: SDN Controller Challenges

Controller Scalability

Page 18: SDN Controller Challenges

What limits a controller’s scalability?

• Number of control messages from switches
  – Depends on the application logic
    • E.g. MicroTE/Hedera periodically query all switches for stats
    • A reactive controller, evaluated in NOX, requires each switch to send messages for every new flow
  – Packet-in (if apps are reactive)
  – Flow stats, flow time-outs
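The message sources listed above add up quickly. A back-of-the-envelope sketch with made-up inputs (not measured numbers), assuming one packet-in per new flow for reactive apps and one stats request plus one reply per switch per polling interval; flow-removed (time-out) messages are ignored, and the function name is hypothetical.

```python
def control_msgs_per_sec(num_switches: int, new_flows_per_sec: float,
                         poll_interval_s: float, reactive: bool = True) -> float:
    """Rough controller message load under a hypothetical model.

    - Reactive apps: one packet-in per new flow.
    - Stats polling (e.g. a Hedera/MicroTE-style app): one request and one reply
      per switch every poll interval.
    """
    packet_ins = new_flows_per_sec if reactive else 0.0
    stats_msgs = 2 * num_switches / poll_interval_s
    return packet_ins + stats_msgs


print(control_msgs_per_sec(100, 10_000, 1.0))                  # 10200.0
print(control_msgs_per_sec(100, 10_000, 5.0, reactive=False))  # 40.0
```

In this illustrative case packet-ins dominate, which is why reactive versus proactive app logic matters so much for controller load.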

Page 19: SDN Controller Challenges

What limits a controller’s scalability?

• Application processing overhead
  – The controller runs a bunch of applications
  – Similar to a server running a set of programs
  – CPU/memory constraints limit how the apps run

Page 20: SDN Controller Challenges

What limits a controller’s scalability?

• Distance between the controller and the switches

[Figure: Controller 1 running Hedera, L3, FW]

Page 21: SDN Controller Challenges

How to Scale the Controller

• Obvious: add more controllers
• BUT: what about the applications?
  – Synchronization/concurrency problems
    • Who controls which switch?
    • Who reacts to which events?

[Figure: Controllers 1 through N, each running Hedera, L3, FW; "?" marks which controller collects stats and installs OF entries]

Page 22: SDN Controller Challenges

Medium-Sized Networks

• Assumptions:
  – A controller can’t store all forwarding table entries in memory
  – But it can process all events and run all apps
• Each controller
  – Gets the same network events and runs the same apps, so it produces the same output
  – But stores output for only a fraction of the switches and configures only that fraction

[Figure: Controllers 1 through N, each running Hedera, L3, FW; Stats + Install OF entries]
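The medium-sized scheme (same events, same apps, same output, but each controller keeps only its share) can be sketched as a deterministic ownership filter. Assumptions: switches are assigned to controllers by hashing the switch id, and the apps are reduced to a stand-in function; all names here are hypothetical.

```python
import zlib

Event = tuple[str, str]  # (switch_id, flow rule) produced by the apps; hypothetical encoding


def owner(switch_id: str, num_controllers: int) -> int:
    """Deterministic switch -> controller assignment (assumption: CRC32 hash mod N)."""
    return zlib.crc32(switch_id.encode()) % num_controllers


def compute_all_entries(events: list[Event]) -> dict[str, list[str]]:
    """Stand-in for running every app on every event: identical on all controllers."""
    entries: dict[str, list[str]] = {}
    for switch_id, rule in events:
        entries.setdefault(switch_id, []).append(rule)
    return entries


def entries_to_store(my_id: int, num_controllers: int,
                     events: list[Event]) -> dict[str, list[str]]:
    """Compute everything, but keep (and install) only entries for switches this controller owns."""
    full = compute_all_entries(events)
    return {sw: rules for sw, rules in full.items() if owner(sw, num_controllers) == my_id}


events = [("s1", "r1"), ("s2", "r2"), ("s3", "r3"), ("s1", "r4")]
parts = [entries_to_store(i, 3, events) for i in range(3)]
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, and every controller must compute the same owner for a given switch.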

Page 23: SDN Controller Challenges

Medium-Sized Networks: HyperFlow

• Each controller
  – Pushes its state to every other controller
  – Thinks it’s the only controller in the network

[Figure: Controllers 1 through N, each running Hedera, L3, FW, connected through a publish-subscribe system; Stats + Install OF entries]
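Pushing state to every controller so each one can act as if it were alone can be sketched with a publish-subscribe bus. This is a minimal in-process sketch with hypothetical class and topic names; a real deployment needs a distributed pub/sub system and must cope with delivery delays and failures.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class PubSub:
    """In-process stand-in for a publish-subscribe system (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.subscribers[topic]:
            handler(event)


class Controller:
    def __init__(self, name: str, bus: PubSub) -> None:
        self.name = name
        self.bus = bus
        self.view: list[dict] = []  # this controller's copy of network-wide state
        bus.subscribe("network-events", self.apply)

    def on_local_event(self, event: dict) -> None:
        # Events from this controller's own switches go to every controller, itself included.
        self.bus.publish("network-events", {**event, "origin": self.name})

    def apply(self, event: dict) -> None:
        # Apps read self.view as if this were the only controller in the network.
        self.view.append(event)


bus = PubSub()
c1, c2 = Controller("c1", bus), Controller("c2", bus)
c1.on_local_event({"type": "link_up", "link": ("s1", "s2")})
c2.on_local_event({"type": "packet_in", "switch": "s7"})
print(len(c1.view), len(c2.view))  # 2 2
```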

Page 24: SDN Controller Challenges

Large-Sized Networks

• Assumptions
  – Each controller can’t store all the FIB entries
  – Each controller can’t run the entire application or handle all events
• Need to partition the application
  – But how?

Page 25: SDN Controller Challenges

Application partition 1

• Approach 1: each controller runs a specific application
  – How do you resolve conflicts in FW entries?
  – Apps can conflict in the rules they install

[Figure: Controller 1 runs Hedera, Controller 2 runs L3, Controller N runs FW]
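When each controller runs a different app, the rules those apps install can clash. A minimal detection sketch under simplifying assumptions: matches are exact values or wildcards (no prefix masks), and a conflict means two apps' rules overlap at equal priority with different actions, which matters because OpenFlow leaves the choice between overlapping same-priority entries undefined. Field names, rule values, and function names are hypothetical.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass
class Rule:
    app: str      # which application installed the rule
    priority: int
    match: dict   # header field -> exact value; a missing field means wildcard
    action: str


def overlaps(a: dict, b: dict) -> bool:
    """Two matches overlap unless some field pins them to different exact values."""
    for fld in set(a) | set(b):
        va, vb = a.get(fld, WILDCARD), b.get(fld, WILDCARD)
        if va != WILDCARD and vb != WILDCARD and va != vb:
            return False
    return True


def conflicts(rules: list[Rule]) -> list[tuple[str, str]]:
    """Pairs of apps whose rules overlap at equal priority but demand different actions."""
    found = []
    for i, r1 in enumerate(rules):
        for r2 in rules[i + 1:]:
            if (r1.app != r2.app and r1.priority == r2.priority
                    and r1.action != r2.action and overlaps(r1.match, r2.match)):
                found.append((r1.app, r2.app))
    return found


rules = [
    Rule("L3", 10, {"dst_ip": "10.0.0.5"}, "output:2"),
    Rule("FW", 10, {"dst_ip": "10.0.0.5", "tcp_dst": "22"}, "drop"),
    Rule("Hedera", 10, {"dst_ip": "10.0.0.9"}, "output:3"),
]
print(conflicts(rules))  # [('L3', 'FW')]
```

Detection is only half the problem: rules at different priorities can still clash semantically (one app silently overriding another), so resolving conflicts needs a policy on top of this check.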

Page 26: SDN Controller Challenges

Application partition 2

• Approach 2: all controllers run the same application, but each for a subset of devices
  – Results in a distributed mesh control plane

[Figure: Controllers 1 through N, each running Hedera, L3, FW, exchanging an abstract network view]

Page 27: SDN Controller Challenges

Application Partition 2

• Controllers exchange abstract views with each other
  – The abstract view reduces the network information used by each controller

[Figure: the REAL NETWORK and Controller 2’s view of the network; Controller 2 (running Hedera, L3, FW) sees the abstractions provided by Controller 1 and Controller N]
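Controller 2’s view (its own switches in full, other domains reduced to abstractions) can be sketched as a graph transformation. Assumptions: each foreign domain is collapsed into one abstract node named after its controller, and links inside foreign domains are hidden; switch and domain names are hypothetical.

```python
def controller_view(me: str, domain_of: dict[str, str],
                    links: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """One controller's view: its own switches in full, each other domain as one abstract node."""
    def node(switch: str) -> str:
        # Own switches stay visible; a foreign switch is replaced by its domain's abstract node.
        return switch if domain_of[switch] == me else domain_of[switch]

    view: set[tuple[str, str]] = set()
    for a, b in links:
        na, nb = node(a), node(b)
        if na != nb:  # links inside a foreign domain disappear from the view
            view.add((min(na, nb), max(na, nb)))
    return view


domain_of = {"s1": "C1", "s2": "C1", "s3": "C2", "s4": "C2", "s5": "CN"}
links = [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s5"), ("s1", "s5")]
print(sorted(controller_view("C2", domain_of, links)))
# [('C1', 'CN'), ('C1', 's3'), ('CN', 's4'), ('s3', 's4')]
```

Controller 2 still sees every link that touches its own switches, but the internals of domains C1 and CN shrink to single nodes, which is the information reduction the slide describes.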

Page 28: SDN Controller Challenges

ONIX to the SDN Programmer

• Controllers synchronize through a DB or DHT
  – So each app needs synchronization code
  – How do you deal with concurrency?
• How do you synchronize between domains?
• How many domains? Or controllers?
• How many switches in a domain?
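If controllers share state through a DB or DHT, each app has to handle concurrent writers itself. A minimal sketch of one common pattern, optimistic concurrency (read a version, write only if it is unchanged, otherwise retry), against a hypothetical in-memory versioned store; this is not Onix’s actual NIB API, and `VersionedStore` and `add_link_load` are illustrative names.

```python
import threading


class VersionedStore:
    """Hypothetical shared store standing in for the DB/DHT; each value carries a version."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, object]] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> tuple[int, object]:
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key: str, expected_version: int, value: object) -> bool:
        """Write only if nobody else wrote since the caller read; otherwise report failure."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False
            self._data[key] = (version + 1, value)
            return True


def add_link_load(store: VersionedStore, link: str, delta: float) -> None:
    """Application-side synchronization code: read-modify-write with retry."""
    while True:
        version, load = store.get(link)
        if store.compare_and_set(link, version, (load or 0.0) + delta):
            return


store = VersionedStore()
add_link_load(store, "s1-s2", 2.5)
add_link_load(store, "s1-s2", 1.5)
print(store.get("s1-s2"))  # (2, 4.0)
```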

Page 29: SDN Controller Challenges

Application partition 3

• Approach 3: divide the application into local and global parts
  – Results in a hierarchical control plane
• Global controller and local controllers
  – Applications that do not need network-wide state
    • Can be run locally without communicating with other controllers

Page 30: SDN Controller Challenges

Are Hierarchical Controllers Feasible?

• Examples of local applications:
  – Link discovery, learning switch, local policies
• Examples of local portions of a global algorithm
  – Data center traffic engineering
    • Elephant flow detection (Hedera)
    • Predictability detection (MicroTE)
• Local apps/controllers have other benefits
  – High parallelism
  – Can be run closer to the devices
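The learning switch is local because each decision depends only on one switch’s own state. A minimal sketch, assuming a simplified event (switch id, input port, source and destination MAC) and returning action strings instead of real OpenFlow messages; the class name is hypothetical.

```python
class LearningSwitchApp:
    """Local app: per-switch MAC learning, no network-wide state needed."""

    def __init__(self) -> None:
        # (switch_id, mac) -> port; every entry concerns exactly one switch.
        self.mac_to_port: dict[tuple[str, str], int] = {}

    def on_packet_in(self, switch_id: str, in_port: int, src: str, dst: str) -> str:
        self.mac_to_port[(switch_id, src)] = in_port  # learn where src is attached
        out = self.mac_to_port.get((switch_id, dst))
        return f"output:{out}" if out is not None else "flood"


app = LearningSwitchApp()
print(app.on_packet_in("s1", 1, "aa", "bb"))  # flood (bb not learned yet)
print(app.on_packet_in("s1", 2, "bb", "aa"))  # output:1
```

Because every key includes the switch id, the table can be split by switch across local controllers with no coordination between them.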

Page 31: SDN Controller Challenges

Kandoo: Hierarchical controllers

[Figure: local Controllers 1 through N, each running Hedera, L3, FW, below a Global Controller running Hedera]

• 2 levels of controllers: global and local
  – Local applications are embarrassingly parallel
  – Local controllers shield the global controller from network events

Page 32: SDN Controller Challenges

Kandoo: Hierarchical controllers

[Figure: local Controllers 1 through N, each running Hedera, L3, FW, below a Global Controller running Hedera]

• Local controllers: run local apps
  – Return an abstract view to the global controller
  – Reduce the number of events sent to the global controller and the size of the network it sees

Page 33: SDN Controller Challenges

Kandoo: Hierarchical controllers

[Figure: local Controllers 1 through N, each running Hedera, L3, FW, below a Global Controller running Hedera]

• Global controller
  – Runs global apps, i.e., apps that need network-wide state

Page 34: SDN Controller Challenges

Hedera Reminder

• Goal: reduce network contention
• Insight: contention happens when elephant flows share paths
• Solution:
  – Detect elephant flows
  – Place elephant flows on different paths

Page 35: SDN Controller Challenges

Implementing Hedera in Onix

[Figure: Controller 1 and Controller 2, each running Hedera (detection + placement); each collects stats and installs flow table entries for its own switches, and the controllers exchange the traffic matrix (TM) and detection results]

Page 36: SDN Controller Challenges

Implementing Hedera in Kandoo

[Figure: local Controllers 1 through N each run elephant detection on stats from their switches and inform the Global Controller (Hedera: global placement) of elephant flows; the global controller installs new flow table entries]

• Local controllers: get stats from the network + elephant detection
• Global controller: decides flow placement + flow installation
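The division of labor on this slide can be sketched as two functions. Assumptions: an elephant is a flow above 10% of a 1 Gb/s NIC (a hypothetical threshold), and placement is a simple greedy least-loaded-path choice, not Hedera’s actual placement algorithms; path names, flow ids, and rates are made up.

```python
from dataclasses import dataclass

NIC_CAPACITY_BPS = 1e9    # assumption: 1 Gb/s host NICs
ELEPHANT_FRACTION = 0.1   # assumption: elephant = flow above 10% of NIC capacity


@dataclass
class FlowStat:
    flow_id: str
    rate_bps: float


def local_detect(stats: list[FlowStat]) -> list[FlowStat]:
    """Local controller: scan its own switches' stats, report only elephants upward."""
    threshold = ELEPHANT_FRACTION * NIC_CAPACITY_BPS
    return [s for s in stats if s.rate_bps > threshold]


def global_place(elephants: list[FlowStat], path_load: dict[str, float]) -> dict[str, str]:
    """Global controller: greedily put each elephant (largest first) on the least-loaded path."""
    placement: dict[str, str] = {}
    for flow in sorted(elephants, key=lambda f: f.rate_bps, reverse=True):
        path = min(path_load, key=path_load.get)
        path_load[path] += flow.rate_bps
        placement[flow.flow_id] = path  # would become new flow table entries
    return placement


site1 = [FlowStat("f1", 5e8), FlowStat("f2", 1e6)]
site2 = [FlowStat("f3", 3e8), FlowStat("f4", 2e8)]
elephants = local_detect(site1) + local_detect(site2)  # f2 never reaches the global controller
print(global_place(elephants, {"core1": 0.0, "core2": 0.0}))
# {'f1': 'core1', 'f3': 'core2', 'f4': 'core2'}
```

Only the elephant reports cross the local/global boundary, which is how the local tier shields the global controller from the bulk of stats traffic.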

Page 37: SDN Controller Challenges

Implementing B4 in a Kandoo-like Architecture

[Figure: Site Controllers 1 through N collect stats, install OF entries, and inform the Global Controller (TE + BW allocator, backed by a TE DB) of flow demands; the global controller installs TE ops]

• Local (site) controllers: get stats from the network + determine demand
• Global controller: calculates paths for traffic
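Once site controllers report flow demands, the global TE + BW allocator must split scarce capacity among them. B4’s allocator is described as max-min fair over per-site bandwidth functions across many tunnels; the sketch below keeps only the core water-filling idea for a single link, with hypothetical site names and demand values.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Water-filling on one link: repeatedly split what is left equally among unsatisfied demands."""
    alloc = {site: 0.0 for site in demands}
    remaining = dict(demands)  # demands not yet fully satisfied
    cap = capacity
    while remaining and cap > 0:
        share = cap / len(remaining)
        fits = {s: d for s, d in remaining.items() if d <= share}
        if not fits:
            # Nobody left can be fully satisfied: each gets an equal share and we stop.
            for s in remaining:
                alloc[s] = share
            break
        for s, d in fits.items():  # satisfy small demands fully, free up the rest
            alloc[s] = d
            cap -= d
            del remaining[s]
    return alloc


print(max_min_fair(10.0, {"site_a": 2.0, "site_b": 4.0, "site_c": 10.0}))
# {'site_a': 2.0, 'site_b': 4.0, 'site_c': 4.0}
```

Small demands are met in full and the leftover capacity is shared equally among the larger ones, so no site can gain bandwidth without taking it from a site that has less.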

Page 38: SDN Controller Challenges

Kandoo to the SDN Programmer

• Think about what is local and what is global
  – When apps are written, annotate them with a local flag
• Kandoo automatically places local apps on local controllers
  – And places global apps on the global controller
• Kandoo restricts messages between global and local controllers
  – You can’t send OpenFlow-style messages
  – You must send Kandoo-style messages
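The annotate-then-place workflow can be sketched with a decorator. This is a minimal sketch with a hypothetical registry and placement function, not Kandoo’s actual API: local apps are replicated on every local controller, and unannotated apps default to the global controller.

```python
from typing import Callable

# Hypothetical registry: app name -> (is_local, handler). Not Kandoo's actual API.
APPS: dict[str, tuple[bool, Callable]] = {}


def app(name: str, local: bool = False) -> Callable[[Callable], Callable]:
    """Decorator that registers an app and records the programmer's local/global annotation."""
    def register(fn: Callable) -> Callable:
        APPS[name] = (local, fn)
        return fn
    return register


@app("learning_switch", local=True)
def learning_switch(event: dict) -> None: ...


@app("elephant_detector", local=True)
def elephant_detector(event: dict) -> None: ...


@app("hedera_placement")  # needs network-wide state, so it stays global
def hedera_placement(event: dict) -> None: ...


def place(num_local_controllers: int) -> dict[str, list[str]]:
    """Replicate local apps on every local controller; put global apps on the global controller."""
    plan: dict[str, list[str]] = {f"local{i}": [] for i in range(num_local_controllers)}
    plan["global"] = []
    for name, (is_local, _handler) in APPS.items():
        targets = [c for c in plan if c != "global"] if is_local else ["global"]
        for target in targets:
            plan[target].append(name)
    return plan


print(place(2))
```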

Page 39: SDN Controller Challenges

Summary

• Centralization provides simplicity at the cost of reliability and scalability
• Replication can improve reliability and scalability
  – For reliability, Paxos is an option
  – For scalability, divide and conquer
    • Partition the applications (Kandoo: local apps and global apps)
    • Partition the network (Onix: each controller controls a subset of switches, i.e., a domain)