Monte carlo and network cmg'14

Sources of Traffic Demand Variability and Use of Monte Carlo for Network Capacity Planning. Performance and Capacity 2014 by CMG, November 05, 2014. Alex Gilgur & Brian Eck. Views and opinions expressed in this presentation are views and opinions of its authors. If found to be in contradiction with views and policies of Google, Inc., the latter take precedence. Select images are reproduced with permission from Google, Inc.

Transcript of Monte carlo and network cmg'14

Page 1: Monte carlo and network cmg'14

Sources of Traffic Demand Variability and Use of Monte Carlo for Network Capacity Planning

Performance and Capacity 2014 by CMG

November 05, 2014

Alex Gilgur & Brian Eck

Views and opinions expressed in this presentation are views and opinions of its authors.

If found to be in contradiction with views and policies of Google, Inc., the latter take precedence.

Select images are reproduced with permission from Google, Inc.

Page 2: Monte carlo and network cmg'14

Moore’s Law in Reverse: Drinking from a firehose?

http://www.kpcb.com/internet-trends


Page 3: Monte carlo and network cmg'14

...

“Matter and energy had ended and with it, space and time...

“All collected data had come to a final end. Nothing was left to be collected.

“But all collected data had yet to be completely correlated and put together in all possible relationships.

“A timeless interval was spent in doing that.

“And it came to pass that AC learned how to reverse the direction of entropy.

“But there was now no man to whom AC might give the answer of the last question.”

Isaac Asimov. “The Last Question”. 1956

What does it cost to own a network?

“... ‘THERE IS AS YET INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.’”

Page 4: Monte carlo and network cmg'14

What does it cost to own a network?

We don’t have the time for all this!

Guesstimate!

Page 5: Monte carlo and network cmg'14

What does it cost to own a network?

Ahah! But how sure are you?

It depends on:

● number of servers

● topology

● policies

● traffic patterns

● network protocols

Page 6: Monte carlo and network cmg'14

What does a network cost?

What is the confidence interval of your “guesstimate” of Total Cost of Ownership of a network?

The Fishbone Diagram

[Fishbone diagram: Network Cost, with branches Demand, Topology, Policies, Construction, Node & Link Reliability, and Hardware & Software]

Page 7: Monte carlo and network cmg'14

Sizing the Network

[Fishbone diagram: branches Demand, Topology, Policies, Construction, Node & Link Reliability, and Hardware & Software; labels Network SIZE and Network Cost]

Network size is where we bring value

Page 8: Monte carlo and network cmg'14

[Fishbone diagram: Network SIZE, with branches Topology, Demand, and Node & Link Reliability]

Demand Fishbone

Page 9: Monte carlo and network cmg'14

Demand Fishbone

[Fishbone diagram: Demand, with branches Usage, QoS, Topology, Destination, Source, Guarantees, Latency, and Flow]

Page 10: Monte carlo and network cmg'14

Demand Variability

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution:

Bursty

Wide Amplitude

Complex Patterns

Congestion Control

Page 11: Monte carlo and network cmg'14

Demand Forecastability: Noise & Gaps

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution:

o “from feast to famine”

o Bursts

o Congestion Control

Page 12: Monte carlo and network cmg'14

Demand Forecastability: Non-Stationarity

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution:

Bursty

Wide Amplitude

Complex Patterns

Congestion Control

Page 13: Monte carlo and network cmg'14

Demand Variability: Non-stationarity

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution:

Bursty

Wide Amplitude

Complex Patterns

Congestion Control

Page 14: Monte carlo and network cmg'14

Demand Variability: QoS Variation

[Plot series: SC1, SC2]

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution:

Bursty

Wide Amplitude

Complex Patterns

Congestion Control

Page 15: Monte carlo and network cmg'14

Demand Variability: Other Factors

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution:

Bursty

Wide Amplitude

Complex Patterns

Congestion Control

Page 16: Monte carlo and network cmg'14

Demand Variability: Signal Distribution

● Noise & Gaps in data

● Non-stationarity & Outliers

● Variation by O & D Nodes

o Node A

o Node Z

● Variation by QoS

o latency

o Pr{delivery}

● Variation within QoS

o other factors

● Distribution

Bursty

Wide Amplitude

Complex Patterns

Congestion Control

Page 17: Monte carlo and network cmg'14

Demand Predictability

● Not all forecasting tools were created equal:

○ Non-Gaussian distributions

○ Non-stationarity

○ Congestion Control

“All models are wrong. Some models are useful” - G.E.P. Box

● TSA is not the only way to forecast Demand:

○ Explanatory variables:

■ Timestamp is one of them

■ Power

■ CPU

■ Business Metrics

Forecast
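As a minimal sketch of forecasting from an explanatory variable rather than from the timestamp alone, the snippet below fits an ordinary least-squares line of demand against a single business metric. The metric (active users), the sample values, and the straight-line model are all illustrative assumptions and are not taken from the presentation.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical history: active users (millions) vs. peak demand (Gbps).
active_users = [1.0, 2.0, 3.0, 4.0, 5.0]
demand_gbps = [12.0, 22.0, 32.0, 42.0, 52.0]

intercept, slope = fit_line(active_users, demand_gbps)
forecast = intercept + slope * 6.0   # demand if the metric reaches 6.0
print(f"demand ~ {intercept:.1f} + {slope:.1f} * users; forecast = {forecast:.1f} Gbps")
```

The same fit extends to several explanatory variables (power, CPU, timestamp) with multiple regression; a single variable keeps the sketch short.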

Page 18: Monte carlo and network cmg'14

From Demand to Capacity

Demand QoS

Topology

Capacity

Page 19: Monte carlo and network cmg'14

QoS = what’s important to user

1. QoS = 1 / Latency

2. QoS = “Goodput” = Throughput * Pr{delivery}

1. Low Latency

2. High Probability of:

a. Delivery

b. Accuracy

Page 20: Monte carlo and network cmg'14

Find shortest path from Node 1 to Node 2

Routing for Low Latency: SPF: “Travelling Salesman”

4 = Node 4

2 = “Latency of this link = 2 units”

Cost = Latency

QoS = 1/Cost = 1/Latency
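Shortest-path-first (SPF) routing with latency as the link cost is commonly computed with Dijkstra's algorithm. The sketch below is one minimal way to do it; the five-node graph, its node numbers, and its link latencies are made up for illustration, and the `down` parameter is a hypothetical way to exclude failed nodes.

```python
import heapq

def shortest_path(graph, src, dst, down=frozenset()):
    """Dijkstra's algorithm; returns (total_latency, path) or (inf, [])."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, latency in graph.get(node, {}).items():
            if nxt not in visited and nxt not in down:
                heapq.heappush(queue, (cost + latency, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical undirected topology: (node a, node b, latency of link a-b).
links = [(1, 3, 2), (1, 4, 1), (3, 5, 2), (4, 2, 2), (5, 2, 1), (3, 4, 1)]
graph = {}
for a, b, latency in links:
    graph.setdefault(a, {})[b] = latency
    graph.setdefault(b, {})[a] = latency

print(shortest_path(graph, 1, 2))             # all nodes up
print(shortest_path(graph, 1, 2, down={4}))   # Node 4 is down
```

In this made-up graph the best path runs through Node 4, so taking Node 4 down forces a longer detour, which is the effect the routing slides illustrate.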

Page 21: Monte carlo and network cmg'14

Find shortest path from Node 1 to Node 2 IF Node 4 is down

Cost = Latency

QoS = 1/Cost = 1/Latency

Find shortest path from Node 1 to Node 2

4 = Node 4

2 = “Latency of this link = 2 units”

Routing for Low Latency: SPF: “Travelling Salesman”

Page 22: Monte carlo and network cmg'14

Find shortest path from Node 1 to Node 2 IF Node 4 is down ...

… and Link 3-5 is losing packets

Cost = Latency

QoS = 1/Cost = 1/Latency

Find shortest path from Node 1 to Node 2

4 = Node 4

2 = “Latency of this link = 2 units”

Routing for Low Latency: SPF: “Travelling Salesman”

Page 23: Monte carlo and network cmg'14

QoS = what’s important to user

1. QoS = 1 / Latency

2. QoS = “Goodput” = Throughput * Pr{delivery}

1. Low Latency

2. High Probability of:

a. Delivery

b. Accuracy

Page 24: Monte carlo and network cmg'14

“Travelling Salesman” Non-linear optimization

Routing for “Goodput”: Nonlinear optimization

Page 26: Monte carlo and network cmg'14

Non-linear optimization

Routing for “Goodput”: Can it be simplified?

Assume:

● No Queueing

○ No Blocking

Redefine:

Can be pseudo-linearized
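A minimal sketch of one standard linearization, which may differ from the redefinition the authors had in mind: if each link delivers packets independently with probability p, the delivery probability of a path is the product of its links' probabilities, and using -log(p) as the link cost turns "maximize the product" into an additive shortest-path problem. The independence assumption, the topology, and the probabilities below are illustrative only.

```python
import heapq
import math

def most_reliable_path(p_link, src, dst):
    """Dijkstra on cost = -log(p); returns (path delivery probability, path)."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return math.exp(-cost), path
        if node in visited:
            continue
        visited.add(node)
        for nxt, p in p_link.get(node, {}).items():
            if nxt not in visited and p > 0.0:
                heapq.heappush(queue, (cost - math.log(p), nxt, path + [nxt]))
    return 0.0, []

# Hypothetical directed links: p_link[a][b] = Pr{delivery} on link a -> b.
p_link = {
    1: {3: 0.999, 4: 0.90},
    3: {5: 0.95},
    4: {2: 0.999},
    5: {2: 0.999},
}

p_path, path = most_reliable_path(p_link, 1, 2)
throughput_gbps = 100.0   # hypothetical offered throughput
print(path, round(p_path, 4), "goodput:", round(throughput_gbps * p_path, 2))
```

With no queueing and no blocking, goodput along the chosen path is simply throughput multiplied by the path's delivery probability, which is what the last line prints.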

Page 27: Monte carlo and network cmg'14

Routing As a Process

SPF

Page 28: Monte carlo and network cmg'14

SPF

Routing As a Process

Draining

Page 36: Monte carlo and network cmg'14

“Whack-a-Mole!”

Routing is updated all the time via:

● Protocol (e.g., TCP)

● SDN Control

We need to accommodate each Flow’s:

● Primary Paths

● Alternative Paths

Page 37: Monte carlo and network cmg'14

Network Demand & Throughput

[Fishbone diagram: Link Throughput, with branches Demand, Topology, and Node & Link Reliability; Link Throughput leads to Link Size]

Page 38: Monte carlo and network cmg'14

From Demand to Capacity:

Demand_i -> Throughput_j -> Concurrency_j -> Capacity

Concurrency_j = Throughput_j * Connex Traversal Time (Latency)

Page 39: Monte carlo and network cmg'14

To account for Queueing & StatMux, …

Demand_i -> Throughput_j -> Concurrency_j -> Erl^-1 (N, P_B) -> Capacity

Concurrency_j = Throughput_j * Link Traversal Time (Latency)

QoS -> P_B
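The chain on this slide can be sketched as follows, assuming that concurrency follows Little's Law (throughput multiplied by traversal time) and that Erl^-1 denotes inverting the Erlang B formula: find the smallest number of capacity units whose blocking probability does not exceed the target P_B. The function names and the numbers are hypothetical.

```python
def erlang_b(servers, offered_load):
    """Blocking probability for `servers` units at `offered_load` erlangs."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

def inverse_erlang_b(offered_load, p_block):
    """Smallest number of units with blocking probability <= p_block."""
    servers = 0
    while erlang_b(servers, offered_load) > p_block:
        servers += 1
    return servers

# Little's Law: concurrency = throughput * traversal time (latency).
throughput = 200.0     # hypothetical arrivals per second on a link
latency = 0.05         # hypothetical traversal time in seconds
concurrency = throughput * latency   # average number of units in flight

units = inverse_erlang_b(concurrency, p_block=0.01)
print(units, erlang_b(units, concurrency))
```

The recurrence in `erlang_b` avoids the factorials of the textbook formula, so it stays numerically stable for large loads.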

Page 40: Monte carlo and network cmg'14

For Long-Haul Networks, it reduces to… (L_Propagation >> L_Queueing)

Demand -> Throughput -> Concurrency for Flow_i -> Erl^-1 (N, P_B) -> Capacity

Concurrency for Flow_i = Throughput * Connex Traversal Time (Latency)

QoS -> P_B

Page 41: Monte carlo and network cmg'14

For Long-Haul Networks, it reduces to…

L_Propagation >> L_Queueing

Latency ~ const

Concurrency = const * Throughput

Demand -> Throughput -> Capacity (via Bandwidth Fill Factor)

Can’t forget the stochastic element

Page 42: Monte carlo and network cmg'14

We can forecast demand

Demand:

● A1 -> Z1 : X11 Gbps

● A1 -> Z2 : X12 Gbps

● A2 -> Z3 : X23 Gbps

Throughput on each Link

Capacity for each Link

Page 43: Monte carlo and network cmg'14

We can forecast demand

Demand:

● A1 -> Z1 : X11 Gbps

● A1 -> Z2 : X12 Gbps

● A2 -> Z3 : X23 Gbps

Throughput on each Link

Capacity for each Link

Throughput is combinatorial

Page 44: Monte carlo and network cmg'14

Demand is NOT Deterministic

Demand:

● A1 -> Z1 : X11 Gbps

● A1 -> Z2 : X12 Gbps

● A2 -> Z3 : X23 Gbps

Throughput on each Link

Neither is Throughput

Page 45: Monte carlo and network cmg'14

From Deterministic Demand to Throughput

Demand:

N1_N4: 100 Gbps

N2_N4: 200 Gbps

Throughput:

L12 = ?, L24 = ?, L43 = ?, L31 = ?, L141 = ?

[Network diagram: nodes N1, N2, N3, N4; links L12, L24, L43, L31, L141; link labels 5, 315, 25, 22; flow labels 100 G and 200 G]

Throughput:

L12 = 100 G

L21 = 200 G

L24 = 300 G

L14 = 300 G

L41 = 0

L43 = 0

L31 = 0
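As a minimal sketch of how link throughput follows from demand once routes are fixed, the snippet below sums each flow over the links on its route. The two route sets are hypothetical readings that happen to reproduce the throughput values listed on the slide; the routing logic actually used may differ.

```python
from collections import defaultdict

def link_throughput(demands, routes):
    """Sum each flow's demand over every link on its route."""
    load = defaultdict(float)
    for flow, gbps in demands.items():
        for link in routes[flow]:
            load[link] += gbps
    return dict(load)

demands = {"N1_N4": 100.0, "N2_N4": 200.0}   # Gbps, from the slide

# Two hypothetical routing choices (assumptions, not from the slide).
via_n2 = {"N1_N4": ["L12", "L24"], "N2_N4": ["L24"]}
via_n1 = {"N1_N4": ["L14"], "N2_N4": ["L21", "L14"]}

print(link_throughput(demands, via_n2))   # {'L12': 100.0, 'L24': 300.0}
print(link_throughput(demands, via_n1))   # {'L14': 300.0, 'L21': 200.0}
```

Links that carry no flow are simply absent from the result, which corresponds to the zero entries in the slide's list.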

Page 46: Monte carlo and network cmg'14

From Gaussian Demand to Throughput:

Demand:

N1_N4: N (100, 10) Gbps

N2_N4: N (200, 15) Gbps

Throughput:

L12 = ?, L24 = ?, L43 = ?, L31 = ?, L141 = ?

[Network diagram: nodes N1, N2, N3, N4; links L12, L24, L43, L31, L141; link labels 5, 315, 25, 22]

Throughput:

L12 = N (100, 10) G

L21 = N (200, 15) G

L24 = N (300, 18) G

L14 = N (300, 18) G

L41 = 0

L43 = 0

L31 = 0
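The Gaussian case has a closed form: for independent flows the means add and the variances add, so the standard deviation on a shared link is sqrt(10^2 + 15^2), about 18, which agrees with N (300, 18) on the slide. The sketch below checks the closed form by simulation; reading the second parameter of N (·, ·) as a standard deviation and treating the two flows as independent are assumptions.

```python
import math
import random
import statistics

def sum_of_gaussians(params):
    """Closed form for a sum of independent N(mean, sd) variables."""
    mean = sum(m for m, _ in params)
    sd = math.sqrt(sum(s ** 2 for _, s in params))
    return mean, sd

flows = [(100.0, 10.0), (200.0, 15.0)]   # N1_N4 and N2_N4, in Gbps
total_mean, total_sd = sum_of_gaussians(flows)
print(total_mean, round(total_sd, 2))    # mean 300, sd about 18.03

# Monte Carlo check of the same quantity.
rng = random.Random(42)
samples = [sum(rng.gauss(m, s) for m, s in flows) for _ in range(50_000)]
print(round(statistics.mean(samples), 1), round(statistics.stdev(samples), 1))
```

The closed form stops working once the demand distributions are no longer Gaussian or the flows are correlated, while the simulation approach carries over unchanged.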

Page 47: Monte carlo and network cmg'14

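From Generic Random Demand to Throughput:

Demand:

N1_N4: G (100, ...) Gbps

N2_N4: G (200, ...) Gbps

Throughput:

L12 = ?, L24 = ?, L43 = ?, L31 = ?, L141 = ?

[Network diagram: nodes N1, N2, N3, N4; links L12, L24, L43, L31, L141; link labels 5, 315, 25, 22; the resulting throughput distribution is marked “?”]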

Page 48: Monte carlo and network cmg'14

Monte-Carlo

Page 51: Monte carlo and network cmg'14

Every Demand VALUE is a REALIZATION of a RANGE of possible values

Demand Forecast

Replace point estimates with probability distributions

Page 52: Monte carlo and network cmg'14

Link Throughput: Monte-Carlo Forecasting

Replace point estimates with probability distributions

Slice the timeline

For each timestamp:
    For each Flow:
        roll the dice N times

For each timestamp:
    For each of the N dice rolls:
        Throughput = sum (Flows)
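The two loops above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the flow names, routes, Gaussian demand parameters, and the clipping of negative draws to zero are all assumptions.

```python
import random
import statistics

def simulate_link_throughput(forecast, routes, n_rolls, rng):
    """For each timestamp, roll every flow n_rolls times and sum per link.

    forecast[t][flow] = (mean, sd) of the flow's demand at timestamp t.
    Returns result[t][link] = list of n_rolls simulated throughputs.
    """
    result = []
    for flows_at_t in forecast:
        # Roll the dice n_rolls times for every flow (demand cannot be negative).
        rolls = {flow: [max(0.0, rng.gauss(m, s)) for _ in range(n_rolls)]
                 for flow, (m, s) in flows_at_t.items()}
        per_link = {}
        for flow, links in routes.items():
            for link in links:
                totals = per_link.setdefault(link, [0.0] * n_rolls)
                for i, value in enumerate(rolls[flow]):
                    totals[i] += value   # Throughput = sum(Flows) for roll i
        result.append(per_link)
    return result

# Hypothetical inputs: two flows, two timestamps, demand in Gbps.
routes = {"A1_Z1": ["L12", "L24"], "A2_Z3": ["L24"]}
forecast = [
    {"A1_Z1": (100.0, 10.0), "A2_Z3": (200.0, 15.0)},
    {"A1_Z1": (110.0, 12.0), "A2_Z3": (210.0, 18.0)},
]

sim = simulate_link_throughput(forecast, routes, n_rolls=1000, rng=random.Random(7))
for t, per_link in enumerate(sim):
    for link, values in per_link.items():
        p95 = statistics.quantiles(values, n=20)[-1]   # 95th percentile
        print(t, link, round(statistics.mean(values), 1), round(p95, 1))
```

Any sampler can replace `rng.gauss` here, which is the practical appeal of the method: the summation step does not care what distribution the rolls came from.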

Page 53: Monte carlo and network cmg'14

Monte Carlo works with any Transfer Function

Demand (A-Z) -> Monte Carlo -> Throughput on each Link -> Capacity for each Link

Page 54: Monte carlo and network cmg'14

Use Case (a case study)

● Hundreds of links

● Thousands of demand flows forecasted

o 95th percentile

o Unspecified Prediction Intervals

● Establish optimal Inventory Size & Policies

o Account for Demand Predictability

● Estimate demand variability effect on:

o Network Size

o TCO

Forecast

Page 55: Monte carlo and network cmg'14

Approach

1. Quantify Demand Distributions (use Biases)

2. Use Monte-Carlo to forecast Throughput Distributions

3. Use Monte-Carlo to compute Capacity Prediction Intervals

4. Use Monte-Carlo to optimize Inventory Size & Policies

Biases = Forecast - Observed

Biases != Residuals

Page 56: Monte carlo and network cmg'14

Quantify Demand Ranges & Prepare MC “Forecasts”

Start

For Each Time Slice:
    For Each Flow:
        Compute: Bias = Projected - Observed
        Build: Bias Distribution
        Roll the dice N = 100 times
        Apply the rolled-out numbers to the baseline forecast for each flow

Save the N Demand scenarios
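A minimal sketch of the flowchart for a single flow in a single time slice. It assumes the bias is additive and that the bias distribution is the empirical one, sampled by drawing past biases with replacement; the history values and function name are hypothetical.

```python
import random

def demand_scenarios(projected, observed, baseline, n_rolls, rng):
    """Build n_rolls demand scenarios for one flow from its forecast biases.

    Bias = Projected - Observed, so a plausible outcome for a new forecast
    is baseline - bias, with the bias drawn from its empirical distribution.
    """
    biases = [p - o for p, o in zip(projected, observed)]
    return [baseline - rng.choice(biases) for _ in range(n_rolls)]

# Hypothetical history for one flow in one time slice (Gbps).
projected = [100.0, 105.0, 110.0, 120.0, 125.0]
observed = [96.0, 108.0, 104.0, 121.0, 118.0]

scenarios = demand_scenarios(projected, observed, baseline=130.0,
                             n_rolls=100, rng=random.Random(1))
print(len(scenarios), min(scenarios), max(scenarios))
```

Fitting a parametric distribution to the biases and sampling from it is an equally valid choice; resampling is used here only because it needs no distributional assumption.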

Page 57: Monte carlo and network cmg'14

Run the Pseudo-Random Demands through MC

F flows * N forecasts -> Map_1, Map_2, …, Map_N-1, Map_N -> Reduce -> L links: Capacity Prediction Intervals

Map: Compute Capacities (N)

Reduce: Analyze the N Capacity Forecasts

Capacity Forecasts for each Link
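The map and reduce steps can be sketched in a single process as below. The rule used in the map step (capacity = link throughput divided by a bandwidth fill factor), the percentile bounds, and all names and values are assumptions made for illustration; a production run would distribute the map step across workers.

```python
import statistics

def map_capacity(scenario, routes, fill_factor):
    """Map step: one demand scenario -> required capacity per link."""
    capacity = {}
    for flow, gbps in scenario.items():
        for link in routes[flow]:
            capacity[link] = capacity.get(link, 0.0) + gbps / fill_factor
    return capacity

def reduce_intervals(capacities, lower=5, upper=95):
    """Reduce step: N per-link capacities -> (lower, median, upper) percentiles."""
    intervals = {}
    for link in capacities[0]:
        values = [c[link] for c in capacities]
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        intervals[link] = (cuts[lower - 1], cuts[49], cuts[upper - 1])
    return intervals

# Hypothetical inputs: N = 5 demand scenarios for F = 2 flows (Gbps).
routes = {"A1_Z1": ["L12", "L24"], "A2_Z3": ["L24"]}
scenarios = [{"A1_Z1": 90.0 + 5 * i, "A2_Z3": 190.0 + 5 * i} for i in range(5)]

capacities = [map_capacity(s, routes, fill_factor=0.8) for s in scenarios]
intervals = reduce_intervals(capacities)
print(intervals)
```

Each scenario is mapped independently, so the map step parallelizes trivially; only the reduce step needs all N results for a link at once.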

Page 58: Monte carlo and network cmg'14

What does it cost to own a network?

Page 59: Monte carlo and network cmg'14

In Conclusion

● Range forecasting is cool!

● Network Demand varies in many ways

● For WAN, it is OK to use throughput

o still it’s better to use concurrency

● Demand ≠ Throughput

o Demand -> Throughput -> Capacity

● Monte-Carlo is a model

o Therefore it is wrong

o But it is useful

Page 60: Monte carlo and network cmg'14

Acknowledgements

● Google’s NetOps Division

● Google’s NetCap & ODS Teams

● Josep Ferrandiz

● Mike Perka

● Leonid Kats

● C. Steven Gunn

● Matthew Mathis

● Kevin J. Mitchell

● Linda Eck

● Sophia Shtilman

● Leora Gilgur

Page 62: Monte carlo and network cmg'14

Backup Slides

Page 63: Monte carlo and network cmg'14

Biases != Residuals. Why?

How good are forecasts at predicting demand N days from “now”?

Page 64: Monte carlo and network cmg'14

H/W Availability: Fault Trees

Reliability Function:

Failure is a memoryless (Poisson) process

F(C|t) = F((1 OR 2)|t) = 1 - R(1|t) * R(2|t)

F(D|t) = F((3 AND 4 AND 5)|t) = F(3|t) * F(4|t) * F(5|t)

F(E|t) = F((7 AND 8)|t) = F(7|t) * F(8|t)

F(F|t) = F((6 OR E)|t) = 1 - (1 - F(7|t) * F(8|t)) * R(6|t)

F(B|t) = F((C OR D OR F)|t) = 1 - R(1|t) * R(2|t) * (1 - F(3|t) * F(4|t) * F(5|t)) * (1 - F(7|t) * F(8|t)) * R(6|t)

⇒ R(A|t) = R(1|t) * R(2|t) * (1 - F(3|t) * F(4|t) * F(5|t)) * (1 - F(7|t) * F(8|t)) * R(6|t)
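The final expression can be evaluated directly once each component has a reliability function. The sketch below assumes the exponential form R(i|t) = exp(-t / MTBF_i), which is what a memoryless failure process gives, and assumes independent components; the MTBF values are hypothetical.

```python
import math

def reliability(t, mtbf):
    """R(t) for a memoryless failure process with mean time between failures mtbf."""
    return math.exp(-t / mtbf)

def system_reliability(t, mtbf):
    """R(A|t) for the tree B = C OR D OR F, with F = 6 OR E and E = 7 AND 8."""
    r = {k: reliability(t, m) for k, m in mtbf.items()}
    f = {k: 1.0 - v for k, v in r.items()}
    return (r[1] * r[2]                      # C = 1 OR 2 must not occur
            * (1.0 - f[3] * f[4] * f[5])     # D = 3 AND 4 AND 5 must not occur
            * (1.0 - f[7] * f[8])            # E = 7 AND 8 must not occur
            * r[6])                          # component 6 must not fail

# Hypothetical MTBF values in hours for components 1..8.
mtbf = {i: 10_000.0 for i in range(1, 9)}
print(system_reliability(1_000.0, mtbf))
```

Every change to the tree means rederiving the product by hand, which is the maintenance burden that motivates a simulation-based alternative.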

[Fault tree diagram: top event B; intermediate events C, D, F; E feeds F]

There’s got to be a cleaner way!

Page 65: Monte carlo and network cmg'14

Fault Trees and Monte-Carlo

[Fault tree diagram: top event B; intermediate events C, D, F; E feeds F]

Component
    state = (run, fail)
    rule = (AND, OR, NONE)
    mtbf
    mttr
    next_update_time
    elements: Component
    fail()
    run()
    update(time)

clock.start()
repeat:
    for each component:
        component.update(time = clock)
    clock.set(min(next_update_time))

run():
    if rule == NONE:
        state = run
    else:
        // apply rule to elements
    return

fail():
    if rule == NONE:
        state = fail
    else:
        // apply rule to elements
    return

update(time):
    if time ≥ next_update_time:
        if state == fail:
            run()                            // repaired: schedule the next failure
            next_update_time += Exp(mtbf)
        else:
            fail()                           // failed: schedule the repair
            next_update_time += Exp(mttr)
    return
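A runnable sketch of the pseudocode above, under a few assumptions that the slide leaves open: leaves start in the run state, an AND gate's failure event occurs only when all of its elements have failed, an OR gate's when any element has failed, and the simulation reports availability over a fixed horizon. The `availability` function, the `rng` argument, and the MTBF/MTTR values are hypothetical.

```python
import random

RUN, FAIL = "run", "fail"

class Component:
    """Fault-tree node: a leaf (rule "NONE") or an AND/OR gate over elements."""

    def __init__(self, rng, rule="NONE", mtbf=None, mttr=None, elements=()):
        self.rng, self.rule = rng, rule
        self.mtbf, self.mttr = mtbf, mttr
        self.elements = list(elements)
        self.state = RUN
        # A leaf schedules its first failure; a gate follows its elements.
        self.next_update_time = (
            rng.expovariate(1.0 / mtbf) if rule == "NONE" else float("inf"))

    def update(self, time):
        if self.rule == "NONE":
            if time >= self.next_update_time:
                if self.state == FAIL:      # repaired: schedule next failure
                    self.state = RUN
                    self.next_update_time += self.rng.expovariate(1.0 / self.mtbf)
                else:                       # failed: schedule the repair
                    self.state = FAIL
                    self.next_update_time += self.rng.expovariate(1.0 / self.mttr)
            return
        for element in self.elements:
            element.update(time)
        failed = [e.state == FAIL for e in self.elements]
        # AND gate: all elements failed; OR gate: any element failed.
        down = all(failed) if self.rule == "AND" else any(failed)
        self.state = FAIL if down else RUN
        self.next_update_time = min(e.next_update_time for e in self.elements)

def availability(top, horizon):
    """Fraction of [0, horizon] during which the top node is in the run state."""
    clock, downtime = 0.0, 0.0
    while clock < horizon:
        top.update(clock)
        next_clock = min(top.next_update_time, horizon)
        if top.state == FAIL:
            downtime += next_clock - clock
        clock = next_clock
    return 1.0 - downtime / horizon

def make_leaf(rng):
    return Component(rng, mtbf=90.0, mttr=10.0)   # hypothetical hours

rng = random.Random(3)
pair = Component(rng, rule="AND", elements=[make_leaf(rng), make_leaf(rng)])
top = Component(rng, rule="OR", elements=[make_leaf(rng), pair])
print(round(availability(top, horizon=200_000.0), 3))
```

For this small tree the steady-state answer can be checked by hand: each leaf is up 90% of the time, the AND pair is down about 1% of the time, so the top node should be available roughly 0.9 * 0.99 = 0.891 of the time.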

Page 66: Monte carlo and network cmg'14

Probability distributions

Simplest - Uniform:

Least relevant to anything real

Convenient building block for any distribution

Most standard - Gaussian:

Mathematically the simplest

Does not describe the IT world

Most Relevant - Poisson & Exponential:

Relatively simple mathematically

Accurately describes times between arrivals and service times for a memoryless process.

F(x) = Pr (X ≤ x) - CDF

f (x) = F’(x) - PDF
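One way to see why the uniform distribution is a convenient building block is inverse-transform sampling: draw u uniformly on [0, 1) and solve F(x) = u for x. The sketch below does this for the exponential distribution, whose CDF is F(x) = 1 - exp(-rate * x); the rate value and function name are hypothetical.

```python
import math
import random
import statistics

def exponential_from_uniform(u, rate):
    """Invert the exponential CDF F(x) = 1 - exp(-rate * x) at u in [0, 1)."""
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
rate = 0.5    # hypothetical arrival rate; mean inter-arrival time is 1 / rate
samples = [exponential_from_uniform(rng.random(), rate) for _ in range(50_000)]
print(round(statistics.mean(samples), 2))   # should be close to 1 / rate
```

The same recipe works for any distribution whose CDF can be inverted, analytically or numerically, including empirical distributions built from observed data.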