A Look at the Network: Searching for Truth in Distributed Applications

Post on 07-Dec-2014

description

A talk by C. Scott Andreas (@cscotta) of Boundary on "the network" and designing / deploying distributed applications. This session offers a deep dive into how application-level problems manifest at the network level. The cases range from basic network partitions and node outages to subtler application-level events: garbage collections on managed runtimes, classes of bugs that evade conventional monitoring but constitute partial failures, changes in network activity caused by database partitioning, load balancing, and sharding, and other warning signs that crop up at layer three long before they wreak havoc at layer seven as customer-visible failures. Combining application-level metrics with network analytics is a powerful cocktail for identifying hot spots quickly, and connecting the dots out to the client closes the loop.

Transcript of A Look at the Network: Searching for Truth in Distributed Applications

taco.cat/oscon12

A Look at the Network

Searching for Truth in Distributed Applications

c. scott andreas (cscotta)

oscon 2012 - portland oregon

THE NETWORK IS RELIABLE

THE NETWORK IS SECURE

THERE IS ONE ADMINISTRATOR

LATENCY IS ZERO

BANDWIDTH IS INFINITE

THE NETWORK IS HOMOGENEOUS

TOPOLOGY DOESN’T CHANGE

TRANSPORT COST IS ZERO

where can i buy that?

[ another approach ]

NETWORK + YOUR APPS ARE A GRAPH

YEAH BUT... WHAT DO WE DO WITH IT?

graphs change constantly

their edges can be represented as a time series
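One way to picture this idea is the minimal sketch below. It assumes flow records arrive as `(timestamp, source, destination, bytes)` tuples; that record shape, the host names, and the `edge_time_series` helper are hypothetical and not taken from the talk. Each (source, destination) pair is an edge of the graph, and its byte counts per time bucket form that edge's time series.

```python
from collections import defaultdict


def edge_time_series(flows, bucket_seconds=60):
    """Aggregate flow records into per-edge byte counts per time bucket.

    flows: iterable of (timestamp_seconds, src, dst, byte_count) tuples.
    Returns {(src, dst): {bucket_start_seconds: total_bytes}}.
    """
    series = defaultdict(lambda: defaultdict(int))
    for ts, src, dst, nbytes in flows:
        # Round the timestamp down to the start of its bucket.
        bucket = int(ts // bucket_seconds) * bucket_seconds
        series[(src, dst)][bucket] += nbytes
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {edge: dict(buckets) for edge, buckets in series.items()}


# Hypothetical flow records: (timestamp, source, destination, bytes).
flows = [
    (0, "web-1", "db-1", 500),
    (30, "web-1", "db-1", 700),
    (65, "web-1", "db-1", 200),
    (10, "web-2", "db-1", 100),
]
series = edge_time_series(flows)
```

Here the edge `("web-1", "db-1")` carries 1200 bytes in the first minute and 200 in the second. An edge that appears, disappears, or changes volume between buckets is a change in the graph over time.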

GRAPH + TIME SERIES

LEMMA 1: there exists no possible way for applications to communicate except via the network

LEMMA 2: applications are unable to fulfill their purpose without communicating and participating in a cluster

LEMMA 3: the network can be represented as a time series and a graph

LEMMA 4: nearly all modes of failure in distributed systems can be identified and predicted this way

WHAT CAN ONE OBSERVE?

network partitions

impaired nodes

GC pauses

poor load balancing

good deploys

bad deploys

security breaches

variance
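As an illustration of how events like these might surface in a single edge's time series, here is a rough sketch that flags samples deviating sharply from an exponentially smoothed baseline. The `flag_anomalies` helper, its parameters (`alpha`, `threshold`, `warmup`), and the sample data are assumptions made for this example, not a method described in the talk.

```python
def flag_anomalies(values, alpha=0.3, threshold=3.0, warmup=5):
    """Return indices whose value deviates from an exponentially smoothed
    baseline by more than `threshold` times the smoothed absolute deviation.
    """
    flagged = []
    mean = float(values[0])  # smoothed level of the series
    dev = 0.0                # smoothed absolute deviation from that level
    for i, x in enumerate(values[1:], start=1):
        err = abs(x - mean)
        if i >= warmup and dev > 0 and err > threshold * dev:
            # Flag the sample and leave the baseline untouched so a
            # single spike or drop does not drag the baseline with it.
            flagged.append(i)
        else:
            mean = alpha * x + (1 - alpha) * mean
            dev = alpha * err + (1 - alpha) * dev
    return flagged


# Hypothetical requests-per-interval on one edge; the drop at index 6
# is the kind of gap a long GC pause or a brief partition might leave.
samples = [100, 102, 98, 101, 99, 100, 5, 101, 100]
anomalies = flag_anomalies(samples)
```

With these samples only index 6 is flagged. Telling a GC pause from a partition or a bad deploy would need more context than one series provides, such as whether neighboring edges dip at the same moment.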

TOOLS

– NProbe / NTop

– CFlowd / flow-tools

– TCPDump / TCPReplay

– CollectD

– R / RStudio

– Esper

– Dynamic Time Warping (algo)

– Python / NumPy / SciPy
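For readers unfamiliar with the Dynamic Time Warping entry above, the following is a minimal pure-Python sketch of the classic quadratic-time dynamic program. The `dtw_distance` name and the absolute-difference cost are choices made for this example; the talk does not specify an implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost. O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Two series with the same shape, one delayed by a sample, align at zero cost.
shifted = dtw_distance([1, 2, 3], [1, 1, 2, 3])
different = dtw_distance([0, 0], [1, 1])
```

Because the alignment may stretch or compress time, two edges that show the same traffic pattern with a lag between them still score as similar, which a point-by-point comparison would miss.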

FURTHER READING

– Network Flow Analysis (No Starch Press)

– Eamonn Keogh (“Atomic Wedgie”)

– K-Snap (“Efficient Aggr. for Graph Summ.”)

– “Medians and Beyond” (Shrivastava)

– Exponential Smoothing: The State Space Approach (Hyndman)

– Gigascope (AT&T Research)

– Dynamic Time Warping (algo)

– Studying Complex Adaptive Systems (Holland)

– HyperLogLog / Count-Min Sketch
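To give a feel for the last entry, here is a small Count-Min Sketch in Python: a fixed-size table that approximates per-key counts (for example, bytes per edge) and can overestimate but never underestimate. The class layout, the default `width` and `depth`, and the SHA-256-based row hashing are assumptions made for this sketch, not anything prescribed by the talk.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates are always >= the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Derive one hash per row by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions only inflate cells, so the minimum is the best estimate.
        return min(
            self.table[row][self._index(key, row)]
            for row in range(self.depth)
        )


# Hypothetical edge labels and byte counts.
cms = CountMinSketch()
cms.add("web-1->db-1", 500)
cms.add("web-1->db-1", 700)
cms.add("web-2->db-1", 100)
estimate = cms.estimate("web-1->db-1")
```

Memory use stays fixed at `width * depth` counters no matter how many distinct edges appear, at the price of occasional overestimates when keys collide.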

BONUS!

a spec for thai chili salsa and reference implementation

– 28oz peeled tomatoes

– Half a bag of hot thai chilis

– Half a cucumber

– Handful of radishes

– One green bell pepper

– A small red onion

– Half a thing of cilantro

– 6 cloves of garlic

– 2 tablespoons of salt

– 2 tablespoons of white vinegar

– Juice from half a lime

– Bit of parsley

– Some lemongrass

MIX IT ALL UP

A Look at the Network

Searching for Truth in Distributed Applications

oscon 2012 - portland oregon