
2/36

Towards Sustained Scalability of Communication Networks

Mike P. Wittie

3/36

Changing Traffic Patterns

High bandwidth cost of inter-user traffic

High delay of local communications

4/36

Diverse Traffic Requirements

• Inter-user communications

• Real-time traffic

• Autonomous device traffic

5/36

Challenges to Sustained Net. Scalability

• Are existing network architectures well-suited to meet evolving traffic requirements?

• Throwing capacity at the problem is not cost-efficient or effective
  – Cannot reduce end-to-end delay
  – Existing capacity not used effectively

6/36

Dissertation Research Overview

• Network engineering process

• Clean slate ideas

• Dirty slate solutions

Network Measurement

Topology Design

Resource Allocation

MIST [Broadnets ’07]

AirLab [CCR ’11]

Cloud Routing [Comcom ’09]

SISR [SECON ’09]

Distr. OSNs [CoNEXT ’10]

AggrBP [in preparation]

ParaNets [HotMobile ’07]

7/36

Online Social Networks (OSNs)

• OSNs are fun, socially significant, and here to stay

• OSN traffic is the next evolution in Internet usage

• How is OSN traffic different?

• Can we deliver OSN traffic more effectively?

Mike P. Wittie, Veljko Pejovic, Lara Deek, Kevin C. Almeroth, Ben Y. Zhao, "Exploiting Locality of Interest in Online Social Networks," in ACM CoNEXT, 2010.

8/36

Project Overview

• Infrastructure discovery
  – DNS redirection

• Interaction analysis
  – Crawls of public profiles

• Protocol discovery
  – Packet traces

• Path measurement
  – PlanetLab + ping

U.S.-centralized infrastructure

Local communication patterns

Inefficient routing

High latency and loss paths

High network costs

Unresponsive service
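The path measurement step relies on ping. As a minimal sketch of how latency and loss figures might be pulled out of ping output, the function below parses the summary lines printed by the Linux iputils `ping`. The function name `parse_ping_summary` and the sample text (including its host name and numbers) are made up for illustration; they are not the talk's tooling or data.

```python
import re

def parse_ping_summary(output: str) -> dict:
    """Extract packet loss (%) and average RTT (ms) from iputils ping output.

    Hypothetical helper: returns None for a field whose line is missing,
    e.g. the rtt line is absent when every probe is lost.
    """
    loss = re.search(r"([\d.]+)% packet loss", output)
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)", output)  # min/avg/max
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        "avg_rtt_ms": float(rtt.group(2)) if rtt else None,
    }

# Made-up sample in the iputils summary format (not measured data).
sample = """\
--- example.org ping statistics ---
10 packets transmitted, 9 received, 10% packet loss, time 9012ms
rtt min/avg/max/mdev = 140.1/148.3/160.2/5.4 ms
"""

result = parse_ping_summary(sample)
print(result)
```

Running such a parser over probes from several vantage points would yield per-region latency and loss values of the kind reported in the path measurement slide.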

9/36

Facebook’s Infrastructure

10/36

Social Graph and Interactions

• Locality of interest:
  – Percentage of user communications within a region

• Web crawls of public user profiles

• Social graph

• Interaction history
  – Local-to-Local
  – Remote-to-Local
  – Local-to-Remote

[Figure: Users and Social Links, by region; legend: Local, Remote]

Russia: 0.26 mil

Egypt: 3.86 mil

Sweden: 8.59 mil

NYC: 2.95 mil

LA: 2.25 mil
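The locality-of-interest metric can be sketched in a few lines. The code below assumes each interaction is reduced to a (sender region, receiver region) pair and that locality is the share of a region's interactions whose endpoints are both local; the exact denominator used in the study is not stated on the slide, so that choice is an assumption. The names `classify` and `locality_of_interest` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

def classify(sender: str, receiver: str, region: str) -> Optional[str]:
    """Label one interaction from the point of view of `region`."""
    if sender == region and receiver == region:
        return "Local-to-Local"
    if sender == region:
        return "Local-to-Remote"
    if receiver == region:
        return "Remote-to-Local"
    return None  # interaction does not touch the region

def locality_of_interest(interactions: Iterable[Tuple[str, str]], region: str) -> float:
    """Percentage of the region's interactions that stay inside the region."""
    counts = Counter(classify(s, r, region) for s, r in interactions)
    counts.pop(None, None)  # ignore interactions elsewhere
    total = sum(counts.values())
    return 100.0 * counts["Local-to-Local"] / total if total else 0.0

# Illustrative data only, not taken from the crawls.
posts = [("Egypt", "Egypt"), ("Egypt", "Egypt"), ("Egypt", "US"),
         ("US", "Egypt"), ("US", "US")]
egypt_locality = locality_of_interest(posts, "Egypt")
print(egypt_locality)
```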

11/36

Locality of Interest: Posts

Many posts between local users

[Figure legend: Local to Local, Local to Remote, Remote to Local]

12/36

Delivery Ratio of Wall Posts

Needlessly routed through the U.S.

[Figure legend: Local to Local, Local to Remote, Remote to Local]

13/36

Internet Path Measurement

• High latency and loss but separable by a regional proxy

Region  Latency (ms)                  Loss (%)                      Capacity (Mbps)
        Direct  To Region  Last Mile  Direct  To Region  Last Mile  Direct  To Region  Last Mile
Russia  148     115        31         6.1     0          1.8        29.6    367        29.6
Egypt   164     176        67         5.8     0          5.8        0.92    736        0.92
Sweden  104     95         14         0.32    0          2.9        9.47    188        9.47
NYC     74      43         33         0.75    0          0.6        9.51    99         9.51
LA      27      9.1        18         0.5     0          0.4        2.02    228        2.02

[Diagrams: Facebook, Akamai, User; Facebook, Akamai, User, Regional Server]
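One way to see why splitting the path helps is a rough throughput bound. The sketch below applies the well-known Mathis et al. approximation for loss-limited TCP throughput, MSS / (RTT · √p) scaled by a constant, to the Direct and Last Mile columns of the table. This model is swapped in for illustration and is not claimed to be the analysis used in the talk. It assumes the latency values are round-trip times and a 1460-byte MSS; `loss_bound_mbps` and `PATHS` are hypothetical names.

```python
import math

MSS_BITS = 1460 * 8   # assumed maximum segment size, in bits
C = math.sqrt(3 / 2)  # constant of the Mathis et al. approximation

def loss_bound_mbps(rtt_ms: float, loss_pct: float) -> float:
    """Approximate upper bound on TCP throughput: C * MSS / (RTT * sqrt(p))."""
    p = loss_pct / 100.0
    if p == 0:
        return math.inf  # the model imposes no loss-based limit
    return C * MSS_BITS / ((rtt_ms / 1000.0) * math.sqrt(p)) / 1e6

# (latency ms, loss %) copied from the Direct and Last Mile columns.
PATHS = {
    "Russia": {"direct": (148, 6.1),  "last_mile": (31, 1.8)},
    "Egypt":  {"direct": (164, 5.8),  "last_mile": (67, 5.8)},
    "Sweden": {"direct": (104, 0.32), "last_mile": (14, 2.9)},
    "NYC":    {"direct": (74, 0.75),  "last_mile": (33, 0.6)},
    "LA":     {"direct": (27, 0.5),   "last_mile": (18, 0.4)},
}

bounds = {
    region: {seg: loss_bound_mbps(*vals) for seg, vals in segs.items()}
    for region, segs in PATHS.items()
}

for region, b in bounds.items():
    print(f"{region:7s} direct {b['direct']:.2f} Mbps, "
          f"last mile {b['last_mile']:.2f} Mbps")
```

Under these assumptions the loss-limited bound on the last-mile segment is higher than on the direct path in every region, and the To Region segment shows no measured loss, which is consistent with the slide's point that the lossy part of the path can be isolated behind a regional proxy.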

14/36

Discussion

Challenges

• Infrastructure centralization results in inefficient routes

• Many round trips and poor paths inflate request delay

Solutions

• Locality of interest allows OSN state partitioning:

REGIONAL OSN CACHES

• Split the high latency and loss path segments:

REGIONAL TCP PROXIES

15/36

Current Facebook Architecture

1. New content

2. Display markup

3. Static content

[Diagram: Post; Facebook, Akamai, User]

With TCP Proxies

1. New content

2. Display markup

3. Static content

[Diagram: Post; Facebook, Akamai, User, Regional Server; path labeled 1, 2, 3]

16/36

Current Facebook Architecture

1. Any new content?

2. Concatenated display markup

3. Static content

[Diagram: Poll; Facebook, Akamai, User]

With Regional Caches

1. Any new content?

2. Concatenated display markup

3. Static content

[Diagram: Poll; Facebook, Akamai, User, Regional Server; path labeled 1, 2, 3]
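The poll exchange above can be sketched as a small in-memory cache that answers "any new content?" locally. This is a hypothetical illustration of the idea, not the design evaluated in the talk: the class `RegionalCache`, its methods, and the sequence-number scheme are all assumptions, and invalidation, consistency with the central site, and static content are left out.

```python
from collections import defaultdict
from typing import Tuple

class RegionalCache:
    """Toy regional cache: stores display markup per wall, serves polls."""

    def __init__(self) -> None:
        self._walls = defaultdict(list)  # wall -> [(seq, markup), ...]
        self._next_seq = 1

    def push(self, wall: str, markup: str) -> int:
        """Record new content for a wall and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._walls[wall].append((seq, markup))
        return seq

    def poll(self, wall: str, last_seen: int) -> Tuple[int, str]:
        """Return (latest sequence number, concatenated markup newer than last_seen)."""
        fresh = [(s, m) for s, m in self._walls.get(wall, []) if s > last_seen]
        latest = fresh[-1][0] if fresh else last_seen
        return latest, "".join(m for _, m in fresh)

cache = RegionalCache()
cache.push("alice", "<p>a</p>")
cache.push("alice", "<p>b</p>")
first_poll = cache.poll("alice", 0)   # everything is new
second_poll = cache.poll("alice", 2)  # nothing new since seq 2
print(first_poll, second_poll)
```

In this sketch a poll with nothing new is answered entirely within the region, which is the kind of traffic the slide suggests need not cross long Internet paths.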

17/36

Simulating Facebook Traffic

Inputs

• Interaction History

• Transaction Traces

• Social Graph

• Path Measurement

Simulation Core

• Replay Posts

• Add Push

• Process Poll

• Simulate TCP

Metrics

• Network Load

• Transaction Delay

• Cache Usage

18/36

Delay of Post Transactions

TCP proxies significantly reduce transaction delay

19/36

Network Load

Regional caches significantly reduce load on Internet paths

20/36

Discussion

• Lack of sophisticated topology design has a significant impact on Facebook performance:
  – Inefficient centralized processing of regional communications
  – Unresponsive service due to high latency and loss on Internet paths and many round trips

• Regional TCP proxies and caches can improve OSN responsiveness

• Distributed infrastructure is a more effective scaling strategy
  – Infrastructure design that can conform to communication patterns
  – Multi-cloud service deployments and optimization

21/36

Summary and Conclusions

• Sustained network scalability challenged by changing traffic patterns

• More sophisticated network design and resource management methods can address fundamental limitations of existing deployments

• Network scalability is an important problem:
  – Allow the Internet to continue to play an important role in our lives
  – Support new types of applications
  – Maintain democratic access to digital communications

• Clean slate ideas and dirty slate solutions for a new set of problems

22/36

Vision for Future Work

• Clean slate ideas and dirty slate solutions are a fruitful and valid research approach

• Network scalability is an important problem:
  – Allow the Internet to play an important role in our lives
  – Support new types of applications
  – Maintain democratic access to digital communications

• Apply my research philosophy to a new set of problems:
  – Large distributed applications in a diverse cloud services ecosystem
  – Autonomous device coordination and services