Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking...

14
1 ©2011 Raj Jain http://www.cse.wustl.edu/~jain/talks/bigdata.htm Washington University in St. Louis Networking Architectures for Networking Architectures for Big Big - - Data Applications Data Applications Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 [email protected] Eighth Annual Microsoft Research Networking Summit Woodinville, WA, June 18-19, 2012 These slides and audio/video recordings of this talk are at: http://www.cse.wustl.edu/~jain/talks/bigdata.htm

Transcript of Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking...

Page 1: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

1©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Networking Architectures for Networking Architectures for BigBig--Data ApplicationsData Applications

Raj Jain Washington University in Saint Louis

Saint Louis, MO [email protected]

Eighth Annual Microsoft Research Networking Summit Woodinville, WA, June 18-19, 2012

These slides and audio/video recordings of this talk are at:http://www.cse.wustl.edu/~jain/talks/bigdata.htm

Page 2: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

2©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

OverviewOverview

1. Big Data: Why Now?

2. Database Solutions and Networking Bottlenecks

3. Intra-Cloud Solutions

4. Multi-Cloud Issue: OpenADN

Page 3: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

3©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Big Data: Why Now?Big Data: Why Now?

“Big Data” first appeared as a problem in October 1997Big Research Funding in Q2 2012 Big Academic Interest in all fields

Ref: http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/

Database Analytics Storage NetworkingSearch …

Page 4: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

4©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Samples of Recent News about Big DataSamples of Recent News about Big Data

http://cloudcomputing.sys-con.com/node/2286075 , May 30, 2012

http://www.accel.com/bigdata

http://betabeat.com/2012/02/big-data-funding-gets-bigger-ia-ventures-raises-105-m-double-its-previous-fund/

+

+

+

NSF $80M, DoD $250M, DOE $25M+http://gigaom.com/cloud/obamas-big-data-plans-lots-of-cash-and-lots-of-open-data/

Page 5: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

5©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Magnitude of DataMagnitude of Data 2.5 Exa (=1018) bytes of information created per day

= 30k US Library of Congress 9.57 Zetta (=1021) bytes processed by servers in 2008 One Zetta byte traffic on the Internet Solutions:

Database Analytics Storage Networking …

Ref: http://en.wikipedia.org/wiki/Big_datahttp://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360_ns827_Networking_Solutions_White_Paper.html

Page 6: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

6©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Database SolutionsDatabase Solutions1. Share Nothing Architecture

2. Map-Reduce

3. Apache Hadoop Distributed File System

Data1

Processor1

Datan

Processor2

MapReduce

OutputMap

MapReduce

Input Output

Shuffle

Switch

Name Node

DN+TT

Sec. NNClientJob Tracker

Switch Switch Switch

DN+TT DN+TT DN+TTDN+TT DN+TT DN+TT DN+TT

Rack Rack Rack Rack

DN= Data NodeTT = Task Tracker

Page 7: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

7©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Networking BottlenecksNetworking Bottlenecks1. Network Attached Storage (NAS):

Opposite of Share-nothing architecture Networking Bottleneck Directly attached storage

2. IP Subnets:

Routing much slower than switching Moving VMs between subnets Addressing issueseep communication in a subnet

3. L2 Fabric: Multi-switch latency Single Rack Traffic

Subnet Subnet Subnet

Page 8: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

8©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

IntraIntra--Cloud SolutionsCloud Solutions1. Flatter L2 Topologies:

2. Virtualization: Ethernet over IP (VXLAN, TRILL)Multiple IP domains look like one L2 domainDoes not solve the latency issue

3. Optimal Placement: E.g., Camdoop uses reducers in the core4. Location Based Naming: As in Tashi from CMU5. Programmable Networks: Virtual topology similar to a single

rack even though a the systems are physically in different racks

Subnet Subnet SubnetVirtual L2

Page 9: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

9©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Really Big Data: MultiReally Big Data: Multi--Cloud IssueCloud Issue Example:

Siri (Somewhat Intelligent Response Interpreter)Needs to consult global databasesWhere should the name tracker, data tracker, task tracker reside?

Global Internet

Page 10: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

10©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Our Solution: OpenADNOur Solution: OpenADN Open Application Delivery Networking Platform.

Platform = Clients, Servers, Switches, and Middle-boxesServers = Name nodes, Task trackers, Data nodes, Clients

Allows global Internet to appear as one datacenterMessage-specific sequence of middle boxes and servers

Access ISP Access ISP

ServersA1, B1

Clients Clients

Internet

OpenADNSwitchesRouters

ServersA2

OpenADNmiddle-boxes

Page 11: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

11©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

OpenADN FeaturesOpenADN Features Message-specific sequence of middle-boxes and Servers:

Name nodes, Task trackers, Data nodes, Clients Middle Boxes:

Load balancing, Fault tolerance, Intrusion Detection, …Control plane and data plane MBs

Server mobility, User Mobility ISPs can offer proper routing and switching services without

visibility into dataCSPs (Cloud service providers) have visibility into data

Message-specific policies

Page 12: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

12©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Key Features of OpenADNKey Features of OpenADN1. Edge devices only.

Core network can be current TCP/IP based, OpenFlow and SDN based

2. Coexistence (Backward compatibility): Old on New. New on Old

3. Incremental Deployment4. Economic Incentive for first adopters5. Resource owners (ISPs) keep complete control

over their resources

Page 13: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

13©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

SummarySummary

1. Big data = Big Funding Big Interest2. Need networking architectures that complement Shared

Nothing Architecture, Map Reduce, Hadoop File Systems3. NAS needs to be redesigned.

Flat topologies are helpful.Virtualization needs to include latency issues

4. Programmable networks are potential solution5. Solving multi-cloud is important for really big global data

sets.

Page 14: Networking Architectures for Big-Data Applicationsjain//talks/ftp/bigdata.pdf · Networking Architectures for Big-Data Applications Raj Jain Washington University in Saint Louis Saint

14©2011 Raj Jainhttp://www.cse.wustl.edu/~jain/talks/bigdata.htmWashington University in St. Louis

Thank You!Thank You!

Comments?Questions?

Comments?Questions?