10GE Hadoop Network Designs
March 26, 2012 http://bradhedlund.com/?p=3561
Global Marketing
Hadoop Overview
masters: Name Node, Job Tracker, Secondary Name Node
slaves: Data Node & Task Tracker on every worker node
Clients
Map Reduce (Job Tracker, Task Trackers): Distributed Data Analytics
HDFS (Name Node, Data Nodes): Distributed Data Storage
BradHedlund.com
Hadoop Cluster
Rack 1: DN + TT nodes, Name Node
Rack 2: DN + TT nodes, Job Tracker
Rack 3: DN + TT nodes, Secondary NN
Rack 4: DN + TT nodes, Client
Rack N: DN + TT nodes
(DN + TT = Data Node + Task Tracker.) Each rack has its own switch, and the rack switches uplink to two Core switches.
10GE / 40GE
S4810: 10/40GE Top of Rack & Interconnect
• 64 x 10GE
• 4 x 40GE
• 9MB buffer
• 700ns latency
• Perl / Python scripting
• Auto Provisioning
Z9000: 40GE Cluster Interconnect
• 32 x 40GE QSFP
• 128 x 10GE
• 2RU, 800W
• 54MB buffer
• Perl / Python scripting
10GE Hadoop Cluster – 160 2RU node Starter
• 2RU nodes with 2 x 10GE (C2100)
• 20 nodes per rack
• 8 racks, 160 nodes
• Scales to 64 racks, 1280 nodes
• Auto switch provisioning
• 2 rack pairs cross cabled
• 2.5:1 oversubscription @ ToR
• Core connected via Leaf w/ Layer 3
[Diagram: 2RU Nodes L3 Fabric. Server cabinets 1, 2, 7 and 8, each with 20 x 2RU C2100 nodes (2 x 10GE); each pair of cabinets is dual-homed to a VLT pair of 10GE ToR (Leaf) switches. The Job Tracker (cabinet 2), Name Node (cabinet 7) and Client (cabinet 8) also attach with 2 x 10GE. Each Leaf connects to the Spine, a line-rate, low-latency 10GE Cluster Interconnect, with 160G ECMP (x8); L3/L2 boundary at the Leaf; Core connected via Leaf.]
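The "2.5:1 oversubscription @ ToR" figure is the ratio of server-facing to uplink bandwidth on each Leaf. A minimal sketch, assuming (our reading of the diagram, not stated on the slide) that each Leaf terminates 40 server-facing 10GE ports, one NIC from each of 40 dual-homed nodes, and has 160G of uplinks (4 x 40GE, or the same ports broken out to 16 x 10GE):

```python
from fractions import Fraction


def oversubscription(down_ports: int, down_gbps: int, up_ports: int, up_gbps: int) -> Fraction:
    """Ratio of server-facing bandwidth to uplink bandwidth on one switch."""
    downlink = down_ports * down_gbps
    uplink = up_ports * up_gbps
    return Fraction(downlink, uplink)


# Assumed Leaf port usage: 40 x 10GE toward servers, 4 x 40GE (160G) toward the Spine.
ratio = oversubscription(down_ports=40, down_gbps=10, up_ports=4, up_gbps=40)
print(f"{float(ratio)}:1")  # 2.5:1
```

Using `Fraction` keeps the ratio exact, so 400G down over 160G up reads as 5/2 rather than a rounded float.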
10GE Hadoop Cluster – Scaled to 1280 nodes
• 2RU nodes with 2 x 10GE (C2100)
• 20 nodes per rack
• 64 racks, 1280 nodes
• 2.5:1 oversubscription @ ToR
• Expand as needed by adding Spine switches
• Leaf QSFP+ optics, Spine SFP+ optics
• Leaf QSFP optical breakout cables
• 150m Leaf to Spine
[Diagram: 2RU Nodes L3 Fabric. Server cabinets 1, 2, 63 and 64, each with 20 x 2RU C2100 nodes (2 x 10GE), on VLT pairs of 10GE ToR (Leaf) switches; Name Node, Client and Job Tracker attach with 2 x 10GE. Each Leaf has 160G ECMP to Spine 1 through Spine 16 (x1 link per Spine) in the line-rate, low-latency 10GE Cluster Interconnect; Core connected via Leaf.]
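Each Leaf spreads its 160G across Spine 1 through Spine 16 with ECMP: every flow is hashed onto one equal-cost uplink, so packets within a flow stay in order while different flows spread across the Spine. The sketch below illustrates the idea with CRC32 over a 5-tuple; this hash is an illustrative stand-in, not the function the switches implement in hardware, and the addresses and ports are hypothetical.

```python
import zlib
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def ecmp_next_hop(flow: Flow, num_paths: int) -> int:
    """Pick an uplink index for a flow by hashing its 5-tuple (illustrative hash)."""
    if num_paths <= 0:
        raise ValueError("need at least one equal-cost path")
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths


SPINES = 16  # one 10G uplink from each Leaf to each of Spine 1..16

# Hypothetical flows: one node sending to 320 others in neighbouring racks.
flows = [Flow("10.1.1.10", f"10.1.{r}.{h}", 6, 40000 + h, 50010)
         for r in range(2, 10) for h in range(10, 50)]
load = Counter(ecmp_next_hop(f, SPINES) for f in flows)

# The same flow always maps to the same Spine, so its packets are not reordered.
assert ecmp_next_hop(flows[0], SPINES) == ecmp_next_hop(flows[0], SPINES)
print(sorted(load.items()))
```

Per-flow hashing balances well only when there are many flows; a single large flow is still limited to one 10G path.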
10GE Hadoop Cluster – 160 1RU node Starter
• 1RU nodes with 2 x 10GE (C1100)
• 40 nodes per rack
• 4 racks, 160 nodes
• 2.5:1 oversubscription @ ToR
• Auto switch provisioning
• 2 switches per rack w/ VLT
• Core connected via Leaf w/ Layer 3
• Scales to 32 racks, 1280 nodes
[Diagram: 1RU Nodes L3 Fabric. Server cabinets 1, 2 and 4, each with 40 x 1RU nodes (2 x 10GE) on a VLT pair of 10GE ToR (Leaf) switches; the Name Node (cabinet 1), Job Tracker (cabinet 2) and Client (cabinet 4) attach with 2 x 10GE. Each Leaf has 160G ECMP (x8) to the Spine, a line-rate, low-latency 10GE Cluster Interconnect; Core connected via Leaf.]
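Adding Spine switches does not change a Leaf's 160G of uplink capacity; it changes how that capacity is divided across Spines. A sketch, assuming each Leaf's uplinks are 16 x 10G (4 x 40G QSFP+ broken out) and that the starter's "x8" means 8 links from each Leaf to each of 2 Spine switches (our reading of the diagram; the slide does not state the starter's Spine count):

```python
def links_per_spine(leaf_uplinks: int, spines: int) -> int:
    """Parallel links from one Leaf to each Spine when uplinks are spread evenly."""
    if spines <= 0 or leaf_uplinks % spines:
        # Uneven splits would give the ECMP paths unequal bandwidth.
        raise ValueError("uplinks must divide evenly across spines")
    return leaf_uplinks // spines


LEAF_UPLINKS_10G = 16  # 160G = 16 x 10G per Leaf (assumed breakout)

for spines in (2, 4, 8, 16):
    n = links_per_spine(LEAF_UPLINKS_10G, spines)
    print(f"{spines:2d} spines -> x{n} per spine, {n * 10}G per spine, {spines * n * 10}G total")
```

Total uplink stays at 160G in every row; only the per-Spine share shrinks, from x8 with 2 Spines to x1 with 16.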
10GE Hadoop Cluster – Scaled to 1280 nodes
• 1RU nodes with 2 x 10GE (C1100)
• 40 nodes per rack
• 32 racks, 1280 nodes
• 2.5:1 oversubscription @ ToR
• Expand cluster by adding Spine switches
• Leaf QSFP+ optics, Spine SFP+ optics
• Leaf QSFP optical breakout cables
• 150m Leaf to Spine
[Diagram: 1RU Nodes L3 Fabric. Server cabinets 1, 2 and 32, each with 40 x 1RU nodes (2 x 10GE) on a VLT pair of 10GE ToR (Leaf) switches; Name Node, Job Tracker and Client attach with 2 x 10GE. Each Leaf has 160G ECMP to Spine 1 through Spine 16 (x1 link per Spine) in the line-rate, low-latency 10GE Cluster Interconnect; plus Core.]
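With QSFP optical breakout cables, each of a Leaf's 4 x 40G QSFP+ ports fans out into 4 x 10G lanes, giving 16 x 10G uplinks that land one per Spine. The sketch below generates such a cabling plan; the port and lane numbering and the rule that Leaf N lands on port N of every Spine are hypothetical conventions chosen for illustration, not taken from the slides.

```python
from typing import NamedTuple


class Cable(NamedTuple):
    leaf: int        # Leaf switch index
    leaf_qsfp: int   # 40G QSFP+ port on the Leaf (0-3)
    lane: int        # 10G lane of the breakout cable (0-3)
    spine: int       # Spine switch index (0-15)
    spine_port: int  # 10G port on that Spine


def breakout_plan(leaves: int, qsfp_ports: int = 4, lanes: int = 4) -> list[Cable]:
    """One 10G link from every Leaf to every Spine via 40G-to-4x10G breakout."""
    plan = []
    for leaf in range(leaves):
        for q in range(qsfp_ports):
            for lane in range(lanes):
                spine = q * lanes + lane
                # Hypothetical convention: Leaf N always lands on port N of each Spine.
                plan.append(Cable(leaf, q, lane, spine, spine_port=leaf))
    return plan


# Assumed: 32 racks x 2 Leaf switches (one VLT pair per rack) = 64 Leaf switches.
plan = breakout_plan(leaves=64)
print(len(plan), "x 10G links")  # 1024 x 10G links
print(plan[17])
```

In this plan each Spine uses 64 of its 128 x 10G ports, which leaves room to double the Leaf count without adding Spines.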
10GE Hadoop Cluster – 40G Fabric starter
• 1RU nodes with 2 x 10GE (C1100)
• 40 nodes per rack
• 8 racks, 320 nodes
• 2.5:1 oversubscription @ ToR
• Auto switch provisioning
• Scales to 64 racks, 2560 nodes (10G fabric)
• Fewer cables/optics/ports vs. 10G fabric
• Leaf 4 x 40G uplink to Spine
[Diagram: 1RU Nodes L3 Fabric. Server cabinets 1, 2 and 8, each with 40 x 1RU nodes (2 x 10GE) on a VLT pair of 10GE ToR (Leaf) switches; Name Node, Job Tracker and Client attach with 2 x 10GE. Each Leaf has 160G ECMP (2 x 40G to each Spine) to a Spine of 2 x Z9000 (40G Cluster Interconnect, 4RU, 1.6KW total); plus Core.]
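The rack and node counts on these slides can be reproduced from port counts alone. A sketch, assuming every Leaf spreads its uplinks evenly across all Spines, each Leaf has 40 server-facing 10GE ports, and every node uses 2 x 10GE (one to each switch of a VLT pair); `max_nodes` is a hypothetical helper, not something from the deck:

```python
def max_nodes(spines: int, spine_ports: int, leaf_uplinks: int,
              leaf_downlinks: int = 40, nics_per_node: int = 2) -> tuple[int, int]:
    """Return (max Leaf switches, max dual-homed nodes) for a two-tier Leaf/Spine fabric."""
    if leaf_uplinks % spines:
        raise ValueError("each Leaf must give every Spine the same number of uplinks")
    # Total Spine ports divided by the Spine ports one Leaf consumes.
    leaves = spines * spine_ports // leaf_uplinks
    nodes = leaves * leaf_downlinks // nics_per_node
    return leaves, nodes


# 40G fabric: Z9000 Spines with 32 x 40G ports, each Leaf with 4 x 40G uplinks.
print(max_nodes(spines=2, spine_ports=32, leaf_uplinks=4))     # (16, 320)
print(max_nodes(spines=4, spine_ports=32, leaf_uplinks=4))     # (32, 640)
# 10G fabric: Z9000 Spines configured as 128 x 10G, each Leaf with 16 x 10G uplinks.
print(max_nodes(spines=16, spine_ports=128, leaf_uplinks=16))  # (128, 2560)
```

With two Leaf switches per 40-node 1RU rack, 16, 32 and 128 Leaf switches correspond to 8, 16 and 64 racks, matching the 320, 640 and 2560 node designs.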
10GE Hadoop Cluster – 40G Fabric
• 2RU nodes with 2 x 10GE (C2100)
• 20 nodes per rack
• 16 racks, 320 nodes
• Scales to 32 racks, 640 nodes
• Auto switch provisioning
• Fewer cables/optics/ports vs. 10G fabric
• 2.5:1 oversubscription @ ToR
• Leaf 4 x 40G uplink to Spine
[Diagram: 2RU Nodes L3 Fabric. Server cabinets 1, 2, 7 and 8, each with 20 x 2RU C2100 nodes (2 x 10GE); each pair of cabinets is dual-homed to a VLT pair of 10GE ToR (Leaf) switches. The Job Tracker (cabinet 2), Name Node (cabinet 7) and Client (cabinet 8) attach with 2 x 10GE. Each Leaf has 160G ECMP (2 x 40G to each Spine) into the 40G Cluster Interconnect (4RU, 1.6KW total); plus Core.]
10GE Hadoop Cluster – 40G Fabric expanded
• 1RU nodes with 2 x 10GE (C1100)
• 40 nodes per rack
• 16 racks, 640 nodes
• 2.5:1 oversubscribed Leaf
• Expand 40G fabric with 4 x Z9000 Spine
• Cost effective QSFP+ optics
• Fewer cables/optics/ports vs. 10G fabric
• Scales to 64 racks, 2560 nodes (10G fabric)
[Diagram: 1RU Nodes L3 Fabric. Server cabinets 1, 2 and 16, each with 40 x 1RU nodes (2 x 10GE) on a VLT pair of 10GE ToR (Leaf) switches; Name Node, Job Tracker and Client attach with 2 x 10GE. Each Leaf has 160G ECMP (40G to each Spine) to a Spine of 4 x Z9000 (40G Cluster Interconnect, 8RU, 3.2KW total); plus Core.]
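Spreading each Leaf's uplinks across more Spine switches also shrinks the impact of losing one, since ECMP simply stops using the failed paths. A sketch, assuming uplinks are spread evenly across Spines and the Leaf keeps its full 400G of server-facing bandwidth (40 x 10GE, our reading of the diagram); `degraded_ratio` is a hypothetical helper:

```python
from fractions import Fraction


def degraded_ratio(down_gbps: int, up_gbps: int, spines: int, failed: int) -> Fraction:
    """Leaf oversubscription after `failed` Spines are lost, with uplinks spread evenly."""
    if not 0 <= failed < spines:
        raise ValueError("need at least one surviving Spine")
    remaining_uplink = Fraction(up_gbps * (spines - failed), spines)
    return Fraction(down_gbps) / remaining_uplink


for spines in (2, 4, 16):
    r = degraded_ratio(down_gbps=400, up_gbps=160, spines=spines, failed=1)
    print(f"{spines:2d} spines, 1 failed: {float(r):.2f}:1")
```

With 2 Spines a single failure doubles the ratio from 2.5:1 to 5:1; with 4 Spines it rises to about 3.33:1, and with 16 Spines to about 2.67:1.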
10GE Hadoop Cluster – 40G Fabric expanded
• 2RU nodes with 2 x 10GE (C2100)
• 20 nodes per rack
• 32 racks, 640 nodes
• Leaf 4 x 40G uplink to Spine
• Auto switch provisioning
• Fewer cables/optics/ports vs. 10G fabric
• 2.5:1 oversubscription @ ToR
• Core connected via Leaf w/ Layer 3
[Diagram: 2RU Nodes L3 Fabric. Server cabinets 1, 2, 7 and 8, each with 20 x 2RU C2100 nodes (2 x 10GE); each pair of cabinets is dual-homed to a VLT pair of 10GE ToR (Leaf) switches. The Job Tracker (cabinet 2), Name Node (cabinet 7) and Client (cabinet 8) attach with 2 x 10GE. Each Leaf has 160G ECMP (40G to each Spine) to a Spine of 4 x Z9000 (40G Cluster Interconnect, 8RU, 3.2KW total); Core connected via Leaf.]
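Our reading of the L3/L2 labels is that routing starts at the Leaf, so each VLT pair of Leaf switches bounds its own Layer 2 segment and needs its own routed subnet. A sketch of such an addressing plan for 32 racks with two 2RU cabinets per VLT pair; the 10.10.0.0/16 supernet, the /24 size and the `leaf-pair-NN` names are hypothetical choices, not from the slides:

```python
import ipaddress
from itertools import islice


def leaf_pair_subnets(supernet: str, racks: int, racks_per_pair: int = 2,
                      prefix: int = 24) -> dict:
    """Allocate one routed IPv4 subnet per VLT Leaf pair (one Layer 2 segment each)."""
    pairs = racks // racks_per_pair
    network = ipaddress.IPv4Network(supernet)
    subnets = list(islice(network.subnets(new_prefix=prefix), pairs))
    if len(subnets) < pairs:
        raise ValueError(f"{supernet} cannot hold {pairs} /{prefix} subnets")
    return {f"leaf-pair-{i + 1:02d}": net for i, net in enumerate(subnets)}


# Hypothetical plan: 32 racks, two cabinets per VLT Leaf pair -> 16 Layer 2 segments.
plan = leaf_pair_subnets("10.10.0.0/16", racks=32)
for name, net in plan.items():
    print(name, net, f"{net.num_addresses - 2} usable hosts")
```

A /24 per pair leaves ample room for the 40 dual-homed nodes behind each VLT pair; traffic between pairs is routed across the Spine.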
10GE Hadoop Cluster – 2560 Nodes
• 1RU nodes with 2 x 10GE (C1100)
• 40 nodes per rack
• 64 racks, 2560 nodes
• 2.5:1 oversubscribed Leaf
• Scale to 16 x Z9000 Spine switches (10G)
• Cost effective QSFP+ SR optics
• QSFP optical breakout cables (150m)
• Each Z9000 configured as 128 x 10G
[Diagram: 1RU Nodes L3 Fabric. Server cabinets 1, 2 and 64, each with 40 x 1RU nodes (2 x 10GE) on a VLT pair of 10GE ToR (Leaf) switches; Name Node, Job Tracker and Client attach with 2 x 10GE. Each Leaf has 160G ECMP (10G to each Spine) to Spine 1 through Spine 16, a 10G Cluster Interconnect of 16 x Z9000 (32RU, 12.8KW total); plus Core.]
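The Spine footprint figures quoted across these designs (4RU/1.6KW, 8RU/3.2KW, 32RU/12.8KW) follow from the Z9000's 2RU, 800W figures on the spec slide. A minimal sketch of that arithmetic; the constants are the spec-slide numbers, and real power draw will vary with load and optics:

```python
Z9000_RU = 2       # rack units per Z9000 (spec slide)
Z9000_WATTS = 800  # per-switch power figure (spec slide)


def spine_footprint(spines: int) -> tuple[int, float]:
    """Rack units and kilowatts for a Spine layer built from Z9000 switches."""
    return spines * Z9000_RU, spines * Z9000_WATTS / 1000


for n in (2, 4, 16):
    ru, kw = spine_footprint(n)
    print(f"{n:2d} x Z9000: {ru}RU, {kw}KW")
```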