SAN DIEGO SUPERCOMPUTER CENTER SDSC's Data Oasis Balanced performance and cost-effective Lustre file systems. Lustre User Group 2013 (LUG13) Rick Wagner San Diego Supercomputer Center Jeff Johnson Aeon Computing April 18, 2013

Page 1:

SAN DIEGO SUPERCOMPUTER CENTER

SDSC's Data Oasis

Balanced performance and cost-effective Lustre file systems.

Lustre User Group 2013 (LUG13)

Rick Wagner
San Diego Supercomputer Center

Jeff Johnson
Aeon Computing

April 18, 2013

Page 2:

Data Oasis

• High performance, high capacity Lustre-based parallel file system

• 10GbE I/O backbone for all of SDSC’s HPC systems, supporting multiple architectures

• Integrated by Aeon Computing using their EclipseSL

• Scalable, open platform design

• Driven by 100GB/s bandwidth target for Gordon

• Motivated by $/TB and $/GB/s

• $1.5M = 4@MDS + 64@OSS = 4PB = 100GB/s

• 6.4PB capacity and growing

• Currently Lustre 1.8.7
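The "$1.5M = 4@MDS + 64@OSS = 4PB = 100GB/s" line can be turned into the two cost metrics the slide names. This is a rough sketch using only the figures quoted above; it assumes 1 PB = 1000 TB, and the variable names are illustrative.

```python
# Figures quoted on the slide: $1.5M, 64 OSS, 4 PB, 100 GB/s.
COST_USD = 1_500_000
OSS_COUNT = 64
CAPACITY_TB = 4 * 1000       # assumption: 1 PB = 1000 TB
BANDWIDTH_GBS = 100          # aggregate bandwidth target in GB/s

cost_per_tb = COST_USD / CAPACITY_TB           # dollars per TB
cost_per_gbs = COST_USD / BANDWIDTH_GBS        # dollars per GB/s
bandwidth_per_oss = BANDWIDTH_GBS / OSS_COUNT  # GB/s each OSS must deliver

print(f"$/TB:         {cost_per_tb:,.0f}")
print(f"$/GB/s:       {cost_per_gbs:,.0f}")
print(f"GB/s per OSS: {bandwidth_per_oss:.2f}")
```

Under these assumptions the design works out to roughly $375 per TB and $15,000 per GB/s, with each OSS needing to sustain about 1.56 GB/s.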

Page 3:

Data Oasis Heterogeneous Architecture

[Architecture diagram; labels recovered from the figure:]

• 64 OSS (Object Storage Servers): provide 100GB/s performance and >4PB raw capacity

• JBODs (Just a Bunch Of Disks): provide capacity scale-out

• Arista 7508 10G switches (two shown): redundant switches for reliability and performance

• Storage units: OSS 72TB, OSS 108TB, JBOD 132TB

• 3 distinct network architectures: 64 Lustre LNET routers (100 GB/s), Mellanox 5020 bridge (12 GB/s), Juniper 10G switches (XX GB/s)

• Clusters: Gordon (IB cluster), Triton (10G & IB cluster), Trestles (IB cluster)

• Metadata servers: MDS: Gordon scratch, MDS: Trestles scratch, MDS: Triton scratch, MDS: Gordon & Trestles project

Page 4:

File Systems

File System   Clusters            OSSes   JBODs   Capacity (Raw)
Monkey        Gordon              32      0       2.3PB
Meerkat       Gordon & Trestles   8       8       1.9PB
Puma          Trestles            8       0       576TB
Dolphin       Triton              16      0       1.2PB
Rhino         Development         4       4       480TB
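As a quick cross-check, the rows of the table can be summed. This is a small sketch with the values copied from the table; it assumes 1 PB = 1000 TB, and the tuple layout is only illustrative.

```python
# (name, clusters, OSSes, JBODs, raw capacity in TB), copied from the table.
# Assumption: 1 PB = 1000 TB.
FILE_SYSTEMS = [
    ("Monkey", "Gordon", 32, 0, 2300),
    ("Meerkat", "Gordon & Trestles", 8, 8, 1900),
    ("Puma", "Trestles", 8, 0, 576),
    ("Dolphin", "Triton", 16, 0, 1200),
    ("Rhino", "Development", 4, 4, 480),
]

total_oss = sum(fs[2] for fs in FILE_SYSTEMS)
total_jbod = sum(fs[3] for fs in FILE_SYSTEMS)
total_tb = sum(fs[4] for fs in FILE_SYSTEMS)
# OSS count with the development file system left out.
production_oss = sum(fs[2] for fs in FILE_SYSTEMS if fs[1] != "Development")

print(f"OSSes: {total_oss} total, {production_oss} outside development")
print(f"JBODs: {total_jbod}")
print(f"Raw capacity: {total_tb} TB ({total_tb / 1000:.2f} PB)")
```

The total comes to about 6.46 PB raw, in line with the "6.4PB capacity and growing" figure, and the four non-development file systems account for 64 OSSes.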

Page 5:

Data Oasis Servers

[Server configuration diagram; labels recovered from the figure:]

• MDS (active) and MDS (backup): LSI controllers, RAID 10 (2x6), Myri10GbE

• OSS: LSI controller, RAID 6 (7+2) arrays, Myri10GbE

• OSS+JBOD: LSI controller, RAID 6 (8+2) arrays, Myri10GbE

• Array multipliers shown: x4, x4

• Drive sizes shown: 3TB drives, 2TB drives
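The RAID 6 layouts above determine how much of the raw capacity holds data. The sketch below is illustrative only: the transcript does not say which drive size or how many arrays go with which server type, so the pairing of four 7+2 arrays with 2TB drives is an assumption (it happens to reproduce the 72TB OSS figure). File system overhead is ignored, and the function name is hypothetical.

```python
def raid6_raw_and_usable_tb(arrays, data_drives, drive_tb, parity_drives=2):
    """Return (raw_tb, usable_tb) for a set of identical RAID 6 arrays."""
    drives_per_array = data_drives + parity_drives
    raw = arrays * drives_per_array * drive_tb
    usable = arrays * data_drives * drive_tb
    return raw, usable

# Assumed pairing (not stated in the transcript): 4 arrays of 7+2, 2 TB drives.
raw_tb, usable_tb = raid6_raw_and_usable_tb(arrays=4, data_drives=7, drive_tb=2)

# Data fraction of each layout, independent of drive size.
efficiency_7_2 = 7 / (7 + 2)
efficiency_8_2 = 8 / (8 + 2)

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB")
print(f"7+2 efficiency: {efficiency_7_2:.1%}, 8+2 efficiency: {efficiency_8_2:.1%}")
```

The wider 8+2 layout gives up slightly less capacity to parity (80% data versus about 78% for 7+2).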

Page 6:

Trestles Architecture

[Architecture diagram; labels recovered from the figure:]

• Link legend: QDR 40 Gb/s, GbE, 10GbE

• Compute nodes (324) on a QDR InfiniBand switch

• IB/Ethernet bridge switch

• Arista 7508 10 GbE switch (2x MLAG)

• Data Oasis Lustre PFS, 4 PB

• NFS servers (4x), shared by Gordon & Trestles

• Data movers (4x), login nodes (2x), mgmt. nodes (2x)

• Ethernet management network

• External connections: Gordon cluster, SDSC network, XSEDE & R&E networks

• Link multipliers shown: 4x, 12x

• QDR IB

• GbE management

• GbE public

• Round robin login

• Mirrored NFS

• Redundant front-end

Page 7:

Gordon Network Architecture

[Architecture diagram; labels recovered from the figure:]

• Link legend: QDR 40 Gb/s, GbE, 2x10GbE, 10GbE

• 3D torus: rail 1 and 3D torus: rail 2

• Compute nodes (1,024) and IO nodes (64)

• Arista 10 GbE switch

• Data Oasis Lustre PFS, 4 PB

• NFS servers (4x), data movers (4x), login nodes (4x), mgmt. nodes (2x)

• Mgmt. edge & core Ethernet, public edge & core Ethernet

• External connections: SDSC network, XSEDE & R&E networks

• Link multipliers shown: 4x, 128x

• Dual-rail IB

• Dual 10GbE storage

• GbE management

• GbE public

• Round robin login

• Mirrored NFS

• Redundant front-end
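The storage-side link budget implied by this slide can be estimated from the numbers it shows. This is a back-of-the-envelope sketch: it assumes each of the 64 IO nodes has the dual 10GbE storage links listed above, uses nominal 10 Gb/s line rate with no protocol overhead, and takes the 100GB/s target from the earlier Data Oasis slide.

```python
IO_NODES = 64
LINKS_PER_NODE = 2       # "Dual 10GbE storage"
LINK_GBIT = 10           # nominal line rate per link, Gb/s
TARGET_GBYTES = 100      # bandwidth target for Gordon, GB/s

total_links = IO_NODES * LINKS_PER_NODE
line_rate_gbytes = total_links * LINK_GBIT / 8   # convert Gb/s to GB/s
required_fraction = TARGET_GBYTES / line_rate_gbytes

print(f"links: {total_links}")
print(f"aggregate line rate: {line_rate_gbytes:.0f} GB/s")
print(f"fraction of line rate needed for target: {required_fraction:.1%}")
```

The 128 links match the 128x multiplier in the diagram, and the 100GB/s target corresponds to about 62.5% of their combined nominal line rate.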

Page 8:

Gordon Network Design Detail

[Network detail diagram; labels recovered from the figure:]

• IS 5030 switches (four front panels shown), on Rail 0 and Rail 1

• 16 compute nodes (two groups shown)

• Flash I/O nodes (two shown)

• Each switch connected to its 6 neighbors via 3 QDR links

• Lustre filesystem: dual 10GbE links (shown twice)
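The "6 neighbors via 3 QDR links" wiring is what a 3D torus gives every switch: one neighbor in each direction along each of three axes, with wraparound at the edges. The sketch below illustrates that; the 4x4x4 torus size is an assumption (not given on this slide), the function name is hypothetical, and 40 Gb/s is the nominal QDR rate from the link legend.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of the switch at coord in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # Wrap around at the edge of the torus.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors

DIMS = (4, 4, 4)          # assumption: torus dimensions are not on the slide
LINKS_PER_NEIGHBOR = 3    # "via 3 QDR links"
QDR_GBIT = 40             # nominal QDR rate, Gb/s

neighbors = torus_neighbors((0, 0, 0), DIMS)
per_switch_gbit = len(neighbors) * LINKS_PER_NEIGHBOR * QDR_GBIT

print(neighbors)
print(f"inter-switch bandwidth per switch: {per_switch_gbit} Gb/s")
```

With 18 inter-switch links per switch, the nominal switch-to-switch capacity is 720 Gb/s per rail under these assumptions.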

Page 9:

Data Oasis Performance – Measured from Gordon

Page 10:

Issues & The Future

• LNET “death spiral”

• LNET tcp peers stop communicating, packets back up

• We need to upgrade to Lustre 2.x soon

• Can’t wait for MDS SMP improvements & DNE

• Design drawback: juggling data is a pain

• Client virtualization testing

• SR-IOV very promising for o2ib clients

• Watching the Fast Forward program

• Gordon’s architecture ideally suited to burst buffers

• HSM

• Really want to tie Data Oasis to SDSC Cloud