SDSC RP Update TeraGrid Roundtable 01-14-10. Reviewing Dash Unique characteristics: –A...


SDSC RP Update

TeraGrid Roundtable

01-14-10


Reviewing Dash

• Unique characteristics:
– A pre-production/evaluation “data-intensive” supercomputer based on SSD flash memory and virtual shared memory
– Nehalem processors
• Integrating into TeraGrid:
– Add to TeraGrid Resource Catalog
– Target friendly users interested in exploring unique capabilities
– Available initially for start-up allocations (March 2010)
– As it stabilizes and depending on user interest, evaluate more routine allocations at TRAC level
– Appropriate CTSS kits will be installed
– Planned to support TeraGrid wide-area filesystem efforts (GPFS-WAN, Lustre-WAN)


Introducing Gordon (SDSC’s Track 2d System)

• Unique characteristics:
– A “data-intensive” supercomputer based on SSD flash memory and virtual shared memory
  • Emphasizes MEM and IO over FLOPS
– A system designed to accelerate access to massive databases being generated in all fields of science, engineering, medicine, and social science
– Sandy Bridge processors
• Integrating into TeraGrid:
– Will be added to TeraGrid Resource Catalog
– Appropriate CTSS kits will be installed
– Planned to support TeraGrid wide-area filesystem efforts
– Coming summer 2011


The Memory Hierarchy

Flash SSD, O(TB), 1000 cycles

Potential 10x speedup for random I/O to large files and databases


Gordon Architecture: “Supernode”

• 32 Appro Extreme-X compute nodes
– Dual processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB
• 2 Appro Extreme-X IO nodes
– Intel SSD drives
  • 4 TB ea.
  • 560,000 IOPS
• ScaleMP vSMP virtual shared memory
– 2 TB RAM aggregate
– 8 TB SSD aggregate

[Diagram: compute nodes (240 GF, 64 GB RAM each) and a 4 TB SSD I/O node, joined by vSMP memory virtualization]
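The aggregate figures on this slide follow from the per-node numbers. The short Python sketch below redoes that arithmetic. The constant and function names are illustrative only, and the conversion of 1024 GB per TB is an assumption, chosen because it reproduces the quoted 2 TB RAM aggregate.

```python
# Per-node figures as quoted on the supernode slide.
COMPUTE_NODES = 32
RAM_PER_NODE_GB = 64
GFLOPS_PER_NODE = 240
IO_NODES = 2
SSD_PER_IO_NODE_TB = 4

GB_PER_TB = 1024  # assumption: binary units, which match the slide's "2 TB"


def supernode_totals():
    """Aggregate RAM, SSD, and peak compute for one 32-node supernode."""
    return {
        "ram_tb": COMPUTE_NODES * RAM_PER_NODE_GB / GB_PER_TB,
        "ssd_tb": IO_NODES * SSD_PER_IO_NODE_TB,
        "peak_tflops": COMPUTE_NODES * GFLOPS_PER_NODE / 1000,
    }


if __name__ == "__main__":
    for name, value in supernode_totals().items():
        print(f"{name}: {value}")
```

The RAM and SSD totals reproduce the 2 TB and 8 TB aggregates listed under vSMP. The per-supernode peak (about 7.7 Tflops) is not stated on the slide; it is derived here from the 240 GFLOPS per-node figure.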


Gordon Architecture: Full Machine

• 32 supernodes = 1024 compute nodes

• Dual rail QDR InfiniBand network
– 3D torus (4x4x4)
• 4 PB rotating disk parallel file system
– >100 GB/s

[Diagram: grid of supernodes (SN) connected to disk (D)]
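Scaling the per-supernode figures quoted earlier in the deck up to 32 supernodes gives the full-machine totals. The sketch below is a rough check, not an official specification: the names are illustrative, and 1024 GB per TB is an assumption.

```python
# Figures quoted on the supernode and full-machine slides.
SUPERNODES = 32
COMPUTE_NODES_PER_SUPERNODE = 32
IO_NODES_PER_SUPERNODE = 2
RAM_PER_NODE_GB = 64
SSD_PER_IO_NODE_TB = 4
GFLOPS_PER_NODE = 240

GB_PER_TB = 1024  # assumption: binary units


def gordon_totals():
    """Scale the per-supernode figures up to the full machine."""
    compute_nodes = SUPERNODES * COMPUTE_NODES_PER_SUPERNODE
    io_nodes = SUPERNODES * IO_NODES_PER_SUPERNODE
    dram_tb = compute_nodes * RAM_PER_NODE_GB / GB_PER_TB
    flash_tb = io_nodes * SSD_PER_IO_NODE_TB
    return {
        "compute_nodes": compute_nodes,
        "io_nodes": io_nodes,
        "dram_tb": dram_tb,
        "flash_tb": flash_tb,
        "total_memory_tb": dram_tb + flash_tb,
        "peak_tflops": compute_nodes * GFLOPS_PER_NODE / 1000,
    }


if __name__ == "__main__":
    for name, value in gordon_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the results agree with the Gordon column of the Dash/Gordon comparison: 1024 compute nodes, 64 I/O nodes, 64 TB DRAM, 256 TB flash, 320 TB total memory, and roughly 245 Tflops peak.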


Comparing Dash and Gordon systems

Doubling capacity halves accessibility to any random data on a given media

System Component                               Dash                       Gordon
Node Characteristics (# sockets, cores, DRAM)  2 sockets, 8 cores, 48 GB  2 sockets, TBD cores, 64 GB
Compute Nodes (#)                              64                         1024
Processor Type                                 Nehalem                    Sandy Bridge
Clock Speed (GHz)                              2.4                        TBD
Peak Speed (Tflops)                            4.9                        245
DRAM (TB)                                      3                          64
I/O Nodes (#)                                  2                          64
I/O Controllers per Node                       2 with 8 ports             1 with 16 ports
Flash (TB)                                     2                          256
Total Memory: DRAM + flash (TB)                5                          320
vSMP                                           Yes                        Yes
32-node Supernodes                             2                          32
Interconnect                                   InfiniBand                 InfiniBand
Disk                                           0.5 PB                     4.5 PB
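The Dash column can be cross-checked from its own node characteristics, and the two columns can be compared as scale factors. The Python sketch below does both. One input is not on the slides: the figure of 4 double-precision flops per core per cycle is an assumption based on Nehalem's 128-bit SSE units. The 1024 GB per TB conversion is also an assumption, and all names are illustrative.

```python
# Figures copied from the Dash/Gordon comparison table.
DASH = {"nodes": 64, "peak_tflops": 4.9, "dram_tb": 3,
        "flash_tb": 2, "total_memory_tb": 5, "disk_pb": 0.5}
GORDON = {"nodes": 1024, "peak_tflops": 245, "dram_tb": 64,
          "flash_tb": 256, "total_memory_tb": 320, "disk_pb": 4.5}

# Assumption (not stated on the slides): 4 double-precision flops
# per core per cycle for Nehalem.
FLOPS_PER_CYCLE = 4
GB_PER_TB = 1024  # assumption: binary units


def dash_peak_tflops(nodes=64, cores_per_node=8, clock_ghz=2.4):
    """Peak speed = nodes x cores x clock (GHz) x flops per cycle."""
    return nodes * cores_per_node * clock_ghz * FLOPS_PER_CYCLE / 1000


def dash_dram_tb(nodes=64, gb_per_node=48):
    """Aggregate DRAM across all Dash compute nodes."""
    return nodes * gb_per_node / GB_PER_TB


def scale_factors():
    """How many times larger Gordon is than Dash, per table row."""
    return {key: GORDON[key] / DASH[key] for key in DASH}


if __name__ == "__main__":
    print(f"Dash peak (derived): {dash_peak_tflops():.2f} Tflops")
    print(f"Dash DRAM (derived): {dash_dram_tb():.1f} TB")
    for key, factor in scale_factors().items():
        print(f"{key}: {factor:.1f}x")
```

With these assumptions the derived Dash peak is about 4.92 Tflops and the derived DRAM is 3 TB, both consistent with the table. The scale factors show that flash grows far faster than node count (128x against 16x), which fits the deck's emphasis on memory and I/O over FLOPS.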


Data mining applications will benefit from Gordon

• De novo genome assembly from sequencer reads & analysis of galaxies from cosmological simulations and observations
– Will benefit from large shared memory
• Federations of databases and interaction network analysis for drug discovery, social science, biology, epidemiology, etc.
– Will benefit from low latency I/O from flash


Data-intensive predictive science will benefit from Gordon

• Solution of inverse problems in oceanography, atmospheric science, & seismology
– Will benefit from a balanced system, especially large RAM per core & fast I/O
• Modestly scalable codes in quantum chemistry & structural engineering
– Will benefit from large shared memory


We won SC09 Data Challenge with Dash!

• With these numbers:
• IOR 4KB
– RAMFS 4 million+ IOPS on up to 0.750 TB of DRAM (1 supernode’s worth)
– 88K+ IOPS on up to 1 TB of flash (1 supernode’s worth)
– Speed up Palomar Transients database searches 10x to 100x
– Best IOPS per dollar
• Since that time we boosted flash IOPS to 540K, hitting our 2011 performance targets
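To put the IOPS figures in perspective, they can be converted to an approximate data rate. The sketch below assumes every operation is a 4096-byte transfer, as the "IOR 4KB" label suggests, and treats the "4 million+" and "88K+" figures as lower bounds. The names are illustrative and the results are estimates, not measured bandwidths.

```python
# Assumption: "4KB" means 4096-byte requests for every operation.
REQUEST_BYTES = 4 * 1024

# IOPS figures quoted on the slide ("+" figures taken as lower bounds).
RESULTS = {
    "RAMFS (DRAM)": 4_000_000,
    "flash at SC09": 88_000,
    "flash after tuning": 540_000,
}


def iops_to_gb_per_s(iops, request_bytes=REQUEST_BYTES):
    """Approximate throughput in decimal GB/s for fixed-size requests."""
    return iops * request_bytes / 1e9


def flash_improvement():
    """Ratio of the later flash IOPS figure to the SC09 figure."""
    return RESULTS["flash after tuning"] / RESULTS["flash at SC09"]


if __name__ == "__main__":
    for label, iops in RESULTS.items():
        print(f"{label}: {iops_to_gb_per_s(iops):.2f} GB/s")
    print(f"flash IOPS improvement: {flash_improvement():.1f}x")
```

Under these assumptions, the boost from 88K to 540K flash IOPS is roughly a sixfold improvement, and 540K is close to the 560,000 IOPS listed for Gordon's I/O nodes.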


Deployment Schedule

• Summer 2009-Present
– Internal evaluation and testing w/ internal apps – SSD and vSMP
• Starting ~Mar 2010
– Dash would be allocated via startup requests by friendly TeraGrid users.
• Summer 2010
– Expect to change status to allocable system starting ~October 2010 via TRAC requests
– Preference given to applications that target the unique technologies of Dash.
• Oct 2010 - June 2011
– Operate Dash as an allocable TeraGrid resource, available thru the normal POPS/TRAC cycles, with appropriate caveats about preferred applications and friendly-user status.
– Help fill the SMP gap created by Altix’s being retired in 2010
• March 2011 – July 2011
– Gordon build and acceptance
• July 2011 – June 2014
– Operate Gordon as an allocable TeraGrid resource, available thru the normal POPS/TRAC cycles


Consolidating Archive Systems

• SDSC has historically operated two archive systems: HPSS and SAM-QFS
• Due to budget constraints, we’re consolidating to one: SAM-QFS
• We’re currently migrating HPSS user data to SAM-QFS

[Timeline figure: HPSS and SAM-QFS access modes (R, R/W; legacy vs. allocated data) and hardware (6 silos, 12 PB, 64 tape drives; 2 silos, 6 PB, 32 tape drives) across Jul 2009, Mid 2010, Mar 2011, and Jun 2013; final state TBD]