A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of...

33
Nick Nystrom · Sr. Director of Research · [email protected] Overview · March 15, 2017 A Converged HPC & Big Data System for Nontraditional and HPC Research © 2017 Pittsburgh Supercomputing Center

Transcript of A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of...

Page 1: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

Nick Nystrom · Sr. Director of Research · [email protected] · March 15, 2017

A Converged HPC & Big Data Systemfor Nontraditional and HPC Research

© 2017 Pittsburgh Supercomputing Center

Page 2: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

2

Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities, bring desktop convenience to high-end computing, expand remote access, and help researchers to work more intuitively.• Funded by a $17.2M NSF award, Bridges emphasizes usability, flexibility, and interactivity• Available at no charge for open research and course support and by arrangement to industry• Popular programming languages and applications: R, MATLAB, Python, Hadoop, Spark, …• 846 compute nodes containing 128GB (800), 3TB (42), and 12TB (4) of RAM each; 96 GPUs• Dedicated nodes for persistent databases, web servers, and distributed services• The first deployment of the Intel Omni-Path Architecture fabric

Page 3: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

3

delivered Bridges

All trademarks, service marks, trade names, trade dress, product names, and logos appearing herein are the property of their respective owners.

Acquisition and operation of Bridges are made possible by the National Science Foundation through award #ACI-1445606 ($17.2M):

Bridges: From Communities and Data to Workflows and Insight

Page 4: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

4

Accessing BridgesFor open research:

Allocations on Bridges are available at no charge through XSEDE, https://www.xsede.org/allocations.

For industrial affiliates, Pennsylvania-based researchers, and others to foster discovery and innovation and broaden participation in data-intensive computing:

Up to 10% of Bridges’ capacity is available on a discretionary basis.Contact Nick Nystrom, Bridges PI, at [email protected].

Page 5: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

5

Motivating Use Cases

Data-intensive applications & workflowsGateways – the power of HPC without the programmingShared data collections & related analysis toolsCross-domain analyticsDeep learningGraph analytics, machine learning, genome sequence assembly, and other large-memory applications Scaling beyond the laptopScaling research to teams and collaborationsIn-memory databasesOptimization & parameter sweepsDistributed & service-oriented architecturesData assimilation from large instruments and Internet dataLeveraging an extensive collection of interoperating software

Research areas that haven’t used HPC

New approaches to “traditional HPC” fields (e.g., machine learning)

Coupling applications in novel ways

Leveraging large memory, GPUs, and high bandwidth

}

Page 6: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

6

OverviewObjectives

– Deliver HPC to new users and research communities.– Bring the power of HPC to Big Data.– Streamline access and provide burst capability to campuses.

Approach– Unify compute and storage on a single, high-performance fabric to unify HPC and Big Data.– Embrace heterogeneity to enable expression of uniquely efficient application frameworks.

– Very large coherent shared memory – 12TB, 3TB, and 128GB of RAM per server – supports productivity, transparent scaling, and important applications in (for example) data analytics, statistics, the life sciences, engineering, and machine learning.

– State-of-the-art GPUs support deep learning and acceleration of a broad range of applications.

– Implement a uniquely flexible environment featuring interactivity, gateways, databases, distributed services, high-productivity programming languages and frameworks, and virtualization and containers.

Page 7: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

7

Interactivity

• Interactivity is the feature mostfrequently requested bynontraditional HPC communities.

• Interactivity provides immediatefeedback for doing exploratorydata analytics and testing hypotheses.

• Bridges offers interactivity through a combination of shared and dedicated resources to maximize availability while accommodating different needs.

Page 8: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

8

Gateways and Tools for Building Them

Gateways provide easy-to-use access to Bridges’ HPC and data resources, allowing users to launch jobs, orchestrate complex workflows, and manage data from their browsers.

– Provide “HPC-as-a-Service”– Extensive use of VMs, databases, and distributed services

Galaxy (PSU, Johns Hopkins)https://galaxyproject.org/

The Causal Web (Pitt, CMU)http://www.ccd.pitt.edu/tools/

GenePattern (Broad Institute)

Page 9: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

9

Virtualization and Containers

• Virtual Machines (VMs) enable flexibility, security, customization, reproducibility, ease of use, and interoperability with other services.

• User demand is for custom database and web server installations to develop data-intensive, distributed applications and containers for custom software stacks and portability.

• Bridges leverages OpenStack to provision resources, between interactive, batch, Hadoop, and VM uses.

Page 10: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

10

High-Productivity Programming

Supporting languages that communities already use is vital for them to apply HPC to their research questions.

Page 11: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

11

Bridges’ large memory offers unique benefits to Spark applications.

Bridges enables workflows that integrate Spark/Hadoop, HPC, GPU, and/or large shared-memory components.

Spark, Hadoop & Related Approaches

Page 12: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

12

Conceptual Architecture

ESM Nodes 12TB RAM

4 nodes

LSM Nodes3TB RAM42 nodes

RSM Nodes128GB RAM800 nodes,

48 with GPUs

Intel Omni-Path Architecture

fabric

Management nodes

Parallel File System

Web Server nodesDatabase

nodesData Transfer

nodesLoginnodes

Users, XSEDE,campuses, instruments

Page 13: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

13

800 HPE Apollo 2000 (128GB) compute nodes

20 “leaf” Intel® OPA edge switches

6 “core” Intel® OPA edge switches:fully interconnected,2 links per switch

42 HPE ProLiant DL580 (3TB) compute nodes

20 Storage Building Blocks, implementing the parallel Pylonstorage system(10 PB usable)

4 HPE Integrity Superdome X (12TB)

compute nodes …

12 HPE ProLiant DL380 database nodes

6 HPE ProLiant DL360 web server nodes

4 MDS nodes2 front-end nodes

2 boot nodes8 management nodes

Intel® OPA cables

16 RSM nodes, each with 2 NVIDIA Tesla K80 GPUs

32 RSM nodes, each with 2 NVIDIA Tesla P100 GPUs

… each with 2 OPA↔IB gateway nodes

http://staff.psc.edu/nystrom/bvtBridges Virtual Tour:

Purpose-built Intel® Omni-PathArchitecture topology for

data-intensive HPC

Page 14: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

14

Type RAM Phase n CPU / GPU / other Server

ESM12TBb 1 2 16 × Intel Xeon E7-8880 v3 (18c, 2.3/3.1 GHz, 45MB LLC) HPE Integrity

Superdome X12TBc 2 2 16 × Intel Xeon E7-8880 v4 (22c, 2.2/3.3 GHz, 55MB LLC)

LSM3TBb 1 8 4 × Intel Xeon E7-8860 v3 (16c, 2.2/3.2 GHz, 40 MB LLC)

HPE ProLiant DL5803TBc 2 34 4 × Intel Xeon E7-8870 v4 (20c, 2.1/3.0 GHz, 50 MB LLC)

RSM 128GBb 1 752 2 × Intel Xeon E5-2695 v3 (14c, 2.3/3.3 GHz, 35MB LLC)

HPE Apollo 2000RSM-GPU

128GBb 1 16 2 × Intel Xeon E5-2695 v3 + 2 × NVIDIA Tesla K80

128GBc 2 32 2 × Intel Xeon E5-2683 v4 (16c, 2.1/3.0 GHz, 40MB LLC) + 2 × NVIDIA Tesla P100

DB-s 128GBb 1 6 2 × Intel Xeon E5-2695 v3 + SSD HPE ProLiant DL360

DB-h 128GBb 1 6 2 × Intel Xeon E5-2695 v3 + HDDs HPE ProLiant DL380

Web 128GBb 1 6 2 × Intel Xeon E5-2695 v3 HPE ProLiant DL360

Othera 128GBb 1 16 2 × Intel Xeon E5-2695 v3 HPE ProLiant DL360, HPE ProLiant DL380

Gateway64GBb 1 4 2 × Intel Xeon E5-2683 v3 (14c, 2.0/3.0 GHz, 35MB LLC)

HPE ProLiant DL38064GBc 2 4 2 × Intel Xeon E5-2683 v3

Storage128GBb 1 5 2 × Intel Xeon E5-2680 v3 (12c, 2.5/3.3 GHz, 30 MB LLC)

Supermicro X10DRi256GBc 2 15 2 × Intel Xeon E5-2680 v4 (14c, 2.4/3.3 GHz, 35 MB LLC)

Total 281.75TB 908

a. Other nodes = front end (2) + management/log (8) + boot (4) + MDS (4)b. DDR4-2133c. DDR4-2400

Node Types

Page 15: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

15

RSM RSM GPU LSM ESM Total

Peak fp64 (Tf/s) 774.9 415.7 109.4 46.0 1,346

RAM (TB) 94 6 126 48 274

Local disk (TB) 6,016 384 672 256 7,328

Parallel file system: usable space (PB) 10

System CapacitiesCompute nodes only, per node type

Page 16: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

16

Intel® Omni-Path Architecture (OPA)Bridges is the first production deployment of Omni-PathOmni-Path connects all nodes and the shared filesystem, providing Bridges and its users with:

– 100 Gbps line speed per port;25 GB/s bidirectional bandwidth per port

– Measured 0.93μs latency, 12.36 GB/s/dir– 160M MPI messages per second– 48-port edge switch reduces

interconnect complexity and cost– HPC performance, reliability, and QoS– OFA-compliant applications supported without modification– Early access to this new, important, forward-looking technology

Bridges deploys OPA in a two-tier island (leaf-spine) topology developed by PSC for cost-effective, data-intensive HPC

Page 17: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

17

Data Management

Pylon: A large, central, high-performance storage system– 10 PB usable

– Visible to all compute nodes

– Currently implemented as two complementary file systems:• /pylon1: Lustre, targeted for $SCRATCH use; non-wiped directories available by request

• /pylon2: SLASH2, targeted for large datasets, community repositories, and distributed clients

Distributed (node-local) storage– Enhance application portability

– Improve overall system performance

– Improve performance consistency to the shared filesystem

– Aggregate 7.2 PB (6 PB non-GPU RSM: Hadoop, Spark)

Page 18: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

18

Database and Web Server Nodes

Dedicated database nodes power persistent relational and NoSQL databases

– Support data management and data-driven workflows– SSDs for high IOPs; HDDs for high capacity

Dedicated web server nodes– Enable distributed, service-oriented architectures– High-bandwidth connections to XSEDE and the Internet

(examples)

Page 19: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

19

Bridges’ GPUs are accelerating both deep learning and simulation codes

Phase 1: 16 nodes, each with:• 2 × NVIDIA Tesla K80 GPUs (32 total)• 2 × Intel Xeon E5-2695 v3 (14c, 2.3/3.3 GHz)• 128GB DDR4-2133 RAM

Phase 2: +32 nodes, each with:• 2 × NVIDIA Tesla P100 GPUs (64 total)• 2 × Intel Xeon E5-2683 v4 (16c, 2.1/3.0 GHz)• 128GB DDR4-2400 RAM

GPU Nodes

Kepler architecture2496 CUDA cores (128/SM)7.08B transistors on 561mm2 die (28nm)2×24 GB GDDR5; 2×240.6 GB/s562 MHz base – 876 MHz boost2.91 Tf/s (64b), 8.73 Tf/s (32b)

Pascal architecture3584 CUDA cores (64/SM)15.3B transistors on 610mm2 die (16nm)16GB CoWoS® HBM2 at 720 GB/s w/ ECC1126 MHz base – 1303 MHz boost4.7 Tf/s (64b), 9.3 Tf/s (32b), 18.7 Tf/s (16b)Page migration engine improves unified memory64 P100 GPUs → 600 Tf/s (32b)

Page 20: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

20

Examples of Research thatBridges is Enabing

Page 21: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

21

The Causal WebCenter for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence

Web node

VMApache Tomcat

Messaging

Database node

VM

Other DBs

LSM Node (3TB)

ESM Node(12TB)

Algorithm execution:FGS continuous and discrete, GFCI, and other algorithms,

building on TETRAD

Browser-based UI

• Authentication• Data mgmt.• Provenance

Execute causal discovery algorithms

Omni-Path

• Prepare and upload data

• Run causal discovery algorithms

• Visualize results

Internet

Memory-resident datasets

Pylon filesystem

TCGAfMRI

http://ccd.pitt.edu/tools/

Page 22: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

22

GalaxyGalaxy provides a powerful and flexible platform for creating and running reproducible scientific workflows, mostly focused on bioinformatics applications, but also adopted by other domains.– Some specialize, e.g., metagenomics or proteomics– “Galaxy Main” at TACC now routes large-memory Trinity assembly jobs to Bridges– A second, Docker-based Galaxy instance at PSC servers a local user community

http://galaxyproject.org

Page 23: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

23

Improved SNP Detection in Metagenomic PopulationsWenxuan Zhong, Xin Xing, and Ping Ma, Univ. of Georgia

Assembled 378 Gigabase pairs (Gbp) of gut microbial DNA from normal and diabetic patients– Massive metagenome assembly took only 16 hours using

an MPI-based metagenome assembler, Ray, on 20 BridgesRM nodes connected by Omni-Path

Identified 2,480 species-level clusters in assembledsequence data– Ran MetaGen clustering software across 10 RM nodes,

clustering 500,000 contiguous sequences in only 14 hours– Currently testing new likelihood-based statistical method

for fast and accurate SNP detection from these metagenomicsequencing data

→The team is now using Bridges to test a new statistical method on the sequence data to identify critical differences in gut microbes associated with diabetes.

Environmental Shotgun Sequencing (ESS). (A) Sampling from habitat; (B) filtering particles, typically by size; (C) DNA extraction and lysis; (D) cloning and library; (E) sequence the clones; (F) sequence assembly. By John C. Wooley, Adam Godzik, Iddo Friedberg -http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=17664682

Page 24: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

24

Characterizing Diverse Microbial Ecosystemsfrom Terabase-Scale Metagenomic Data

Brian Couger, Oklahoma State University

Assembled 11 metagenomes sampled from diverse sources, comprising over3 trillion bases of sequence data– Including a recent massive assembly of 1.6 Tbp

of metagenomic data from an oil sands tailings pond, a bioremediation target

– Excellent performance of MPI-based Ray assembler on 90 RM nodes completed assembly in only 4.25 days

– Analysis of assembled data in progress to characterize organisms present in these diverse environments and identify new microbial phyla

Oil sands tailings pond.By NASA Earth Observatory.

http://earthobservatory.nasa.gov/IOTD/view.php?id=40997, Public Domain, https://commons.wikimedia.org/w/index.php?curid=8346449

Page 25: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

25

Creation of World’s Largest k-mer DatabaseRachid Ounit and Chris Mason, Cornell University

Created a database of 153 billion species-specific nucleotide sequences (k-mers)– Analyzed entire NCBI Reference Sequence (RefSeq)

archive containing over 15k species to create the k-merdatabase

– Required a massive in-memory hash table– Computation took 24 days and 4.8 TB of RAM

on a 12 TB node of Bridges– Allows rapid identification and classification of species

in metagenomics samples

By Madprime - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2121945

Page 26: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

26

Applying Deep Learning to Connectomics

Goal: Explore the potential of deep learning to automate segmentation of high-resolution scanning electron microscope (SEM) images of brain tissue and the tracing of neurons through 3D volumes to automate generation of the connectome, a comprehensive map of neural connections.

Motivation: This project builds on an ongoing collaboration between PSC, Harvard University, and the Allen Institute for Brain Science, through which we have access to high-quality raw and labeled data. The SEM data volume for mouse cortex imaging is ~3TB/day, and data processing is currently human-intensive. Forthcoming camera systems will increase data bandwidthby a factor of 65.

Datasets: Zebrafish larva (1024×1024×4900), mouse, ….Data volume: mouse brain 430mm3 → ~1.4 exabytes.

Collaborators: Ishtar Nyawĩra, Iris Qian, Annie Zhang, Joel Welling, John Urbanic, Art Wetzel, Nick Nystrom

Images courtesy Florian Engert, David Hildebrand, and their students at the Center for Brain Science, Harvard.

Page 27: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

27

Some of the Deep Learning Projects Using Bridges

Deep Learning of Game Strategies for RoboCup, Manuela Veloso (CMU)

Automatic Evaluation of Scientific Writing,Diane Litman (U. of Pittsburgh)

Image Classification Applied in Economic Studies, Param Singh (CMU)

Exploring Stability, Cost, and Performance in Adversarial Deep Learning, Matt Fredrikson(CMU)

Enabling Robust Image Understanding Using Deep Learning, Adriana Kovashka (U. of Pittsburgh)

Preparing Grounds to Launch All-US Students Kaggle Competition on Drug Prediction, Gil Alterovitz (Harvard Medical School/Boston Children’s Hospital)

Deep Learning the Gene Regulatory Code, Shaun Mahony (Penn State)

Automatic Building of Speech Recognizers for Non-Experts, Florian Metze (CMU)

Development of a Hybrid Computational Approach for Macroscale Simulation of Exciton Diffusion in Polymer Thin Films, Based on Combined Machine Learning, Quantum-Classical Simulations and Master Equation Techniques, Peter Rossky (Rice U.)

Developing Large-Scale Distributed Deep Learning Methods for Protein Bioinformatics, Junbo Xu (Toyota Technological Institute at Chicago)

Education Allocation for the Course Unstructured Data & Big Data: Acquisition to Analysis, Dokyun Lee (CMU)

Deciphering Cellular Signaling System byDeep Mining a Comprehensive Genomic Compendium, Xinghua Lu (U. of Pittsburgh)

Quantifying California Current PlanktonUsing Machine Learning, Mark Ohman(Scripps Institution of Oceanography)

Petuum, a Distributed System for High-Performance Machine Learning, Eric Xing (CMU)

Automatic Pain Assessment, Michael Reale(SUNY Polytechnic Institute)

Learning to Parse Images and Videos, Deva Ramanan (CMU)

Deep Recurrent Models for Fine-Grained Recognition, Michael Lam (Oregon State)

Page 28: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

28

Thermal Hydraulics of Next-GenerationGas-Cooled Nuclear Reactors

PI Mark Kimber, Texas A&M University

Completed 3 Large Eddy Simulations of turbulent thermal mixing of helium coolant in the lower plenum of the Gen IV High- Temperature Gas-cooled Reactor (HTGR)– Performed using OpenFOAM, a open-source CFD code – Simulations are for scaled-down versions of representative sections

of the HTGR lower plenum, each containing 7 support posts and 6 high-Reynolds number jets in a cross-flow

– Each simulation involves a high-quality block-structured mesh with 16+ million cells, and was run on 20 regular Bridges nodes (560 cores) for ~10 million time steps (~1 second of simulated time)

– Helps to understand the turbulent thermal mixing in great detail, as well as temperature distributions and hotspots on support posts

Anirban Jana (PSC) is co-PI and computational lead for this DOE project

Distribution of vorticity in a representative “unit cell” of the HTGR lower plenum containing 7 support posts and 6 jets in a crossflow (right to left). Turbulent mixing of core coolant jets in the lower plenum can cause temperature fluctuations and thermal fatigue of the support posts.

Acknowledgements:• DOE-NEUP (Grant nos. DE-NE0008414)• NSF-XSEDE (Grant no. CTS160002)

Page 29: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

29

Simulations of Nanodisc SystemsJifey Qi and Wonpil Im, Lehigh University

A nanodisc system showing two helical proteins warping around a discoidal lipid bilayer.

Using NAMD on Bridges’ NVIDIA P100 GPU nodes to simulate a number of nanodisc systems to be used as examples for the Nanodisc Builder module in CHARMM-GUI For a system size of 263,888 atoms, the simulation

speed on one P100 node (6.25 ns/day) is faster than on three dual-CPU nodes (4.09 ns/day)

− CHARMM-GUI provides a web-based graphical user interface to generate various molecular simulation systems and input files (for CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM) to facilitate and standardize the usage of common and advanced simulation techniques

Page 30: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

30

Reconstructing Large Historical Social NetworksChristopher Warren, Carnegie Mellon University

Six Degrees of Francis Bacon: Employs statistical graph learning and interactive web interfaces to reconstruct and communicate the social networks of Britain from approximately 1500–1700.

Analyzes the 58,625 biographical entries of the Oxford Dictionary of National Biography (ODNB), a 62M-word corpus produced by 10,000 scholars.

Employs a modified Poisson Graphical Lasso method, confidence estimation, chronological filter and probabilistic techniques for name disambiguation,and expert validation.

The methods are applicable to other historical and contemporary societies and other document collections.

Additional resources:

Using Supercomputers to Illuminate the Renaissance:https://www.xsede.org/using-supercomputers-to-illuminate-the-renaissance.

Warren, C. N., et al. (2016). "Six Degrees of Francis Bacon: A Statistical Method for Reconstructing Large Historical Social Networks." Digital Humaties Quarterly 10(3).

Methods and R code: https://github.com/sdfb/sdfb_network.

Six Degrees of Francis Bacon, http://www.sixdegreesoffrancisbacon.com/

Page 31: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

31

Investigating Economic Impacts of Images andNatural Language in E-commerce

Dokyun Lee, CMU Tepper School of BusinessSecurity and uncertain quality create challenges for sharing economies– Lee et al. studied the impact of high-quality, verified

photos for Airbnb hosts– 17,000 properties over 4 months– Used Bridges’ GPU nodes and large R jobsDifference-in-Difference (DD) analysis showed that on

average, rooms with verified photos are booked 9% more often

Separating effects of photo verification from photo quality and room reviews indicates that high photo quality resultsin $2,455 of additional yearly earnings

They found asymmetric spillover effects: on the neighborhood level, there appears to be higher overall demand if more rooms have verified photos

Page 32: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

32

Leveraging Bridges to Support Anton 2

2nd-generation molecular dynamics supercomputer in production at PSC– Performs MD simulations 100× faster than other resources– Allows users to run simulations 4× faster and larger than

the previous 1st-generation Anton system at PSC– Allocations on 1st-generation Anton have resulted in over

140 publications so far

Bridges Supports Anton 2 Users– Anton 2 has specific resources for analysis of large MD

trajectories; simulations require pre-equilibrated MD systems

– The Anton 2 filesystem will be mounted on Bridges with Omni-Path to facilitate in situ analysis of large trajectories

© 2016 D.E. Shaw Research. Used with permission.

© 2016 D.E. Shaw Research. Used with permission.

Page 33: A Converged HPC Big Data System for Nontraditional and HPC ... · 2 Bridges, a new kind of supercomputer at PSC, is designed to converge HPC and Big Data, empower new research communities,

33

Thank You

Questions?