Charles Wuischpard, Vice President,
Scalable Data Center Solutions Group
Barry Davis, General Manager,
Accelerated Workload Group
Reducing HPC Complexity and Fueling the Rapid Growth of AI
Legal Disclaimers
• Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service
activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your
system manufacturer or retailer or learn more at [intel.com].
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations
and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance
tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit www.intel.com/benchmarks
• All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product
specifications and roadmaps.
• Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for
informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
• Intel, the Intel logo, Xeon, Intel Xeon Phi, Intel Optane and 3D XPoint are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States or other countries.
• *Other names and brands may be claimed as the property of others.
• © 2016 Intel Corporation. All rights reserved.
Business Innovation
High ROI: $515 average return per $1 of HPC investment¹
Artificial Intelligence
Machine Learning joins computational learning theory & HPC
Advancing Science
and our understanding of the Universe
Scientific Discovery
¹ Source: IDC HPC and ROI Study Update (September 2015)
Foster the Community
Drive Artificial Intelligence
Overcome HPC Barriers
Intel® Optane™ Technology
Intel® Software Tools
Intel® Solutions for Lustre* software
Intel® Silicon Photonics
Intel® Ethernet
Intel® Xeon® Processors
Intel® Xeon Phi™ Processors
Intel® FPGAs
Intel® Omni-Path Architecture
3D XPoint™ Technology
Intel SSDs
Intel® HPC Orchestrator
Intel SW Defined Visualization
Fast, Cost-Effective Data Movement
Industry-Leading Compute
Fast, Reliable Access to Data
Ease of Deployment and Management
Compute
Fabric
Memory / Storage
Software
Updates Today
Intel® Enterprise Edition for Lustre*
Rapid growth in Lustre in past 4 years
Lustre used in 9 of top 10 and 72 of top 100 supercomputers¹
Intel contributing ~65% of code for open source community²
Extreme Scale Storage for HPC
¹ Source: Top500.org ² Intel estimate
Intel® HPC Orchestrator
A comprehensive, modular and customizable system software
platform enabling HPC systems
WWW.OpenHPC.Community
Intel® HPC Orchestrator is based on
HPC Members: 29
Module Downloads: ~200K
Website Visits: ~30K
• Argonne National Laboratory • Center for Research in Extreme Scale Technologies – Indiana University
• University of Cambridge
Best HPC Software
Product or Technology
OpenHPC
Intel® HPC Orchestrator
OpenHPC 1.1.1 Features
Supported OS: SLES 12 SP1; RHEL 7.2; CentOS 7.2
SW Architecture: Hierarchical development environment uses modules structure to support multiple MPI, compiler, and OS combinations
Integration: Pre-integrated system software stack, utilizing community continuous build & test environment
Pre-Tested: Functional testing, includes cross-package interaction, and development environment
Intel HPC Orchestrator 1.0 Features
OpenHPC Base: All OpenHPC features, plus enhanced documentation.
Advanced Testing: Additional integration reliability testing, automated validation suite, and App Benchmarks: HPCG, miniFE
Intel Validation: Intel system configurations validated, including combinations of Intel Xeon and Xeon Phi processors, and Intel Omni-Path Architecture
Support: Level 3 Technical Support across full SW stack integration, diagnosing open source software community bugs, and including Workload Manager support
Validated Updates: Validated updates provided across System SW Stack
Intel SW: Intel Parallel Studio XE 2017 Cluster Edition (90-day Eval.), Cluster Checker tool for supportability; Intel Enterprise Edition for Lustre* client included
Intel® HPC Orchestrator
Pre-Integrated, Pre-Tested, Pre-Validated, Intel-Supported
Reduce: Integration, validation, and maintenance
Ensure: Validated updates and Intel support
Simplify: Deploying traditional & emerging HPC workloads
Node-specific OS Kernel(s)
Linux Distro Runtime Libraries
Overlay & Pub-sub Networks, Identity
User Space Utilities
SW Dev. Toolchain
Compiler & Programming Model Runtimes
High Perf. Parallel Libraries
Scalable Debugging & Perf. Analysis Tools
Optimized I/O Libraries
I/O Services
Data Collection and System Monitors
Workload Manager
Resource Mgmnt Runtimes
DB Schema
Scalable DB
System Management (Config, Inventory)
Provisioning
System Diagnostics
Fabric Mgmnt
Operator Interface Applications (not part of initial stack)
ISV Applications
Hardware
Available Through Our Partners
“Intel® HPC Orchestrator pulls in all these various components and makes them act as a single system that makes them easy to use, easy to deploy and easy to manage.”
ONUR CELEBIOGLU
Director of HPC Solutions, Dell
“The modularity of Intel® HPC Orchestrator allows us to pair it with a variety of components to provide high level integration & ease of use and deployment for our customers while saving configuration time.”
NAOKI SHINJO
Director of Next Generation Technical Computing, Fujitsu
“Intel® HPC Orchestrator allows us to configure HPC systems up to 50% faster* than if we had to configure our own system software stack.”
GAUTAM SHAH
President and CEO, Colfax
Intel® HPC Orchestrator Success Stories
*Estimate compared to current identification of which software components to use, versioning and loading to clusters, which can take 30+ minutes per component; with over 60 components, that equates to 30 hours. Deploying Intel® HPC Orchestrator on a head node propagates compute nodes in 2-15 hours.
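The estimate in the footnote can be checked with a few lines of arithmetic. The sketch below is only an illustration: the names are invented here, and the inputs are the footnote's own round numbers (30 minutes per component, 60 components, 2 to 15 hours with Intel® HPC Orchestrator), not new measurements.

```python
# Inputs taken from the footnote; all names are illustrative only.
MINUTES_PER_COMPONENT = 30      # "30+ minutes per component" (lower bound)
COMPONENT_COUNT = 60            # "over 60 components" (lower bound)
ORCHESTRATOR_HOURS = (2, 15)    # quoted deployment range, in hours


def manual_hours(minutes_per_component: float, components: int) -> float:
    """Total hand-configuration time in hours."""
    return minutes_per_component * components / 60


def time_saved_fraction(baseline_hours: float, new_hours: float) -> float:
    """Fraction of the baseline time that is saved."""
    return 1 - new_hours / baseline_hours


baseline = manual_hours(MINUTES_PER_COMPONENT, COMPONENT_COUNT)
fastest, slowest = ORCHESTRATOR_HOURS
print(f"manual estimate: {baseline:.0f} h")
print(f"time saved at {slowest} h: {time_saved_fraction(baseline, slowest):.0%}")
print(f"time saved at {fastest} h: {time_saved_fraction(baseline, fastest):.0%}")
```

With these inputs the manual estimate comes to 30 hours, and the slow end of the quoted range (15 hours) corresponds to the 50% figure in the Colfax quote.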
www.Intel/hpcorchestrator
Thank You to Trial Partners
Intel® HPC Orchestrator
New Intel® Xeon Phi™ Processor Adoption
9 New systems on Top 500 representing ~45 PFLOPs!
Shipping Now; Integrated Fabric Imminent
#5 Cori, #6 Oakforest-PACS, #12 Marconi, #18 Theta, #33 Camphor 2, #106 TX-Green, #375 QPACE3, #397 SciPhi XVI, #456 Sequana_BXI
“…The Intel® Xeon Phi™ processor is at the forefront of CPU architectures poised to open the door to Exascale systems…”
Didier Juvin, Program Director, CEA
“…These achievements are enabling the LAMMPS user community to overcome barriers in computational modeling, enabling new research with larger simulation sizes and longer timescales…”
Steve Plimpton, Sandia National Laboratories
“…The Intel® Xeon Phi™ processor is a great step forward and provides awesome performance for molecular simulations with GROMACS…”
Eric Lindahl, University of Tennessee
Introducing: Intel® Xeon® Processor E5 2699A
Intel® Xeon® Processor E5-2699A v4
(22 cores, 2.40 GHz, 55 MB L3 cache)
World Record Performance
Up to 4.8% LINPACK gain vs Intel Xeon processor E5-2699 v4
Results have been estimated or measured based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual
performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products. Configurations: Intel internal measurements as of September 2016. Measured see configuration P28.
For more information go to http://www.intel.com/performance/datacenter. Copyright © 2016, Intel Corporation.
Next Generation Intel® Xeon® Processor
General Availability in mid-2017
HPC Optimizations:
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) boost floating point calculations & encryption algorithms
Integrated Intel® Omni-Path Architecture for high speed network
1st Live Technology Demo at the Intel Booth
Intel® Omni-Path Architecture Rapid Ramp
It’s been quite a year since launch at Supercomputing 2015
28 systems representing ~66% MSS of 100Gb systems on the Nov’16 Top500 list¹
Comparable or better performance across 24 HPC workloads (up to 9% higher)²
Up to 37% lower fabric costs on average!³
Major wins across the globe:
¹ Source: Top500.org
² Configuration for WIEN2k version 14.2 lapw1c_mpi benchmark, GROMACS version 5.0.4 ion_channel benchmark, NWChem release 6.6 Siosi5 benchmark, LS-DYNA MPP R8.1.0 3cars benchmark, ANSYS Fluent v17.0 rotor_3m benchmark, NAMD 2.11 stmv benchmark, Quantum Espresso version 5.3.0 ausurf112 benchmark, CD-adapco STAR-CCM+® version 11.04.010 lemanx_poly 17m benchmark, LAMMPS Feb 16, 2016 stable version release rhodopsin protein benchmark, WRF version 3.5.1 conus2.5km benchmark, Spec MPI2007 Large suite (Intel internal measurements marked estimates until published): Intel® Xeon® Processor E5-2697A v4 dual socket servers. 64 GB DDR4 memory per node, 2133 MHz. RHEL 7.2. BIOS settings: Snoop hold-off timer = 9, Early snoop disabled, Cluster on die disabled. IOU Non-posted prefetch disabled. Intel® Omni-Path Architecture (Intel® OPA): Intel Fabric Suite 10.0.1.0.50. Intel Corporation Device 24f0 – Series 100 HFI ASIC (Production silicon). OPA Switch: Series 100 Edge Switch – 48 port (Production silicon). EDR InfiniBand: MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0). Mellanox EDR ConnectX-4 Single Port Rev 3 MCX455A HCA. Mellanox SB7700 - 36 Port EDR InfiniBand switch.
Configuration for MiniFE 2.0, VASP (developer branch), GaAsBl-64 benchmark: Intel® Xeon® Processor E5-2697 v4 dual socket servers. 128 GB DDR4 memory per node, 2400 MHz. RHEL 6.5. Snoop hold-off timer = 9. Intel® OPA: Intel Fabric Suite 10.0.1.0.50. Intel Corporation Device 24f0 – Series 100 HFI ASIC (Production silicon). OPA Switch: Series 100 Edge Switch – 48 port (Production silicon). IOU Non-posted prefetch disabled. Mellanox EDR based on internal measurements: Mellanox EDR ConnectX-4 Single Port Rev 3 MCX455A HCA. Mellanox SB7700 - 36 Port EDR InfiniBand switch.
³ Configuration assumes full bisectional bandwidth (FBB) Fat-Tree configurations for all calculated clusters. All cluster configurations, in 12 node increments, are estimated via internal Intel configuration tool. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Intel® and Mellanox component pricing from www.kernelsoftware.com, with prices as of October 20, 2016.
Foster the Community
Drive Artificial Intelligence
Overcome HPC Barriers
The AI Era Has Arrived
Solve Problems
Unleash Discovery
Augment Capability
Reduce Tedium
Intel AI Portfolio
Making AI more pervasive by enabling deployment-ready AI solutions through a large, open ecosystem
Intel® Math Kernel Library (Intel® MKL & MKL-DNN)
Intel® Data Analytics Acceleration Library (Intel® DAAL)
+Network
+Memory
+Storage
Datacenter
Endpoint
Solution blueprints for reference across industries
Tools/Platforms to accelerate deployment of IA solution stack
Optimized Open Frameworks that scale to multi-node and deliver best performance
Free Libraries/Languages featuring optimized ML/DL building blocks to enable developers
Best in class hardware
Cross compatible portfolio spanning from data center to edge delivering high perf, perf/TCO, perf/w
Intel Deep Learning SDK
Intel Distribution for Python
Rick L. Stevens
Associate Laboratory Director, Computing, Environment and Life Sciences, Argonne National Laboratory
Professor of Computer Science, University of Chicago
Machine Learning for Science and Health
Hunting for Dark Matter
New Treatments for Cancer
Developing New Antibiotics
Searching for New Particles
Next Gen Intel® Xeon Phi™ Processor
Codenamed “Knights Mill”
Optimized for Artificial Intelligence
Host-CPU with mixed precision performance for improved machine learning
Coming in 2017
Intel® Deep Learning Inference Accelerator
Integrated hardware and software solution to accelerate convolutional neural networks
• Simplify deployment with preloaded, optimized algorithms
• PCIe card with Intel® Arria® 10 FPGA
• Software programmable through standard frameworks and libraries
• Coming in 2017
Learn More about Intel and AI
Intel AI Day: Thursday Nov 17th, San Francisco
Brian Krzanich, Intel CEO
Diane Bryant, EVP and GM Data Center Group
Doug Fisher, SVP and GM Software Services Group
Doug Davis, SVP and GM Internet of Things Group
Topics include the transformative potential of AI, Intel's vision to fuel the AI computing era, and the great social responsibility associated with AI.
Foster the Community
Drive Artificial Intelligence
Overcome HPC Barriers
Foster the Community
Intel Driven & Supported Trainings
~2.5 Million developers trained
Community Forums
Industry leaders sharing insights and BKMs
Intel® Parallel Computing Centers
>80 global centers; >150 codes modernized
Configuration Summary E5-2699A
• Up to 4.8% more floating point operations on MP LINPACK workload when comparing 1-Node, 2 x Intel® Xeon® Processor E5-2699 v4 on Grantley-EP (Wellsburg) with 64 GB Total Memory on Red Hat Enterprise Linux* 7.0 kernel 3.10.0-123 using MP_LINPACK 11.3.1 (Composer XE 2016 U1) (Data Source: Request Number: 1636, Benchmark: Intel® Optimized MP LINPACK, Score: 1446.4) to 1-Node, 2 x Intel® Xeon® Processor E5-2699A v4 on Grantley-EP (Wellsburg) with 64 GB Total Memory on Red Hat Enterprise Linux* 7.2 kernel 3.10.0-327 (Data Source: Request Number: 2460, Benchmark: Intel® Optimized MP LINPACK, Score: 1516.02). Higher is better.
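The "up to 4.8%" figure can be reproduced from the two scores quoted in the configuration note. This is a minimal sketch with illustrative names; it assumes the gain is the plain relative difference between the two MP LINPACK scores, which the slide does not state explicitly.

```python
# Scores quoted in the configuration note (higher is better).
BASELINE_SCORE = 1446.4    # 2 x Intel Xeon processor E5-2699 v4
NEW_SCORE = 1516.02        # 2 x Intel Xeon processor E5-2699A v4


def percent_gain(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


gain = percent_gain(BASELINE_SCORE, NEW_SCORE)
print(f"LINPACK gain: {gain:.1f}%")
```

Under that assumption the result rounds to 4.8%, matching the headline claim.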
Intel Confidential - CNDA Required 28