Lecture 1 - Parallel Computing Architectures (ece.uprm.edu/~wrivera/ICOM6025/Lecture1.pdf)
Lecture 1: Parallel Computing Architectures

Dr. Wilson Rivera
ICOM 6025: High Performance Computing
Electrical and Computer Engineering Department
University of Puerto Rico
Outline

• Goal: Understand parallel computing fundamental concepts
  – HPC challenges
  – Flynn's Taxonomy
  – Memory Access Models
  – Multi-core Processors
  – Graphics Processor Units
  – Cluster Infrastructures
  – Cloud Infrastructures
HPC Challenges

• Optimization of plasma heating systems for fusion experiments
• Physics of high-temperature superconducting cuprates
• Global simulation of CO2 dynamics
• Fundamental instability of supernova shocks
• Protein structure and function for cellulose-to-ethanol conversion
• Next-generation combustion devices burning alternative fuels

Slide source: Thomas Zaharia
HPC Challenges

[Chart, courtesy AIRBUS France: available computational capacity (Flop/s) versus overnight-batch load cases run, 1980-2030. CFD milestones progress from RANS (low speed, high speed) and unsteady RANS through LES, CFD-based loads & HQ, CFD-based noise simulation, and aero optimisation & CFD-CSM toward full MDO and real-time CFD-based in-flight simulation, as capacity grows from 1 Giga (10^9) through 1 Tera (10^12), 1 Peta (10^15), and 1 Exa (10^18) to 1 Zeta (10^21) Flop/s. "Smart" use of HPC power: algorithms, data mining, knowledge. Capability shown is that achieved during one night of batch processing.]
HPC Challenges

High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL
HPC Challenges
https://computation.llnl.gov/casc/projects/.../climate_2007F.pdf
Flynn's Taxonomy

[Diagram: a 2x2 matrix classifying architectures along the Instructions and Data axes: SISD, SIMD, MISD, MIMD.]
Flynn's Taxonomy

• Single Instruction, Multiple Data (SIMD)
  – All processing units execute the same instruction at any given clock cycle
  – Best suited for problems with a high degree of regularity
    • Image processing
  – Good examples
    • SSE (Streaming SIMD Extensions), SSE2
    • Intel MIC (Xeon Phi)
    • Graphics Processing Units (GPUs)
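The SIMD idea (one instruction, many data elements) can be modeled in a few lines of Python. This is an illustrative sketch only, not from the slides: real SIMD hardware such as SSE applies the operation to all vector lanes in lock-step within a single instruction, whereas here a list comprehension merely models that behavior.

```python
def simd_add(a, b):
    """Model of a single SIMD 'add' instruction: the same operation is
    applied to every lane (element) of the vector operands at once."""
    assert len(a) == len(b), "SIMD operands must have the same width"
    return [x + y for x, y in zip(a, b)]

# One modeled "instruction" processes four data elements, the way SSE
# processes four packed 32-bit floats per instruction.
result = simd_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

A scalar (SISD) machine would instead need one add instruction per element.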
Flynn's Taxonomy

• Multiple Instruction, Multiple Data (MIMD)
  – Every processing unit may be executing a different instruction stream, and working with a different data stream
    • Examples: clusters and multicore computers
    • In practice MIMD architectures may also include SIMD execution sub-components
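A hedged sketch of the MIMD model, not taken from the slides: two Python threads stand in for independent cores, each running a different instruction stream (a different function) on different data.

```python
import threading

# MIMD model: each processing unit runs its own instruction stream on
# its own data. Two threads stand in for two cores.
results = {}

def sum_worker(data):
    results["sum"] = sum(data)      # instruction stream 1, data set 1

def max_worker(data):
    results["max"] = max(data)      # instruction stream 2, data set 2

t1 = threading.Thread(target=sum_worker, args=([1, 2, 3, 4],))
t2 = threading.Thread(target=max_worker, args=([7, 5, 9, 2],))
t1.start(); t2.start()
t1.join(); t2.join()
```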
Memory Access Models

• Shared memory
• Distributed memory
• Hybrid distributed-shared memory
Shared Memory

[Diagram: three CPUs, each with its own L2 cache, attached via a bus interconnect to a single shared memory and I/O.]
Shared Memory

• Multiple processors can operate independently but share the same memory resources
  – Changes in a memory location effected by one processor are visible to all other processors
• Two main classes based upon memory access times
  – Uniform Memory Access (UMA)
    • Symmetric Multi-Processors (SMPs)
  – Non-Uniform Memory Access (NUMA)
• Main disadvantage is the lack of scalability between memory and CPUs
  – Adding more CPUs geometrically increases traffic on the shared memory-CPU path
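The visibility property above can be sketched with threads, which share one address space just as processors in a shared-memory machine share one physical memory. This is an illustrative analogy, not code from the lecture:

```python
import threading

# Shared-memory model: threads in one process share a single address
# space, so a write by one "processor" is visible to the others
# (given proper synchronization).
shared = {"value": 0}
lock = threading.Lock()

def writer():
    with lock:                 # synchronize access to the shared location
        shared["value"] = 42

t = threading.Thread(target=writer)
t.start()
t.join()                       # after the join, the write is visible here
observed = shared["value"]
```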
Shared Memory

• The memory hierarchy tries to exploit locality
  – Cache hit: memory access served from the cache (cheap)
  – Cache miss: memory access that must go past the cache (expensive)
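Locality can be illustrated with two traversal orders of the same 2D array (a sketch added here, not from the slides). Both orders compute the identical sum; the difference is that row-major order visits consecutive addresses (mostly cache hits), while column-major order strides across rows (far more cache misses on real hardware, where rows are stored contiguously).

```python
# A 2D array stored row by row. Row-major traversal is cache-friendly;
# column-major traversal defeats spatial locality. Same result either way.
N = 256
matrix = [[i * N + j for j in range(N)] for i in range(N)]

row_major_sum = sum(matrix[i][j] for i in range(N) for j in range(N))
col_major_sum = sum(matrix[i][j] for j in range(N) for i in range(N))
```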
Distributed Memory

[Diagram: four nodes, each pairing a CPU and its L2 cache with its own local memory (M), connected through network I/O.]
Distributed Memory

• Processors have their own local memory
• When a processor needs access to data in another processor
  – it is usually the task of the programmer to explicitly define how and when data is communicated
• Examples: Cray XT4, clusters, clouds
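The explicit send/receive discipline of distributed memory can be sketched as follows. This is a stand-in model, not the lecture's code: two threads play the role of separate nodes, each owning its data, and a Queue plays the role of the cluster network (real systems would use message passing such as MPI over the interconnect).

```python
import queue
import threading

# Distributed-memory model: each worker owns its data; nothing is
# shared. The programmer explicitly sends and receives messages
# (a Queue stands in for the network).
channel = queue.Queue()

def owner():
    local_data = [3, 1, 4, 1, 5]      # lives only in this worker's memory
    channel.put(sum(local_data))       # explicit send over the "network"

def requester(out):
    out["received"] = channel.get()    # explicit receive

result = {}
t1 = threading.Thread(target=owner)
t2 = threading.Thread(target=requester, args=(result,))
t1.start(); t2.start()
t1.join(); t2.join()
```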
Hybrid (Distributed-Shared) Memory

[Diagram: four shared-memory nodes connected through a network.]

In practice we have hybrid memory access.
Parallel Computing Trends

• Multi-core processors
  – Instead of building processors with faster clock speeds, modern computer systems are built using chips with an increasing number of processor cores
• Graphics Processor Units (GPUs)
  – General-purpose computing, and in particular data-parallel high performance computing
• Dynamic approach to cluster computing provisioning
  – Instead of offering a fixed software environment, the application provides information to the scheduler about what type of resources it needs, and the nodes are automatically provisioned for the user at run-time
    • Platform ISF Adaptive Cluster
    • Moab Adaptive Operating Environment
• Large scale commodity computer data centers (cloud)
  – Amazon EC2, Eucalyptus, Google App Engine
Multi-cores and Moore's Law

• Circuit complexity doubles every 18 months
• Power wall (2004)

Sources: The National Academies Press, Washington, DC, 2011; Intel
Power Wall

• The transition to multi-core processors is not a breakthrough in architecture; it is actually a result of the need to build power-efficient chips
Power Density Limits Serial Performance
Many-cores (Graphics Processor Units)

• Graphics Processor Units (GPUs)
  – Throughput-oriented devices designed to provide high aggregate performance for independent computations
    • Prioritizing high-throughput processing of many parallel operations over the low-latency execution of a single task
  – GPUs do not use independent instruction decoders
    • Instead, groups of processing units share an instruction decoder; this maximizes the number of arithmetic units per die area
Multi-Core vs. Many-Core

• Multi-core processors (minimize latency)
  – MIMD
  – Each core optimized for executing a single thread
  – Lots of big on-chip caches
  – Extremely sophisticated control
• Many-core processors (maximize throughput)
  – SIMD
  – Cores optimized for aggregate throughput
  – Lots of ALUs
  – Simpler control
CPUs: Latency Oriented Design

• Large caches
  – Convert long-latency memory accesses to short-latency cache accesses
• Sophisticated control
  – Branch prediction for reduced branch latency
  – Data forwarding for reduced data latency
• Powerful ALU
  – Reduced operation latency

[Diagram: a CPU die dominated by control logic and cache, with a few ALUs, attached to DRAM.]

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2012, SSL 2014, ECE408/CS483, University of Illinois, Urbana-Champaign
GPUs: Throughput Oriented Design

• Small caches
  – To boost memory throughput
• Simple control
  – No branch prediction
  – No data forwarding
• Energy-efficient ALUs
  – Many, long-latency but heavily pipelined for high throughput
• Require a massive number of threads to tolerate latencies

[Diagram: a GPU die dominated by many small ALUs, attached to DRAM.]

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2012, SSL 2014, ECE408/CS483, University of Illinois, Urbana-Champaign
Multi-Core vs. Many-Core

[Chart: peak GFLOPs of NVIDIA GPUs (NV30, NV40, G70, G80, GT200, T12) versus Intel CPUs (3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere) from 2002 to 2009, on a scale of 0 to 1400 GFLOPs; GPU peak performance grows far faster than CPU peak performance.]
Intel® Xeon® Processor E7-8894 v4

• 24 cores
• 48 threads
• 2.40 GHz
• 14 nm
• 60 MB cache
• $8k (July 2017)
NVIDIA TITAN Xp

• 3840 cores
• 1.6 GHz
• Pascal architecture
• Peak = 12 TF/s
• $1.5K
Cluster Hardware Configuration

[Diagram, © Wilson Rivera: a head node and compute nodes 1 through n connected through a switch, with local storage and external storage.]
Cluster Head Node

• Head node
  – Network interface cards (NICs): one connecting to the public network and the other connecting to the internal cluster network
  – Local storage is attached to the head node for administrative purposes such as accounting management and maintenance services
Cluster Interconnection Network

• The interconnection of the cluster depends upon both application and budget constraints
  – Small clusters typically have PC-based nodes connected through a Gigabit Ethernet network
  – Large scale production clusters may be made of 1U or 2U servers or blade servers connected through either
    • A Gigabit Ethernet network (server farm), or
    • A high performance computing network (High Performance Computing Cluster)
      – InfiniBand
      – Quadrics
      – Myrinet
      – Omni-Path (Intel)
Cluster Storage

• Storage Area Network (SAN)
  – Storage devices appear as locally attached to the operating system
• Network Attached Storage (NAS)
  – Distributed file-based protocols
    • Parallel Virtual File System (PVFS)
    • General Parallel File System (GPFS)
    • Hadoop Distributed File System (HDFS)
    • Lustre
    • CERN-VM-FS
Cluster Software

[Diagram, © Wilson Rivera: software stack with the operating system at the base, cluster infrastructure services above it (cluster resource manager, scheduler, monitor, analyzer), and cluster tools and libraries on top (communication, compiler, optimization).]
Top500.org
History of Performance

Source: Exascale Computing and Big Data
Projected Performance

[Chart: Top500 projected performance over time on a log scale from 100 Mflop/s to 100 Pflop/s, showing the SUM, N=1, and N=500 trend lines.]
#1 TAIHULIGHT @ CHINA

• June 2017
• National Supercomputing Center in Wuxi
• SW26010 processors developed by NRCPC
• 40,960 nodes
• 10,649,600 cores
• Peak = 125 PF/s
• Rmax = 93 PF/s
• 15,371 kW
Cloud Computing

• Cloud computing allows scaling on demand without building or provisioning a data center
  – Computing resources available on demand (self-service)
  – Charging only for resources utilized (pay-as-you-go)
• Worldwide revenue from public IT cloud services exceeded $21.5 billion in 2010
  – Projected to reach $72.9 billion in 2015
  – Compound annual growth rate (CAGR) of 27.6%

http://www.idc.com/prodserv/idc_cloud.jsp
Cloud versus Grid

• Grids
  – Sharing and coordination of distributed resources
  – Grid middleware
    • Globus, UNICORE, gLite
• Clouds
  – Leverage virtualization to maximize resource utilization
  – Cloud middleware
    • IaaS, PaaS, SaaS
Layered Cloud Model

From: K. Chen, Wright University
Cloud Layers

• Infrastructure as a Service (IaaS)
  – Flexible in terms of the applications to be hosted
  – Amazon EC2, RackSpace, Nimbus, Eucalyptus
• Platform as a Service (PaaS)
  – Application domain-specific platforms
  – Google App Engine, MS Azure, Heroku
• Software as a Service (SaaS)
  – Service domain-specific
  – Salesforce, NetSuite
Cloud Economics

• Pay by use instead of provisioning for peak

[Diagram, from K. Chen, Wright University: resources versus time for a static data center and for a data center in the cloud. In the static case, fixed capacity sits above varying demand, leaving unused resources; in the cloud case, capacity tracks demand.]
Cloud Economics

• Setup:
  – A peak period needs 10 servers to process requests
  – Assume your service is going to run for 1 year
• Private cluster: one-time investment
  – Servers: $1,500 x 10 = $15,000
  – Power/AC costs about $200/year/server => $2,000
  – Administrator: $50,000
• Public cloud:
  – Rush hours: 10 hours/day, which need 10 nodes/hour
  – Other hours: the remaining 14 hours need 2 nodes/hour
  – Total: 128 node-hours x $0.10/node-hour = $12.80/day
  – One-year cost = $4,672
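The public-cloud figures above can be checked with a few lines of arithmetic (the $0.10/node-hour rate is the slide's illustrative number, not current pricing):

```python
# Reproducing the slide's public-cloud cost arithmetic.
rush_node_hours = 10 * 10        # 10 rush hours/day x 10 nodes
quiet_node_hours = 14 * 2        # 14 remaining hours/day x 2 nodes
node_hours_per_day = rush_node_hours + quiet_node_hours   # 128

price_per_node_hour = 0.10
daily_cost = node_hours_per_day * price_per_node_hour     # $12.80/day
yearly_cost = daily_cost * 365                            # $4,672/year
```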
Cloud Economics

• Amazon EC2 pricing
• Google engine pricing
• Hadoop sizing
• How much to rent a supercomputer
  – 8-core VM
  – 30 GB of RAM (each core 3.75 GB)
  – $1.16/hour
  – 600,000 cores
  – 75,000 VMs
  – $87,000/hour
  – $2 million per day
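The supercomputer-rental estimate above works out as follows (the $1.16/hour VM rate is the slide's illustrative figure, not current pricing):

```python
# Reproducing the slide's supercomputer-rental estimate.
cores_needed = 600_000
cores_per_vm = 8
vms_needed = cores_needed // cores_per_vm           # 75,000 VMs
price_per_vm_hour = 1.16
hourly_cost = vms_needed * price_per_vm_hour        # $87,000/hour
daily_rental_cost = hourly_cost * 24                # ~$2 million/day
```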
Data Analytics Ecosystem

Source: Exascale Computing and Big Data
Summary

• Parallel computing infrastructure trends
  – Multi-core processors
    • A result of the need to build power-efficient chips
  – Graphics Processor Units
    • Throughput-oriented devices designed to provide high aggregate performance for independent computations
  – Cluster infrastructures
    • Head node; interconnection; storage; software
  – Cloud infrastructures
    • Physical resources; virtual resources; infrastructure services; application services
Scientific Computing Terminology

• HPC System: a "High Performance Computing" (HPC) computer; computers connected through a high-speed interconnect and configured for scientific computing.
• Interconnect: the wiring, chips, and software that connect computing components.
• Node (blade, sled, etc.): an independent computing unit of an HPC system. A node has its own operating system (OS) and memory. The physical cases of a node are often called blades and sleds.
• Chassis: nodes are often aggregated into a chassis (with a backplane) to share electrical power, cooling, and a local interconnect.
Terminology (continued)

• Chip or Die: self-contained circuits on a single medium of size ~20 mm x 20 mm, containing up to ~1 billion transistors.
• Socket: provides a connection between a chip and a motherboard.
• CPU (or processor): a Central Processing Unit, consisting of a chip or die (often called a processor).
• Core: modern CPUs contain multiple cores. A core is an execution unit that can execute one code's instructions independently while other cores execute a different code's instructions.
• Hyper-Threading: a single core can have additional circuitry that allows two or more instruction streams (threads) to proceed through it "simultaneously". Hyper-Threading is an Intel trademark for 2 threads per core; the Xeon Phi coprocessor supports 4 threads.