Date posted: 20-Dec-2015

Page 1: 金仲達 國立清華大學資訊工程學系 king@cs.nthu.edu.tw Cluster Computing: An Introduction.

金仲達 (Chung-Ta King)
Department of Computer Science, National Tsing Hua University
king@cs.nthu.edu.tw

Cluster Computing: An Introduction


Clusters Have Arrived


What is a Cluster?

A collection of independent computer systems working together as if a single system

Coupled through a scalable, high bandwidth, low latency interconnect

The nodes can exist in a single cabinet or be separated and connected via a network

Faster, closer connection than a network (LAN)

Looser connection than a symmetric multiprocessor


Outline

Motivations of Cluster Computing
Cluster Classifications
Cluster Architecture & its Components
Cluster Middleware
Representative Cluster Systems
Task Forces on Cluster Computing
Resources and Conclusions


Motivations of Cluster Computing


How to Run Applications Faster?

There are three ways to improve performance:
  Work harder
  Work smarter
  Get help

Computer analogy:
  Use faster hardware, e.g. reduce the time per instruction (clock cycle)
  Use optimized algorithms and techniques
  Use multiple computers to solve the problem

=> Parallel processing techniques are mature and can be exploited commercially
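The "get help" option can be sketched as splitting one job into chunks, handing each chunk to a separate worker, and combining the partial results. The sketch below is illustrative only: the function names `split` and `parallel_sum` are hypothetical, and local Python threads stand in for cluster nodes, so it shows the decomposition pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=4):
    """Sum each chunk in its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, split(data, workers)))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

On a real cluster each chunk would be shipped to a different node (for example with PVM or MPI), and the cost of moving the data would have to be weighed against the time saved.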


Motivation for Using Clusters

Performance of workstations and PCs is rapidly improving

Communications bandwidth between computers is increasing

Vast numbers of under-utilized workstations with a huge number of unused processor cycles

Organizations are reluctant to buy large, high performance computers, due to the high cost and short useful life span


Motivation for Using Clusters

Workstation clusters are thus a cheap and readily available approach to high performance computing
  Clusters are easier to integrate into existing networks
  Development tools for workstations are mature: threads, PVM, MPI, DSM, C, C++, Java, etc.

Use of clusters as a distributed compute resource is cost effective: the system can grow incrementally
  Individual node performance can be improved by adding resources (more memory or disks)
  New nodes can be added or nodes can be removed
  Clusters of clusters and metacomputing


Key Benefits of Clusters

High performance: running cluster-enabled programs

Scalability: add servers to the cluster, add more clusters to the network, or add CPUs to an SMP node as the need arises

High throughput

System availability (HA): clusters offer inherently high system availability due to the redundancy of hardware, operating systems, and applications

Cost effectiveness


Why Cluster Now?


Hardware and Software Trends

Important advances have taken place in the last five years
  Network performance increased with reduced cost
  Workstation performance improved
    Average number of transistors on a chip grows 40% per year
    Clock frequency growth rate is about 30% per year
    Expect 700-MHz processors with 100M transistors in early 2000

Availability of powerful and stable operating systems (Linux, FreeBSD) with source code access
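To put the yearly growth rates quoted above in perspective, the following minimal sketch compounds them over the five-year window the slide mentions. It assumes the rates stay constant over the period, which the slide does not claim; the function name is hypothetical.

```python
def compound_growth(rate_per_year, years):
    """Return the total growth factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


# Rates taken from the slide: 40% per year (transistors), 30% per year (clock).
transistor_factor = compound_growth(0.40, 5)
clock_factor = compound_growth(0.30, 5)

if __name__ == "__main__":
    print(f"transistors: x{transistor_factor:.2f}, clock: x{clock_factor:.2f}")
```

Under that assumption, five years gives roughly a 5.4-fold increase in transistor count and a 3.7-fold increase in clock frequency.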


Why Clusters NOW?

Clusters gained momentum when three technologies converged:
  Very high performance microprocessors: workstation performance = yesterday's supercomputers
  High speed communication
  Standard tools for parallel/distributed computing and their growing popularity

Time to market => performance

Internet services: huge demands for scalable, available, dedicated internet servers
  big I/O, big compute


Efficient Communication

The key enabling technology: from killer micro to killer switch
  Single-chip building block for scalable networks
    high bandwidth
    low latency
    very reliable

Challenges for clusters
  greater routing delay and less than complete reliability
  constraints on where the network connects into the node
  UNIX has a rigid device and scheduling interface


Putting Them Together ...

Building block = complete computers (HW & SW) shipped in 100,000s: killer micro, killer DRAM, killer disk, killer OS, killer packaging, killer investment
  Leverage billion $ per year investment

Interconnecting building blocks => Killer Net
  High bandwidth
  Low latency
  Reliable
  Commodity (ATM, Gigabit Ethernet, Myrinet)


Windows of Opportunity

The resources available in an average cluster offer a number of research opportunities, such as
  Parallel processing: use multiple computers to build an MPP/DSM-like system for parallel computing
  Network RAM: use the memory associated with each workstation as an aggregate DRAM cache
  Software RAID: use the arrays of workstation disks to provide cheap, highly available, and scalable file storage
  Multipath communication: use the multiple networks for parallel data transfer between nodes
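The software RAID idea above can be illustrated with plain round-robin striping across workstation disks. This is a minimal sketch under stated assumptions: the helper names are hypothetical, each "node" is just an in-memory list of blocks, and the layout is RAID 0-style striping with no redundancy, so a highly available store would additionally need mirroring or parity.

```python
def locate_block(block_index, num_nodes):
    """Map a logical block to (node, slot on that node) by round-robin striping."""
    return block_index % num_nodes, block_index // num_nodes


def stripe(data, block_size, num_nodes):
    """Cut data into fixed-size blocks and spread them over the nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for offset in range(0, len(data), block_size):
        node, _slot = locate_block(offset // block_size, num_nodes)
        nodes[node].append(data[offset:offset + block_size])
    return nodes


def reassemble(nodes):
    """Read the blocks back in logical order and join them."""
    total_blocks = sum(len(blocks) for blocks in nodes)
    ordered = []
    for index in range(total_blocks):
        node, slot = locate_block(index, len(nodes))
        ordered.append(nodes[node][slot])
    return b"".join(ordered)


if __name__ == "__main__":
    layout = stripe(b"abcdefghij", block_size=2, num_nodes=3)
    print(layout)
    print(reassemble(layout))
```

Because consecutive blocks live on different nodes, a large read can be served by several disks at once, which is where the scalability in the slide comes from.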


Windows of Opportunity

Most high-end scalable WWW servers are clusters
  end services (data, web, enhanced information services, reliability)

Network mediation services are also cluster-based
  Inktomi traffic server, etc.
  Clustered proxy caches, clustered firewalls, etc.

=> These object web applications are increasingly compute intensive
=> These applications are an increasing part of "scientific computing"


Classification of Cluster Computers


Clusters Classification 1

Based on focus (in market)
  High performance (HP) clusters: grand challenge applications
  High availability (HA) clusters: mission critical applications


HA Clusters


Clusters Classification 2

Based on workstation/PC ownership
  Dedicated clusters
  Non-dedicated clusters
    Adaptive parallel computing
    Can be used for CPU cycle stealing


Clusters Classification 3

Based on node architecture
  Clusters of PCs (CoPs)
  Clusters of Workstations (COWs)
  Clusters of SMPs (CLUMPs)


Clusters Classification 4

Based on node components, architecture & configuration:
  Homogeneous clusters
    All nodes have a similar configuration
  Heterogeneous clusters
    Nodes are based on different processors and run different OSs


Clusters Classification 5

Based on levels of clustering:
  Group clusters (# nodes: 2-99)
    A set of dedicated/non-dedicated computers, mainly connected by a SAN such as Myrinet
  Departmental clusters (# nodes: 99-999)
  Organizational clusters (# nodes: many 100s)
  Internet-wide clusters = global clusters (# nodes: 1000s to many millions)
    Metacomputing


Clusters and Their Commodity Components


Cluster Computer Architecture


Cluster Components 1a: Nodes

Multiple high performance components:
  PCs
  Workstations
  SMPs (CLUMPs)
  Distributed HPC systems leading to metacomputing

They can be based on different architectures and run different OSs


Cluster Components 1b: Processors

There are many (CISC/RISC/VLIW/Vector, ...)
  Intel: Pentiums, Xeon, Merced, ...
  Sun: SPARC, UltraSPARC
  HP PA
  IBM RS6000/PowerPC
  SGI MIPS
  Digital Alphas

Integrating memory, processing and networking into a single chip
  IRAM (CPU & memory): http://iram.cs.berkeley.edu
  Alpha 21364 (CPU, memory controller, NI)


Cluster Components 2: OS

State of the art OS:
  Tends to be modular: it can easily be extended, and new subsystems can be added without modifying the underlying OS structure
  Multithreading has added a new dimension to parallel processing

Popular OSs used on nodes of clusters:
  Linux (Beowulf)
  Microsoft NT (Illinois HPVM)
  Sun Solaris (Berkeley NOW)
  IBM AIX (IBM SP2)
  ...


Cluster Components 3: High Performance Networks

Ethernet (10 Mbps)
Fast Ethernet (100 Mbps)
Gigabit Ethernet (1 Gbps)
SCI (Dolphin; MPI latency of 12 usec)
ATM
Myrinet (1.2 Gbps)
Digital Memory Channel
FDDI
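The link speeds listed above translate into very different message transfer times. The following sketch uses the simple model time = latency + size / bandwidth; the function name and the table are illustrative, only links whose bandwidth is given on the slide are included, and protocol overhead is ignored.

```python
# Nominal bandwidths from the slide, in megabits per second.
LINK_MBPS = {
    "Ethernet": 10,
    "Fast Ethernet": 100,
    "Gigabit Ethernet": 1000,
    "Myrinet": 1200,
}


def transfer_seconds(message_bytes, link_mbps, latency_s=0.0):
    """Estimate the time to move a message: fixed latency plus serialization time."""
    return latency_s + (message_bytes * 8) / (link_mbps * 1_000_000)


if __name__ == "__main__":
    for name, mbps in LINK_MBPS.items():
        print(f"{name}: {transfer_seconds(1_000_000, mbps) * 1000:.2f} ms per MB")
```

For small messages the latency term dominates, which is why the slide quotes a latency figure for SCI rather than a bandwidth.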


Cluster Components 4: Network Interfaces

Dedicated processing power and storage embedded in the network interface
An I/O card today
Tomorrow on chip?

[Figure: a Sun Ultra 170 node (processor P, cache $, memory M) with a Myricom NIC on the S-Bus I/O bus (50 MB/s), connected to the Myricom network (160 MB/s)]


Cluster Components 4: Network Interfaces

Network interface card
  Myrinet has a NIC
  User-level access support: VIA
  The Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip


Cluster Components 5: Communication Software

Traditional OS-supported facilities (but heavyweight due to protocol processing)
  Sockets (TCP/IP), pipes, etc.

Lightweight protocols (user-level): minimal interface into the OS
  User must transmit directly into and receive from the network without OS intervention
  Communication protection domains established by interface card and OS
  Treat message loss as an infrequent case
  Active Messages (Berkeley), Fast Messages (UI), ...
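As a small illustration of the traditional, OS-mediated path described above, the sketch below exchanges one message over a connected pair of sockets. It is only a sketch: `socket.socketpair()` creates a local connected pair rather than a TCP/IP connection between nodes, the function name is hypothetical, and the point is simply that every send and receive is a system call into the kernel, which is the overhead the user-level protocols try to avoid.

```python
import socket


def echo_once(message: bytes) -> bytes:
    """Send a short message one way and an upper-cased reply the other way."""
    left, right = socket.socketpair()
    try:
        left.sendall(message)                # system call: copy into the kernel
        received = right.recv(len(message))  # system call: copy out of the kernel
        right.sendall(received.upper())
        return left.recv(len(message))
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(echo_once(b"ping"))
```

The single `recv` call is adequate here only because the message is a few bytes on a local socket; a general stream receiver would loop until all expected bytes arrive.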


Cluster Components 6a: Cluster Middleware

Resides between OS and applications and offers an infrastructure for supporting:
  Single System Image (SSI)
  System Availability (SA)

SSI makes a collection of computers appear as a single machine (globalized view of system resources)

SA supports checkpointing, process migration, etc.


Cluster Components 6b: Middleware Components

Hardware
  DEC Memory Channel, DSM (Alewife, DASH), SMP techniques

OS/gluing layers
  Solaris MC, Unixware, Glunix

Applications and subsystems
  System management and electronic forms
  Runtime systems (software DSM, PFS, etc.)
  Resource management and scheduling (RMS): CODINE, LSF, PBS, NQS, etc.


Cluster Components 7a: Programming Environments

Threads (PCs, SMPs, NOW, ...)
  POSIX Threads
  Java Threads

MPI
  Linux, NT, on many supercomputers

PVM

Software DSMs (Shmem)
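Of the environments listed above, threads are the shared-memory option within a single node. The sketch below uses Python's `threading` module as a stand-in for POSIX or Java threads (an assumption made for brevity; the function name is hypothetical): several threads update one shared counter, and a lock keeps the updates from interfering.

```python
import threading


def count_with_threads(num_threads, increments):
    """Have several threads increment one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:  # serialize the read-modify-write on the shared value
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(count_with_threads(4, 1000))
```

Message-passing environments such as MPI and PVM take the opposite approach: nodes share nothing and exchange explicit messages, which is what allows them to span the machines of a cluster.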


Cluster Components 7b: Development Tools

Compilers
  C/C++/Java
RAD (rapid application development) tools: GUI-based tools for parallel processing modeling
Debuggers
Performance monitoring and analysis tools
Visualization tools


Cluster Components 8: Applications

Sequential
Parallel/distributed (cluster-aware applications)
  Grand challenge applications
    Weather forecasting
    Quantum chemistry
    Molecular biology modeling
    Engineering analysis (CAD/CAM)
    ...
  Web servers, data mining


Cluster Middleware and Single System Image


Middleware Design Goals

Complete transparency
  Let users see a single cluster system
    Single entry point, ftp, telnet, software loading, ...

Scalable performance
  Easy growth of the cluster
    No change of API, and automatic load distribution

Enhanced availability
  Automatic recovery from failures
    Employ checkpointing and fault tolerant technologies
  Handle consistency of data when replicated


Single System Image (SSI)

A single system image is the illusion, created by software or hardware, that a collection of computers appears as a single computing resource

Benefits:
  Transparent use of system resources
  Improved reliability and higher availability
  Simplified system management
  Reduction in the risk of operator errors
  Users need not be aware of the underlying system architecture to use these machines effectively


Desired SSI Services

Single entry point
  telnet cluster.my_institute.edu
  telnet node1.cluster.my_institute.edu
Single file hierarchy: AFS, Solaris MC Proxy
Single control point: manage from a single GUI
Single virtual networking
Single memory space: DSM
Single job management: Glunix, Codine, LSF
Single user interface: like a workstation/PC windowing environment
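The single entry point service listed first above can be sketched as a front end that accepts every login under one cluster name and quietly places the session on a node. This is only an illustrative sketch: the `EntryPoint` class and its least-loaded placement policy are assumptions, not something the slides specify.

```python
class EntryPoint:
    """Hypothetical front end that hides individual nodes behind one name."""

    def __init__(self, nodes):
        # Track how many sessions each node currently hosts.
        self.sessions = {node: 0 for node in nodes}

    def login(self):
        """Place a new session on the node with the fewest sessions."""
        node = min(self.sessions, key=self.sessions.get)
        self.sessions[node] += 1
        return node

    def logout(self, node):
        """Release one session slot on the given node."""
        self.sessions[node] -= 1


if __name__ == "__main__":
    cluster = EntryPoint(["node1", "node2", "node3"])
    print([cluster.login() for _ in range(4)])
```

The user only ever types the cluster name; which node actually serves the session is a decision the middleware makes on their behalf.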


SSI Levels

Single system support can exist at different levels within a system, and one level can be built on another

Application and Subsystem Level

Operating System Kernel Level

Hardware Level


Availability Support Functions

Single I/O space (SIO)
  Any node can access any peripheral or disk device without knowledge of its physical location

Single process space (SPS)
  Any process can create processes on any node, and they can communicate through signals, pipes, etc., as if they were on a single node

Checkpointing and process migration
  Checkpointing saves the process state and intermediate results in memory or on disk; process migration is used for load balancing

Reduction in the risk of operator errors
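The checkpointing function described above can be sketched at application level: a long-running job periodically saves its state to disk and, after a failure, resumes from the last saved state instead of starting over. This is a minimal sketch under stated assumptions: the function names are hypothetical, the state is a small dictionary serialized with `pickle`, and the failure is simulated by stopping early; system-level checkpointing of whole processes is considerably more involved.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # readers see either the old or the new checkpoint


def run(path, total_steps, fail_after=None):
    """Sum 0..total_steps-1, checkpointing after each step; resume if a checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}
    executed = 0
    while state["step"] < total_steps:
        if fail_after is not None and executed == fail_after:
            return None  # simulated crash: progress so far is already on disk
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        save_checkpoint(path, state)
    return state["acc"]


def demo():
    """Run once with a simulated crash, then rerun and finish from the checkpoint."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        first = run(path, 10, fail_after=4)
        second = run(path, 10)
        return first, second


if __name__ == "__main__":
    print(demo())
```

The same saved state is what makes process migration possible: if the checkpoint file is reachable from another node, the job can be restarted there instead.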


Relationship among Middleware Modules


Strategies for SSI

Build as a layer on top of the existing OS (e.g. Glunix)
  Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time
  New systems can be built quickly by mapping new services onto the functionality provided by the layer beneath, e.g. Glunix/Solaris-MC

Build SSI at the kernel level (true cluster OS)
  Good, but cannot leverage OS improvements made by the vendor
  e.g. Unixware and Mosix (built using BSD Unix)


Representative Cluster Systems


Research Projects of Clusters

Beowulf: Caltech, JPL, and NASA
Condor: University of Wisconsin-Madison
DQS (Distributed Queuing System): Florida State U.
HPVM (High Performance Virtual Machine): UIUC & UCSB
Gardens: Queensland U. of Technology, AU
NOW (Network of Workstations): UC Berkeley
PRM (Prospero Resource Manager): USC


Commercial Cluster Software

Codine (Computing in Distributed Network Environment): GENIAS GmbH, Germany

LoadLeveler: IBM Corp.
LSF (Load Sharing Facility): Platform Computing
NQE (Network Queuing Environment): Craysoft
RWPC: Real World Computing Partnership, Japan
Unixware: SCO
Solaris-MC: Sun Microsystems


Comparison of 4 Cluster Systems


Task Forces on Cluster Computing


IEEE Task Force on Cluster Computing (TFCC)

http://www.dgs.monash.edu.au/~rajkumar/tfcc/

http://www.dcs.port.ac.uk/~mab/tfcc/


TFCC Activities

Mailing list, workshops, conferences, tutorials, web-resources etc.

Resources for introducing the subject in senior undergraduate and graduate levels

Tutorials/workshops at IEEE Chapters, and so on

Visit TFCC Page for more details: http://www.dgs.monash.edu.au/~rajkumar/tfcc/


Efforts in Taiwan

PC Farm Project at Academia Sinica Computing Center: http://www.pcf.sinica.edu.tw/

NCHC PC Cluster Project: http://www.nchc.gov.tw/project/pccluster/


NCHC PC Cluster

A Beowulf class cluster


System Hardware

5 Fast Ethernet switching hubs


System Software


Conclusions

Clusters are promising and fun

They offer incremental growth and match funding patterns

New trends in hardware and software technologies are likely to make clusters more promising

Cluster-based HP and HA systems can be seen everywhere!


The Future

Cluster systems using idle cycles from computers will continue

Individual nodes will have multiple processors

Fast and Gigabit Ethernet will be widely used and will become the de facto networks for clusters

Cluster software will bypass the OS as much as possible

Unix-based OSs are likely to be the most popular, but NT, with its steady improvement and acceptance, will not be far behind


The Challenges

Programming
  enable applications, reduce programming effort, distributed object/component models?

Reliability (RAS)
  programming effort, reliability with scalability to 1000s

Heterogeneity
  performance, configuration, architecture and interconnect

Resource management (scheduling, perf. pred.)

System administration/management

Input/output (both network and storage)


Pointers to Literature on Cluster Computing


Reading Resources 1a: Internet & WWW

Computer architecture: http://www.cs.wisc.edu/~arch/www/

PFS and parallel I/O: http://www.cs.dartmouth.edu/pario/

Linux parallel processing: http://yara.ecn.purdue.edu/~pplinux/Sites/

Distributed shared memory: http://www.cs.umd.edu/~keleher/dsm.html


Reading Resources 1b: Internet & WWW

Solaris-MC: http://www.sunlabs.com/research/solaris-mc

Microprocessors, recent advances: http://www.microprocessor.sscc.ru

Beowulf: http://www.beowulf.org

Metacomputing: http://www.sis.port.ac.uk/~mab/Metacomputing/


Reading Resources 2: Books

In Search of Clusters, by G. Pfister, Prentice Hall (2nd ed.), 1998

High Performance Cluster Computing, Volume 1: Architectures and Systems; Volume 2: Programming and Applications, edited by Rajkumar Buyya, Prentice Hall, NJ, USA

Scalable Parallel Computing, by K. Hwang & Z. Xu, McGraw Hill, 1998


Reading Resources 3: Journals

"A Case for NOW", IEEE Micro, Feb. 1995, by Anderson, Culler, Patterson

"Fault Tolerant COW with SSI", IEEE Concurrency, by Kai Hwang, Chow, Wang, Jin, Xu

"Cluster Computing: The Commodity Supercomputing", Journal of Software Practice and Experience, by Mark Baker & Rajkumar Buyya