Single System Image


Transcript of Single System Image

Page 1: Single System Image


Single System Image

Infrastructure and Tools

Page 2: Single System Image

Cluster Computer Architecture

[Diagram: sequential and parallel applications run on top of a parallel programming environment and cluster middleware (single system image and availability infrastructure), which in turn sit over a cluster interconnection network/switch joining multiple PCs/workstations, each with its own communications software and network interface hardware.]

Page 3: Single System Image

Major Issues in Cluster Design

• Enhanced performance (performance at low cost)
• Enhanced availability (failure management)
• Single system image (look and feel of one system)
• Size scalability (physical and application)
• Fast communication (networks and protocols)
• Load balancing (CPU, net, memory, disk)
• Security and encryption (clusters of clusters)
• Distributed environment (social issues)
• Manageability (administration and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware applications)

Page 4: Single System Image

A Typical Cluster Computing Environment

[Diagram: applications run on top of PVM / MPI / RSH, which sits above the hardware/OS; a "???" between the two layers marks a missing piece.]

Page 5: Single System Image

The missing link is provided by cluster middleware/underware.

[Diagram: the same stack, with middleware or underware now filling the layer between PVM / MPI / RSH and the hardware/OS.]

Page 6: Single System Image

Middleware Design Goals

Complete transparency (manageability):
Lets the user see a single cluster system: single entry point, ftp, telnet, software loading, ...

Scalable performance:
Easy growth of the cluster: no change to the API, and automatic load distribution.

Enhanced availability:
Automatic recovery from failures; employs checkpointing and fault-tolerance technologies; handles consistency of data when replicated.

Page 7: Single System Image

What is Single System Image (SSI)?

SSI is the illusion, created by software or hardware, that presents a collection of computing resources as a single, unified resource.

SSI makes the cluster appear like a single machine to the user, to applications, and to the network.

Page 8: Single System Image

Benefits of SSI

• Use of system resources is transparent.
• Transparent process migration and load balancing across nodes.
• Improved reliability and higher availability.
• Improved system response time and performance.
• Simplified system management.
• Reduced risk of operator errors.
• Users need not be aware of the underlying system architecture to use the machines effectively.

Page 9: Single System Image

Desired SSI Services

• Single entry point: users connect with telnet cluster.my_institute.edu rather than telnet node1.cluster.institute.edu (see the sketch after this list).
• Single file hierarchy: /proc, NFS, xFS, AFS, etc.
• Single control point: a management GUI.
• Single virtual networking.
• Single memory space: network RAM / DSM.
• Single job management: GLUnix, CODINE, LSF.
• Single GUI: like a workstation/PC windowing environment; it may be Web technology.
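A minimal sketch of the single-entry-point idea: one well-known cluster name is mapped onto many real nodes by a front end. The node names, the round-robin policy, and the function name are illustrative assumptions, not part of any system named above.

```python
# Hypothetical single entry point: sessions aimed at one cluster name
# are spread across the real nodes behind it (round-robin here; a real
# front end might use DNS or a login redirector with a load policy).
import itertools

NODES = [
    "node1.cluster.institute.edu",   # assumed node names
    "node2.cluster.institute.edu",
    "node3.cluster.institute.edu",
]
_next_node = itertools.cycle(NODES)

def resolve_entry_point(cluster_name: str) -> str:
    """Pick the real node that will serve a session aimed at cluster_name."""
    return next(_next_node)

if __name__ == "__main__":
    # Four logins to the same cluster name land on different nodes.
    for _ in range(4):
        print(resolve_entry_point("cluster.my_institute.edu"))
```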

Page 10: Single System Image

Availability Support Functions

• Single I/O space: any node can access any peripheral or disk device without knowing its physical location.
• Single process space: any process on any node can create processes with cluster-wide process IDs, and processes communicate through signals, pipes, etc., as if they were on a single node.
• Checkpointing and process migration: saves process state and intermediate results from memory to disk, supporting rollback recovery when a node fails and RMS load balancing (a minimal checkpointing sketch follows this list).
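A minimal sketch of checkpoint-based rollback recovery, assuming a single job whose whole state fits in one dictionary: the state is saved from memory to disk at intervals, so after a node failure the job resumes from the last checkpoint rather than from scratch. The file name, state layout, and checkpoint interval are illustrative.

```python
# Illustrative checkpoint/rollback loop; not any particular system's API.
import os
import pickle

CHECKPOINT = "job.ckpt"   # assumed checkpoint file name

def load_state():
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "partial_sum": 0}

def save_state(state):
    """Write the checkpoint atomically so a crash mid-write leaves the old one intact."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_state()
while state["step"] < 1000:
    state["partial_sum"] += state["step"]   # stands in for real work
    state["step"] += 1
    if state["step"] % 100 == 0:            # checkpoint every 100 steps
        save_state(state)
print(state["partial_sum"])
```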

Page 11: Single System Image

SSI Levels

SSI can be provided at several levels of abstraction:

• Application and subsystem level
• Operating system kernel level
• Hardware level

Page 12: Single System Image

SSI at Application and Subsystem Levels

• Application level. Examples: batch systems and system management. Boundary: an application. Importance: what a user wants.
• Subsystem level. Examples: distributed DB, OSF DME, Lotus Notes, MPI, PVM. Boundary: a subsystem. Importance: SSI for all applications of the subsystem.
• File system level. Examples: Sun NFS, OSF DFS, NetWare, and so on. Boundary: shared portion of the file system. Importance: implicitly supports many applications and subsystems.
• Toolkit level. Examples: OSF DCE, Sun ONC+, Apollo Domain. Boundary: explicit toolkit facilities (user, service name, time). Importance: best level of support for heterogeneous systems.

(c) In Search of Clusters

Page 13: Single System Image

SSI at OS Kernel Level

• Kernel/OS layer. Examples: Solaris MC, UnixWare, MOSIX, Sprite, Amoeba/GLUnix. Boundary: each name space (files, processes, pipes, devices, etc.). Importance: kernel support for applications and administrative subsystems.
• Kernel interfaces. Examples: UNIX (Sun) vnode, Locus (IBM) vproc. Boundary: each type of kernel object (files, processes, etc.). Importance: modularizes SSI code within the kernel.
• Virtual memory. Examples: none supporting the OS kernel. Boundary: each distributed virtual memory space. Importance: may simplify implementation of kernel objects.
• Microkernel. Examples: Mach, PARAS, Chorus, OSF/1 AD, Amoeba. Boundary: each service outside the microkernel. Importance: implicit SSI for all system services.

(c) In Search of Clusters

Page 14: Single System Image

SSI at Hardware Level

• Memory. Examples: SCI, DASH. Boundary: memory space. Importance: better communication and synchronization.
• Memory and I/O. Examples: SCI, SMP techniques. Boundary: memory and I/O device space. Importance: lower-overhead cluster I/O.

(c) In Search of Clusters

Page 15: Single System Image

SSI Characteristics

Every SSI has a boundary. Single-system support can exist at different levels within a system, one able to be built on another.

Page 16: Single System Image

SSI Boundaries

[Diagram: an example SSI boundary drawn around a batch system.]

(c) In Search of Clusters

Page 17: Single System Image


Relationship Among Middleware Modules

Page 18: Single System Image

SSI via the OS Path!

1. Build SSI as a layer on top of the existing OS.
Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time; new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath (see the sketch after this list). E.g., GLUnix.

2. Build SSI at the kernel level: a true cluster OS.
Good, but cannot leverage OS improvements from the vendor. E.g., UnixWare, Solaris MC, and MOSIX.
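A sketch of the layered approach under stated assumptions: a new cluster-wide service (run a command "somewhere in the cluster") is built purely by mapping it onto facilities the layer beneath already provides, here ssh plus a stubbed-out load query. The node names, the choice of ssh, and both function names are hypothetical, not GLUnix's actual interface.

```python
# Hypothetical layered SSI service: cluster-wide process spawn built
# on top of facilities the underlying OS already offers (ssh here).
import subprocess

NODES = ["node1", "node2", "node3"]   # assumed node names

def least_loaded_node() -> str:
    """Stub for a load-information service; a real layer would query
    per-node load daemons instead of always picking the first node."""
    return NODES[0]

def cluster_spawn(command: str) -> int:
    """Run a command somewhere in the cluster; the caller never names a node."""
    node = least_loaded_node()
    return subprocess.call(["ssh", node, command])

# Example (requires passwordless ssh to the nodes):
# cluster_spawn("hostname")
```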

Page 19: Single System Image

SSI Systems and Tools

• OS-level SSI: SCO NSC UnixWare, Solaris MC, MOSIX, ...
• Middleware-level SSI: PVM, TreadMarks (DSM), GLUnix, Condor, CODINE, Nimrod, ...
• Application-level SSI: PARMON, Parallel Oracle, ...

Page 20: Single System Image

SCO NonStop Clusters for UnixWare

[Diagram: two or more UP or SMP nodes, each running standard SCO UnixWare with clustering hooks. On every node, users, applications, and systems management sit above standard OS kernel calls, which are augmented by modular kernel extensions; the nodes, their devices, and other nodes are linked by a ServerNet interconnect.]

http://www.sco.com/products/clustering/

Page 21: Single System Image

How Does NonStop Clusters Work?

Modular extensions and hooks provide:
• a single cluster-wide file-system view;
• transparent cluster-wide device access;
• transparent swap-space sharing;
• transparent cluster-wide IPC;
• high-performance internode communications;
• transparent cluster-wide processes, migration, etc.;
• node-down cleanup and resource failover;
• transparent cluster-wide parallel TCP/IP networking;
• application availability;
• cluster-wide membership and cluster time sync;
• cluster system administration;
• load leveling.

Page 22: Single System Image

Sun Solaris MC

Solaris MC: a high-performance operating system for clusters.

• A distributed OS for a multicomputer: a cluster of computing nodes connected by a high-speed interconnect.
• Provides a single system image, making the cluster appear like a single machine to the user, to applications, and to the network.
• Built as a globalization layer on top of the existing Solaris kernel.

Interesting features:
• extends the existing Solaris OS;
• preserves existing Solaris ABI/API compliance;
• provides support for high availability;
• uses C++, IDL, and CORBA in the kernel (a toy illustration of location-transparent object invocation follows this list);
• leverages Spring technology.
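Since the slide mentions CORBA-style object invocation in the kernel, here is a toy illustration (with invented class names; emphatically not Solaris MC's real API) of the idea of location transparency: callers invoke methods through a proxy, and the proxy forwards the call to wherever the object lives, so local and remote invocations look identical at the call site.

```python
# Toy location-transparent invocation; class and method names are invented.
class RemoteProxy:
    """Forwards attribute lookups to a target object; a real ORB would
    marshal the arguments and ship the call to the object's node."""
    def __init__(self, target, node):
        self._target = target   # stands in for an object on another node
        self._node = node

    def __getattr__(self, name):
        method = getattr(self._target, name)
        def invoke(*args, **kwargs):
            # A real framework would send a message to self._node here;
            # we simply call through to show the call-site transparency.
            return method(*args, **kwargs)
        return invoke

class FileService:
    def read(self, path):
        return f"contents of {path}"

fs = RemoteProxy(FileService(), node="node2")
print(fs.read("/etc/hosts"))   # same syntax whether local or remote
```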

Page 23: Single System Image

Solaris MC: Solaris for MultiComputers

• Global file system
• Globalized process management
• Globalized networking and I/O

[Diagram: Solaris MC architecture. Applications enter through the system call interface; the network, file system, and process subsystems are written in C++ on an object framework layered over the existing Solaris 2.5 kernel, and reach other nodes through object invocations.]

http://www.sun.com/research/solaris-mc/

Page 24: Single System Image

Solaris MC Components

• Object and communication support
• High availability support
• PXFS global distributed file system
• Process management
• Networking

[The Solaris MC architecture diagram from the previous slide is repeated here.]

Page 25: Single System Image

MOSIX: Multicomputer OS for UNIX

• An OS module (layer) that provides applications with the illusion of working on a single system.
• Remote operations are performed like local operations.
• Transparent to the application: the user interface is unchanged.

[Diagram: applications and PVM / MPI / RSH run on top of the MOSIX layer, which sits on the hardware/OS.]

http://www.mosix.cs.huji.ac.il/ || mosix.org

Page 26: Single System Image

Main Tool

Preemptive process migration that can migrate any process, anywhere, anytime, supervised by distributed algorithms that respond on-line to global resource availability, transparently:

• Load balancing: migrate processes from over-loaded to under-loaded nodes.
• Memory ushering: migrate processes away from a node that has exhausted its memory, to prevent paging/swapping.

A decision sketch follows this list.
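A sketch of the kind of decision such algorithms make, with invented thresholds and field names (this is not MOSIX's actual algorithm): a node migrates work away either because its memory is exhausted (ushering) or because a peer is clearly less loaded (balancing).

```python
# Illustrative migration decision; thresholds and fields are invented.
def pick_migration_target(local, peers):
    """local and each peer: {'name': str, 'load': runnable processes,
    'free_mem': MB free}, as learned from on-line information exchange."""
    # Memory ushering first: avoid paging/swapping at any cost.
    if local["free_mem"] <= 0:
        return max(peers, key=lambda p: p["free_mem"])
    # Load balancing: move work only if a peer is clearly less loaded.
    candidate = min(peers, key=lambda p: p["load"])
    if local["load"] > candidate["load"] + 1:
        return candidate
    return None   # keep the process where it is

peers = [{"name": "n2", "load": 1, "free_mem": 512},
         {"name": "n3", "load": 4, "free_mem": 64}]
print(pick_migration_target({"name": "n1", "load": 5, "free_mem": 256}, peers))
```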

Page 27: Single System Image

MOSIX for Linux at HUJI

A scalable cluster configuration:
• 50 Pentium II 300 MHz nodes
• 38 Pentium Pro 200 MHz nodes (some SMPs)
• 16 Pentium II 400 MHz nodes (some SMPs)
• over 12 GB of cluster-wide RAM
• connected by a 2.56 Gb/s Myrinet LAN

Runs Red Hat 6.0, based on kernel 2.2.7. Upgrades: hardware with Intel, software with Linux.

Download MOSIX: http://www.mosix.cs.huji.ac.il/