Transparent Cross-Border Migration of Parallel Multi Node Applications


Dominic Battré, Matthias Hovestadt, Odej Kao, Axel Keller, Kerstin Voss

Cracow Grid Workshop 2007


Axel Keller 2

Outline

Motivation
The Software Stack
Cross-Border Migration
Summary

HPC4U: Highly Predictable Clusters for Internet-Grids (EC-funded project in FP6)

AssessGrid: Advanced Risk Assessment & Management for Trustable Grids (EC-funded project in FP6)


The Gap between Grid and RMS

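[Figure: a user request with an SLA reaches the grid middleware, which hands the job to one of three local RMS instances managing machines M1, M2, and M3. The user asks "Reliability? Quality of Service?"; today's RMS answers "Best Effort!", an SLA-aware RMS answers "Guaranteed!"]

User asks for an SLA
Grid middleware realizes the job by means of a local RMS
BUT: RMS offer best effort
Need: SLA-aware RMS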


HPC4U: Highly Predictable Clusters for Internet-Grids

Objective
  Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, and acting as an active Grid component

Key Features
  System-level checkpointing
  Job migration
  Job types: sequential and MPI-parallel
  Planning-based scheduling


HPC4U: Planning Based Scheduling

[Figure: queuing systems place new jobs in queues; planning systems place new jobs on a machine/time plan.]

                                   queuing systems     planning systems
planned time frame                 present             present and future
new job requests                   insert in queues    re-planning
assignment of planned start time   no                  all requests
runtime estimation                 not necessary       mandatory
backfilling                        optional            yes, implicit
advance reservations               not possible        yes, trivial
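The planning-system column can be illustrated with a minimal sketch. It is not the HPC4U scheduler: the names `Planned`, `free_nodes`, and `plan_job` are hypothetical, time is an integer, and the machine is a single pool of identical nodes. Every request gets a planned start time, a runtime estimate is mandatory, backfilling falls out of the earliest-fit search, and an advance reservation is just a request with a later `earliest` time.

```python
from dataclasses import dataclass


@dataclass
class Planned:
    name: str
    nodes: int
    start: int
    end: int  # exclusive


def free_nodes(plan, total_nodes, t0, t1):
    """Minimum number of free nodes over the interval [t0, t1)."""
    # Usage only increases at job starts, so checking t0 and those starts suffices.
    points = {t0} | {p.start for p in plan if t0 < p.start < t1}
    return min(
        total_nodes - sum(p.nodes for p in plan if p.start <= t < p.end)
        for t in points
    )


def plan_job(plan, total_nodes, name, nodes, runtime, earliest=0):
    """Assign the earliest feasible start time at or after `earliest`."""
    if nodes > total_nodes:
        raise ValueError("request exceeds machine size")
    # A job can only start at `earliest` or when some planned job ends.
    candidates = sorted({earliest} | {p.end for p in plan if p.end > earliest})
    for start in candidates:
        if free_nodes(plan, total_nodes, start, start + runtime) >= nodes:
            job = Planned(name, nodes, start, start + runtime)
            plan.append(job)
            return job
    raise RuntimeError("no start time found")  # not reached: the last candidate is always free


plan = []
plan_job(plan, 4, "A", nodes=2, runtime=10)              # starts at 0
plan_job(plan, 4, "B", nodes=4, runtime=5)               # has to wait until A ends
plan_job(plan, 4, "C", nodes=2, runtime=10)              # fits next to A: implicit backfilling
plan_job(plan, 4, "D", nodes=1, runtime=3, earliest=100) # advance reservation
```

A queuing system would only know the present; here the plan covers present and future, which is what makes start-time guarantees possible.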


HPC4U: Software Stack

[Figure: HPC4U software stack on a cluster: User-/Broker-Interface (CLI, Negotiation), RMS (Scheduler), SSC, and the Process, Network, and Storage subsystems.]


HPC4U: Checkpointing Cycle

[Figure: message sequence between the RMS and the Process, Network, and Storage subsystems.]

1. CP job + halt
2. In-transit packets
3. Return: “Checkpoint completed!”
4. Snapshot!
5. Link to snapshot
6. Resume job
7. Job running again
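The seven steps can be sketched as an orchestration routine. This is a minimal sketch, not the HPC4U interfaces: `Recorder`, `checkpoint_cycle`, and the `snapshot://` link format are hypothetical, and the assignment of each step to a subsystem is an assumption read off the slide's figure labels.

```python
class Recorder:
    """Stand-in for a subsystem; it only records the calls it receives."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def call(self, action):
        self.log.append(f"{self.name}: {action}")


def checkpoint_cycle(process, network, storage, job):
    """Drive one checkpoint cycle from the RMS side and return the snapshot link."""
    process.call(f"checkpoint {job} and halt")          # 1. CP job + halt
    network.call(f"save in-transit packets of {job}")   # 2. in-transit packets
    process.call("checkpoint completed")                # 3. return to RMS
    storage.call(f"snapshot data of {job}")             # 4. snapshot
    link = f"snapshot://{job}/1"                        # hypothetical link format
    storage.call(f"return link {link}")                 # 5. link to snapshot
    process.call(f"resume {job}")                       # 6. resume job
    process.call(f"{job} running again")                # 7. job running again
    return link


log = []
link = checkpoint_cycle(
    Recorder("process", log), Recorder("network", log), Recorder("storage", log), "job42"
)
```

The point of the ordering is that the storage snapshot is taken only while the job is halted, so process state, network state, and data on disk belong to the same instant.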


Cross Border Migration: Intra Domain

[Figure: two clusters, each running the full stack: User-/Broker-Interface, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, and the Process, Network, and Storage subsystems.]


Cross Border Migration: Target Retrieval

[Figure: two clusters, each running the full stack: User-/Broker-Interface, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, and the Process, Network, and Storage subsystems.]


Cross Border Migration: Checkpoint Migration

[Figure: two clusters, each running the full stack: User-/Broker-Interface, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, and the Process, Network, and Storage subsystems.]


Cross Border Migration: Remote Execution

[Figure: two clusters, each running the full stack: User-/Broker-Interface, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, and the Process, Network, and Storage subsystems.]


Cross Border Migration: Result Migration

[Figure: two clusters, each running the full stack: User-/Broker-Interface, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, and the Process, Network, and Storage subsystems.]


Cross-Border Migration: Using Globus

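[Figure: cluster stack with a WS-AG interface next to CLI and CRM: User-/Broker-Interface, WS-AG, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, the Process, Network, and Storage subsystems, and an external Broker.]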

WS-AG implementation based on GT4
  Developed in EU project AssessGrid
Source specifies SLA / file staging parameters
  Subset of JSDL (POSIX Jobs)
Resource determination via broker
Source directly contacts destination
Destination pulls migration data via Grid-FTP
Destination pushes result data back to source
Source uses WSRF event notification
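The message flow above can be sketched in memory. This is a minimal sketch under stated assumptions: `Cluster`, `Broker`, and `migrate` are hypothetical names, dictionary copies stand in for the Grid-FTP transfers, a plain list stands in for WSRF event notification, and no Globus API is used.

```python
class Cluster:
    def __init__(self, name, free_nodes):
        self.name = name
        self.free_nodes = free_nodes
        self.files = {}          # staged files, keyed by file name
        self.notifications = []  # stands in for WSRF event notification

    def can_host(self, request):
        return self.free_nodes >= request["nodes"]

    def accept(self, source, request):
        # Destination pulls the migration data (Grid-FTP in the slides).
        checkpoint = source.files[request["checkpoint"]]
        self.files[request["checkpoint"]] = checkpoint
        # Placeholder for resuming the job from the checkpoint.
        result = checkpoint + b" -> finished on " + self.name.encode()
        # Destination pushes the result data back to the source.
        source.files[request["result"]] = result
        source.notifications.append(("done", request["job"]))


class Broker:
    def __init__(self, clusters):
        self.clusters = clusters

    def find_destination(self, request, source):
        """Resource determination: first other cluster that can host the job."""
        for cluster in self.clusters:
            if cluster is not source and cluster.can_host(request):
                return cluster
        return None


def migrate(source, broker, request):
    destination = broker.find_destination(request, source)
    if destination is None:
        return None
    destination.accept(source, request)  # source contacts destination directly
    return destination.name
```

The `request` dictionary plays the role of the SLA and file staging parameters that the source specifies; in the slides these are expressed as a subset of JSDL.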


Ongoing Work: Introducing Risk Management

[Figure: cluster stack with the components Risk Assessor, Consultant Service, and Monitoring alongside User-/Broker-Interface, WS-AG, CLI, CRM, Negotiation, Scheduler, RMS, SSC, PP, the Process, Network, and Storage subsystems, and an external Broker.]

Risk Assessor
  Topic of EU project: AssessGrid
  Incorporated in SLA provider
  Estimates risk for agreeing to an SLA
  Considers probability of failure in schedule
  Assessment based on historical data
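How historical data could turn into a probability of failure can be sketched as follows. The slides do not state the model used in AssessGrid; this sketch assumes independent node failures at a constant rate (an exponential model), and the function names are hypothetical.

```python
import math


def failure_rate(failures, node_hours):
    """Failures per node-hour, estimated from historical monitoring data."""
    if node_hours <= 0:
        raise ValueError("node_hours must be positive")
    return failures / node_hours


def probability_of_failure(rate, nodes, hours):
    """Chance that at least one of `nodes` nodes fails within `hours`.

    Assumes independent, exponentially distributed failures (an assumption,
    not taken from the slides).
    """
    return 1.0 - math.exp(-rate * nodes * hours)


rate = failure_rate(failures=5, node_hours=10_000)
pof = probability_of_failure(rate, nodes=8, hours=10)  # roughly 0.04
```

Under this model the risk grows with both job width and runtime, which is why a wide, long-running job needs more precautionary fault tolerance than a short sequential one.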


Summary: Best Effort is not Enough

Cross-border migration and risk assessment provide new means to increase the reliability of Grid computing.


More information

Read the paper
AssessGrid: www.assessgrid.eu
HPC4U: www.hpc4u.eu
OpenCCS: www.openccs.eu

Thanks for your attention!


Contents

BACKUP


Scheduling Aspects

Execution time
  Exact start time
  Earliest start time, latest finish time
User provides stage-in files by time X
Provider keeps stage-out files until time Y
Provisional reservations
Job priorities
Job suspension


HPC4U


Motivation: Fault Tolerance

Commercial Grid users need SLAs
Providers cautious on adoption
  Reason: business case risk
  Missed deadlines due to system failures
  Penalties to be paid
Solution: prevention with fault tolerance

Fault tolerance mechanisms available, but
1. Application modification mandatory
2. Overall solution (system software, process, storage, file system, network) required
3. Combination with Grid migration missing


HPC4U Objective

Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, acting as an active Grid component

Key features
  Definition and implementation of SLAs
  Resource reservation for guaranteed QoS
  Application-transparent fault tolerance


HPC4U: Concept

1. SLA negotiation as an explicit statement of expectations and obligations in a business relationship between provider and customer

2. Reservation of CPU, storage and network for desired time interval

3. Job start in checkpointing environment

4. In case of system failure: job migration / restart with respect to SLA


Phases of Operation

[Figure: timeline of the lifetime of an SLA. Negotiation ends with acceptance (or rejection) of the SLA; system resources are then allocated across Pre-Runtime, Runtime (Stage-In, Computation, Stage-Out), and Post-Runtime.]

Negotiation of SLA
Pre-Runtime: configuration of resources, e.g. network, storage, compute nodes
Runtime: Stage-In, Computation, Stage-Out
Post-Runtime: re-configuration
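The phase sequence can be written down as a small state machine. This is a minimal sketch; the enum, `ORDER`, and the function names are hypothetical, and the `DONE` and `REJECTED` end states are added for illustration.

```python
from enum import Enum


class Phase(Enum):
    NEGOTIATION = "Negotiation"
    PRE_RUNTIME = "Pre-Runtime"
    STAGE_IN = "Stage-In"
    COMPUTATION = "Computation"
    STAGE_OUT = "Stage-Out"
    POST_RUNTIME = "Post-Runtime"
    DONE = "Done"          # end state, added for illustration
    REJECTED = "Rejected"  # end state, added for illustration


ORDER = [
    Phase.NEGOTIATION, Phase.PRE_RUNTIME, Phase.STAGE_IN, Phase.COMPUTATION,
    Phase.STAGE_OUT, Phase.POST_RUNTIME, Phase.DONE,
]


def next_phase(phase, sla_accepted=True):
    """Return the phase that follows `phase` in the lifetime of an SLA."""
    if phase is Phase.NEGOTIATION and not sla_accepted:
        return Phase.REJECTED
    if phase in (Phase.DONE, Phase.REJECTED):
        raise ValueError(f"{phase.value} is a final state")
    return ORDER[ORDER.index(phase) + 1]


def in_runtime(phase):
    """Runtime consists of Stage-In, Computation, and Stage-Out."""
    return phase in (Phase.STAGE_IN, Phase.COMPUTATION, Phase.STAGE_OUT)
```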


Phase: Pre-Runtime

• Task of Pre-Runtime Phase
  Configuration of all allocated resources
  Goal: fulfill requirements of SLA

• Reconfiguration affects all HPC4U elements
  Resource Management System
    – e.g. configuration of assigned compute nodes
  Storage Subsystem
    – e.g. initialization of a new data partition
  Network Subsystem
    – e.g. configuration of network infrastructure


Phase: Runtime

Runtime Phase = lifetime of job in system
  Adherence to SLA has to be assured
  FT mechanisms have to be utilized

Phase consists of three distinct steps
  Stage-In: transmission of required input data from Grid customer to compute resource
  Computation: execution of application
  Stage-Out: transmission of generated output data from compute resource back to Grid customer


Phase: Post-Runtime

Task of Post-Runtime Phase: re-configuration of all resources
  e.g. re-configuration of network
  e.g. deletion of checkpoint datasets
  e.g. deletion of temporary data

Counterpart to Pre-Runtime Phase
  Allocation of resources ends
  Update of schedules in RMS and storage
  Resources are available for new jobs


PROCESS


Subsystems

Process Subsystem
  checkpointing of network
  cooperative checkpointing protocol (CCP)

Network Subsystem
  checkpoint network state

Storage Subsystem
  provision of storage
  provision of snapshot


STORAGE


Storage subsystem

Functionalities
  Negotiates the storage part of the SLA
  Provides storage capacity at a given QoS level
  Provides FT mechanisms

Requirement: manage multiple jobs running on the same SR

[Figure: computing and storage layers. The Virtual Storage Manager provides the logical space (data layout strategies); the VSM - SR interface connects it to the physical space of Storage Resource 1 and Storage Resource 2.]


Data Container concept

Idea: create storage environment for applications at a desired QoS level with abstraction of physical devices

Components:

[Figure: a job issues file I/O (read, write, open, …) to a file system, which issues block I/O (read, write, ioctl) to a logical volume. Inside the data container, block address mapping with data layout policies (e.g., simple striping) turns the logical space into block I/O on the physical devices of the storage resource.]


Data container properties

Storage part of the SLA

Data container section
  Size
  File system type
  Number of nodes that need to access the data container (private/shared)

Performance section
  Application I/O profile
  Benchmark
  Bandwidth (in MB/s or IO/s)
  Or default configuration

Dependability section
  Data redundancy type (within a cluster)
  Snapshot needed or not
  Data replication or not (between clusters)

Job-specific section
  Job’s time to schedule and time to finish
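The four sections of the storage part can be pictured as one request record. This is a minimal sketch: the class and all field names are hypothetical, and `None` for the bandwidth stands for the "default configuration" option.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DataContainerRequest:
    # Data container section
    size_gb: int
    fs_type: str
    nodes: int = 1
    # Performance section (None means default configuration)
    bandwidth_mb_s: Optional[float] = None
    # Dependability section
    redundancy: str = "none"   # data redundancy type within a cluster
    snapshot: bool = False
    replication: bool = False  # between clusters
    # Job-specific section
    start: Optional[datetime] = None
    finish: Optional[datetime] = None

    @property
    def shared(self):
        """A container accessed by more than one node is shared, otherwise private."""
        return self.nodes > 1

    def problems(self):
        """Return a list of human-readable problems; empty if the request is consistent."""
        issues = []
        if self.size_gb <= 0:
            issues.append("size must be positive")
        if self.nodes < 1:
            issues.append("at least one node must access the container")
        if self.bandwidth_mb_s is not None and self.bandwidth_mb_s <= 0:
            issues.append("bandwidth must be positive")
        if self.start and self.finish and self.finish <= self.start:
            issues.append("finish time must be after start time")
        return issues
```

A record like this is what the storage subsystem would check during negotiation before it commits capacity and snapshot overhead.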


Fault Tolerance Mechanisms

Storage FT mechanisms rely on special data layouts

RAID: tolerate the failure of one or more disks
  Implementation: hardware, software

RAIN: tolerate the failure of one or more nodes
  Implementation: software

Storage snapshot


Data container snapshot

Provide instantaneous copy of data containers

Technique used: Copy-On-Write (COW)
  create multiple copies of data without duplicating all the data blocks

With checkpoint, it allows application restart from a previous running stage

Impact on SR performance
  Taken into account at negotiation time
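Copy-on-write can be sketched on a toy block device. This is a minimal sketch of the general technique, not the HPC4U storage implementation; the class and method names are hypothetical.

```python
class DataContainer:
    """Toy block device with copy-on-write snapshots."""

    def __init__(self, n_blocks, block_size=4):
        self.blocks = [bytes(block_size) for _ in range(n_blocks)]
        self.snapshots = []  # one {block index: old content} table per snapshot

    def snapshot(self):
        # Instantaneous: no block is copied when the snapshot is taken.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Copy-on-write: preserve the old block in every snapshot that
        # has not saved this block yet, then overwrite it.
        for table in self.snapshots:
            table.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Unmodified blocks are shared with the live container.
        return self.snapshots[snap_id].get(index, self.blocks[index])

    def restore(self, snap_id):
        # Roll the live container back; later snapshots become invalid.
        for index, old in self.snapshots[snap_id].items():
            self.blocks[index] = old
        self.snapshots[snap_id] = {}
        del self.snapshots[snap_id + 1:]


container = DataContainer(n_blocks=4)
container.write(0, b"aaaa")
snap = container.snapshot()
container.write(0, b"bbbb")          # only now is the old block 0 preserved
assert container.read_snapshot(snap, 0) == b"aaaa"
```

The extra copy on the first write after a snapshot is the performance impact on the SR that has to be considered at negotiation time.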


Redundant data layout

[Figure: a job attached to a storage resource; numbered arrows 1 to 5 mark the restart steps.]

Snapshot: single-node job restart after node failure

Characteristics:

• The job is running on a single node

• The data container is private to that node

• Data container snapshot resides on the same storage resource

Restart steps:

1. Node failure
2. Restore job’s data from previous snapshot
3. Start data container
4. Restore job’s state from previous checkpoint
5. Job restart


Interfaces with other components

[Figure: the RMS talks to the VSM through the VSM - RMS interface (client-server, callbacks). Inside the storage subsystem, the VSM manages the data containers and reaches the Storage Resources (SR) over the network (socket, RDMA, …) through the VSM – SR interface. Wrappers attach the SR types shown (SR_type1, SR_type2): Exanodes and classical storage arrays. Components are marked as open source or proprietary.]


ASSESSGRID


Grid Fabric Layer with Risk Assessor

• NegotiationManager

- Agreement / Agreement Factory WS

- checks whether offer complies with template

- initiation of file transfers

• Scheduler

- creates tentative schedules for offers

• Risk Assessor

• Consultant Service

- records data

• Monitoring

- runtime behavior


Precautionary Fault-Tolerance

How many spare resources are available at execution time?

• Use of planning-based scheduler


Estimating Risk for a Job Execution

Use of planning-based scheduler
How much slack time is available for fault tolerance?
How much effort do I undertake for fault tolerance?
What is the considered risk of resource failure?

[Figure: timeline from Earliest Start Time to Latest Finish Time, divided into Execution Time and Slack Time.]
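The slack time in the timeline is the part of the window between earliest start and latest finish that the execution itself does not need. A minimal sketch follows; the function names are hypothetical, and the worst-case cost of one failure (one checkpoint interval of lost work plus a restart overhead) is an assumption, not taken from the slides.

```python
from datetime import datetime, timedelta


def slack_time(earliest_start, latest_finish, execution_time):
    """Slack = (latest finish - earliest start) - execution time."""
    slack = (latest_finish - earliest_start) - execution_time
    if slack < timedelta(0):
        raise ValueError("execution time does not fit into the window")
    return slack


def tolerable_failures(slack, checkpoint_interval, restart_overhead):
    """Failures that fit into the slack if each costs at most
    one checkpoint interval of lost work plus the restart overhead."""
    return slack // (checkpoint_interval + restart_overhead)


slack = slack_time(
    earliest_start=datetime(2007, 10, 15, 8, 0),
    latest_finish=datetime(2007, 10, 15, 20, 0),
    execution_time=timedelta(hours=9),
)
# slack is 3 hours; with 30 min checkpoints and 15 min restarts, 4 failures fit
failures = tolerable_failures(slack, timedelta(minutes=30), timedelta(minutes=15))
```

A planning-based scheduler knows these times in advance, so it can trade checkpoint frequency against the available slack before agreeing to the SLA.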


Risk Assessment

[Figure legend: low risk, middle risk, high risk]

Estimate risk for agreeing to an SLA
  consider risk of resource failure
  estimate risk for a job execution
  initiate precautionary FT mechanisms


Risk Management at Job Execution

[Figure: Risk Management, with Decisions and Actions, shown together with Risk Assessment, Business Model (price, penalty), Weekend/Holiday/Workday, Schedule (SLAs, best effort), Redundancy Measures, and Events.]


Detection of Bottlenecks

Consultant Service
  Analysis of SLA violation
    Estimated risk for the job
    Planned FT mechanisms
  Monitoring information
    Job
    Resources
  Data mining
    Find connections between SLA violations
    Detect weak points in the provider’s infrastructure


WS-AG


Components


Implementation with Globus Toolkit 4

Why Globus?
  Utility: Authentication, Authorization, Delegation, RFT, MDS, WS-Notification
  Impact

Problem 1: GRAM (Grid Resource Allocation and Management)
  State machine, incl. file staging, delegation of credentials, RSL
  Cannot use it: written for batch schedulers, not for planning schedulers

Problem 2: Deviations from WS-AG spec.
  Different namespaces: WS-A, WS-RF


Implementation with Globus Toolkit 4

Technical Challenges

xs:anyType
  Wrote custom serializers/deserializers

Substitution groups
  Used in ItemConstraint (Creation Constraints)
  Cannot be mapped to Java by Axis
  Replaced by xs:anyType – use as DOM tree

CreationConstraints
  Namespace prefixes in XPaths meaningless
  Need for WSDL and interpretation for xs:all, xs:choice, and friends


Context

<wsag:Context>
  …
  <wsag:AgreementInitiator>
    <AG:DistinguishedName> /C=DE/O=… </AG:DistinguishedName>
  </wsag:AgreementInitiator>
  <wsag:AgreementResponder>EPR</…>
  <AG:ServiceUsers>
    <AG:ServiceUser>DN</…>
  </AG:ServiceUsers>
  …
</wsag:Context>

Context

Terms

Creation Constraints


Terms, SDTs

Conjunction of terms
  Common structure of templates
  WS-AG too powerful/difficult to fully support

Service Description Term (one)
  assessgrid:ServiceDescription (extension of abstract ServiceTermType)
    jsdl:POSIXExecutable (executable, arguments, environment)
    jsdl:Application (mis-)used for libraries
    jsdl:Resources
    jsdl:DataStaging *
    assessgrid:PoF (upper bound)


Terms, GuaranteeTerms

No hierarchy but two meta guarantees
  ProviderFulfillsAllObligations
    e.g. Reward: 1000 EUR, Penalty: 1000 EUR
  ConsumerFulfillsAllObligations
    e.g. Reward: 0 EUR, Penalty: 1000 EUR

First violation is responsible for failure
  No hardware problem, then user fault

Other guarantees
  Execution time
    Any start time (best effort)
    Exact start time
    Earliest start time, latest finish time
  User provides StageIn files by time X
  Provider keeps StageOut files until time Y


No timely execution

No stage-out
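The rule "first violation is responsible for failure" can be sketched with the example amounts from the two meta guarantees. This is a minimal sketch: `TERMS`, `responsible_party`, and `settle` are hypothetical names, and how rewards and penalties combine (reward paid only if nobody violated, penalty charged only to the responsible party) is an assumption, not stated on the slide.

```python
# Example amounts in EUR, taken from the two meta guarantees.
TERMS = {
    "provider": {"reward": 1000, "penalty": 1000},  # ProviderFulfillsAllObligations
    "consumer": {"reward": 0, "penalty": 1000},     # ConsumerFulfillsAllObligations
}


def responsible_party(violations):
    """violations: list of (time, party) tuples; the earliest one decides."""
    if not violations:
        return None
    return min(violations, key=lambda v: v[0])[1]


def settle(violations, terms=TERMS):
    """Return the amount credited (+) or charged (-) to each party."""
    party = responsible_party(violations)
    if party is None:
        return {name: t["reward"] for name, t in terms.items()}
    return {
        name: (-t["penalty"] if name == party else 0)
        for name, t in terms.items()
    }


# The consumer delivered stage-in files late (t=2), so the missed deadline
# at t=5 is not charged to the provider.
outcome = settle([(5, "provider"), (2, "consumer")])
```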


Terms

• SLA does not contain requirements of fault tolerance mechanisms
  • Covered by asserted PoF, penalty and loss of reputation

• Compulsory Assessment Intervals not really useful for us
  • How often do you assess that job was allocated for asserted time?

• Preferences too complicated


CreationConstraints

• Difficult to support namespaces:
  • //wsag:…/assessgrid:… - prefixes are just strings

• Very difficult to support structural information
  • xs:group, xs:all, xs:choice, xs:sequence

• Possible but difficult to support xs:restriction xs:simple
  Check for enumeration (xs:restriction of xs:string)
  Check for valid dates (xs:restriction of xs:date)
  Everything else close to impossible
    {min,max}{In,Ex}clusive, totalDigits, fractionDigits, length, … probably useless
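Two of these points can be shown with Python's standard `xml.etree.ElementTree`: prefixes are only aliases for namespace URIs, and an enumeration restriction is easy to check. This is a minimal sketch; the two `example.org` namespace URIs, the element names `Offer`, `Terms`, and `PoF` as laid out here, and the function names are hypothetical. Only the XML Schema namespace is the real one.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Hypothetical namespace URIs, used for illustration only.
NS = {"wsag": "http://example.org/wsag", "ag": "http://example.org/assessgrid"}

DOC_A = (
    '<wsag:Offer xmlns:wsag="http://example.org/wsag" '
    'xmlns:assessgrid="http://example.org/assessgrid">'
    "<wsag:Terms><assessgrid:PoF>0.05</assessgrid:PoF></wsag:Terms></wsag:Offer>"
)
# Same content, different prefixes bound to the same URIs.
DOC_B = (
    '<x:Offer xmlns:x="http://example.org/wsag" '
    'xmlns:y="http://example.org/assessgrid">'
    "<x:Terms><y:PoF>0.05</y:PoF></x:Terms></x:Offer>"
)


def find_pof(xml_text):
    """Locate the PoF element by namespace URI, not by the document's prefixes."""
    root = ET.fromstring(xml_text)
    node = root.find("wsag:Terms/ag:PoF", NS)
    return None if node is None else node.text


def allowed_values(restriction_xml):
    """Collect the values of an xs:restriction that consists of xs:enumeration facets."""
    restriction = ET.fromstring(restriction_xml)
    return [e.get("value") for e in restriction.findall(f"{{{XS}}}enumeration")]


RESTRICTION = (
    f'<xs:restriction xmlns:xs="{XS}" base="xs:string">'
    '<xs:enumeration value="ext3"/><xs:enumeration value="xfs"/>'
    "</xs:restriction>"
)

assert find_pof(DOC_A) == find_pof(DOC_B) == "0.05"
assert "xfs" in allowed_values(RESTRICTION)
```

A constraint path such as `//wsag:…/assessgrid:…` therefore only has a meaning once the prefix bindings that go with it are known, which is the difficulty the slide points at.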
