Advanced Parallel Primitives in SPM.Python for Inheriting Fault-Tolerance, and Scalable Processing of Data and Graphs Minesh B. Amin mamin @ mbasciences.com http://www.mbasciences.com HPC Advisory Council / Stanford Workshop 2011 Stanford University, CA Dec 7, 2011


Page 1:

Advanced Parallel Primitives in SPM.Python for Inheriting Fault-Tolerance, and Scalable Processing of Data and Graphs

Minesh B. Amin
mamin @ mbasciences.com

http://www.mbasciences.com

HPC Advisory Council / Stanford Workshop 2011

Stanford University, CA

Dec 7, 2011

© 2011 MBA Sciences, Inc. All rights reserved.

Page 2:

Problem Statement

... exploiting parallelism using libraries

Page 3:

Problem Statement

... exploiting parallelism using frameworks

libraries

Page 4:

Problem Statement

... exploiting parallelism using parallel primitives

frameworks

libraries

Page 5:

Problem Statement

... exploiting parallelism using parallel primitives

frameworks

libraries

Parallel primitives:

● Clone
● CloneRepeat
● PartitionAggregate Decentralized
● PartitionAggregate Centralized
● PartitionList
● PartitionDAG

● Single, self-contained parallel environment
● Patented Technology ...

Partition/OpenMPI: Enable any OpenMPI application to inherit support for:

● Fault tolerance
● Timeout
● Detection of deadlocks

Partition/HybridFlow: Suites of parallel primitives to process data and graphs in parallel
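To make the primitive names above a little more concrete, here is a minimal sketch of a partition-then-aggregate pattern in plain Python 3. It uses only the standard library, not the SPM.Python API; the names `partition` and `partition_aggregate` are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partition_aggregate(items, task, aggregate, n_parts=4):
    """Run task on each chunk concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(task, partition(items, n_parts)))
    return aggregate(partials)


total = partition_aggregate(list(range(1, 101)), task=sum, aggregate=sum)
print(total)  # 5050
```

The sketch leaves out everything the slide attributes to the environment itself (fault tolerance, timeouts, deadlock detection); it only shows the shape of the computation.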

Page 8:

Terminology: "Exploiting Parallelism"

Parallelism: The management of a collection of serial tasks

Management: The policies by which:

● tasks are scheduled,
● premature terminations are handled,
● preemptive support is provided,
● communication primitives are enabled/disabled, and
● resources are obtained and released

Serial Tasks: Are classified in terms of either:

● Coarse grain ... where tasks may not communicate prior to conclusion, or
● Fine grain ... where tasks may communicate prior to conclusion.

Management policies codify how serial tasks are to be managed ... independent of what they may be
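The separation between policy and task described above can be illustrated with a small sketch in plain Python 3. This is not SPM.Python code: `Policy`, `manage`, and `flaky` are hypothetical names, and only two of the listed policies (scheduling order and handling of premature termination) are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    order: Callable[[List[str]], List[str]]  # how tasks are scheduled
    max_attempts: int                        # how premature terminations are handled


def manage(tasks: Dict[str, Callable[[], object]], policy: Policy) -> Dict[str, object]:
    """Run coarse-grain tasks under a policy that knows nothing about them."""
    results = {}
    for name in policy.order(list(tasks)):
        for _ in range(policy.max_attempts):
            try:
                results[name] = tasks[name]()
                break
            except Exception as exc:
                results[name] = exc  # kept only if every attempt fails
    return results


attempts = {"n": 0}


def flaky():
    """A task that terminates prematurely on its first attempt."""
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("terminated prematurely")
    return "recovered"


policy = Policy(order=sorted, max_attempts=3)
report = manage({"b": flaky, "a": lambda: 42}, policy)
print(report)  # {'a': 42, 'b': 'recovered'}
```

Swapping in a different `Policy` changes how the tasks are run without touching the tasks, which is the point the slide makes.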

Page 10:

Terminology: "The Big Picture"

Question: Is exploiting parallelism { easy | hard } ?

Page 13:

Terminology: "The Big Picture"

Question: What makes exploiting parallelism { easy | hard } ?

Supposition: The gap between developer's intent and API of PET (parallel enabling technology) ...

Page 14:

Terminology: "Parallel Enabling Technologies"

Means to the end

Bottom-up: OpenMPI, OpenMP, CUDA, OpenGL
● Maximum flexibility
● Maximum headaches
● Must implement fault tolerance

Top-down: Hadoop, Goldenorb, GraphLab
● Limited flexibility
● Fewer headaches
● Fault tolerance is inherited

Self-contained environment: SPM.Python
● Maximum flexibility
● Fewest headaches
● Fault tolerance is inherited

N environments/installations for N frameworks

One environment/installation, N suites of pclosures

    >>> createVirtualCloud -async
    >>> cmdA
    >>> cmdA -parallel
    >>> cmdB
    >>> cmdB -parallel
    >>> cmdC
    >>> cmdC -parallel
    >>> cmdD
    >>> cmdD -parallel

Page 21:

SPM.Python: Typical Flow

Application domains: Visualization, Life Sciences, Finance, IT, Software Development, EDA, Analytics

Gap between intent and API of parallel primitives

Architectural
● Scalable vocabulary

Developer
● Correct-by-construction: fault tolerance, self-cleaning
● Construct-by-correction: rapid prototyping

IT
● No certification (!)

Page 25:

SPM.Python: Typical Flow

Application domains: Visualization, Life Sciences, Finance, IT, Software Development, EDA, Analytics

Gap between intent and API of parallel primitives

Fundamental Prerequisite

Ability to express parallelism in terms of parallel primitives (pclosures)

Page 29:

Partition/OpenMPI: Prologue

GNU/Linux [] mpirun ... ./hello_world -prefix "api"

Typical OpenMPI application ... lacks support for:

● fault tolerance
● timeout
● detection of deadlocks

⇒ Prototyping is (deeply)^∞ frustrating

Page 30:

Partition/OpenMPI: Problem Statement

Prototyping should be frictionless

Must use original OpenMPI application:
● original source code
● original binary

Original OpenMPI application must inherit support for:
● fault tolerance
● timeout
● detecting deadlocks

GNU/Linux [] spm.python ...
    mpirun ... ./hello_world -prefix "api"
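As a rough illustration of what "inheriting" a timeout can look like from the outside, the sketch below wraps an unmodified command in plain Python 3 using the standard `subprocess` module. It is not how SPM.Python is implemented (the slides do not show that here); `run_with_timeout` is a hypothetical name, and two small Python one-liners stand in for an `mpirun` launch line.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run an unmodified command; report how it ended instead of hanging."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"status": "timeout", "stdout": ""}
    status = "ok" if done.returncode == 0 else "failed"
    return {"status": status, "stdout": done.stdout}


# Stand-ins for an MPI launch line: one command that finishes, one that hangs.
quick = [sys.executable, "-c", "print('app => 0')"]
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_timeout(quick, timeout_s=10))
print(run_with_timeout(stuck, timeout_s=1))
```

A single-process wrapper like this only covers the launcher itself; processes started on other hosts would need their own supervision, which is the part the following slides describe.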

Page 34:

Partition/OpenMPI: Problem Statement (Cont'd)

GNU/Linux [] spm.python ...
    mpirun ... ./hello_world -prefix "api"

[Figure labels: A, B]

Exploiting two very different forms of parallelism:
● Using same resources
● At the same time

Drop-in replacement for mpirun

Multiple sessions of mpirun within a single session of spm.python

Can use same resources for:
● Checkpoint based parallelism
● What-if analysis
● Stress testing

Page 41:

Partition/OpenMPI: Anatomy - Timeline

GNU/Linux [] spm.python ...
    mpirun ./hello_world -prefix "api"

Page 44:

Partition/OpenMPI: Anatomy - Timeline (Cont'd)

[Figure: timeline with columns Hub, mpirun, Spoke, orted, wrapper, Application; steps numbered 1 to 7; exit() markers at the end of the launched processes; labeled "Normal Execution"]

Launch:
● mpirun
Monitor:
● mpirun
● Spokes

Launch:
● orted
Monitor:
● orted
● wrapper

Launch:
● Application
Monitor/Timeout:
● Application

Establish a nervous system over the OpenMPI application

Populate nervous system with streams of time-series data
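One way to picture how streams of time-series data can expose a stalled process is the sketch below, written in plain Python 3. The slide does not say what the streams contain, so periodic heartbeat timestamps are an assumption made here for illustration, and `find_stalled` is a hypothetical helper, not part of SPM.Python.

```python
def find_stalled(heartbeats, now, max_silence_s):
    """Return names whose most recent heartbeat is older than max_silence_s.

    heartbeats maps a process name to a list of timestamps (in seconds).
    """
    stalled = []
    for name, stamps in heartbeats.items():
        last = max(stamps) if stamps else float("-inf")
        if now - last > max_silence_s:
            stalled.append(name)
    return sorted(stalled)


heartbeats = {
    "mpirun": [0.0, 1.0, 2.0, 3.0],
    "orted": [0.0, 1.0, 2.0, 3.0],
    "application": [0.0, 1.0],  # went silent after t = 1.0
}
print(find_stalled(heartbeats, now=3.5, max_silence_s=2.0))  # ['application']
```

A monitor that sees a process in this list could then apply a timeout or relaunch policy; deciding whether silence means a deadlock or just slow progress would need more signal than a heartbeat alone.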

Page 45:

Partition/OpenMPI: Anatomy - Breakdown

[Figure: timeline with columns Hub, mpirun, Spoke, orted, wrapper, Application; steps numbered 1 to 7; exit() markers at the end of the launched processes; labeled "Normal Execution"]

Launch:
● mpirun
Monitor:
● mpirun
● Spokes

Launch:
● orted
Monitor:
● orted
● wrapper

Launch:
● Application
Monitor/Timeout:
● Application

Built-in Package Management System
● Selectively change default OpenMPI env

Redirection of library calls
● Augment libmpi.so, libc.so ... with libSPM.so

Second Parallel Capability
● ~ 60-line Python script
● Authored by developer

Page 46:

Partition/OpenMPI: Second Parallel Capability

    @spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
    @spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
    def __init():
        return spm.pclosure.macro.papply.template.openMPI.\
               policyA.defun(signature = 'signature::Hub',
                             stage1Cb = __taskStat,
                             );

    __pc = __init();

Declaration + Definition of Pclosure

Page 47:

Partition/OpenMPI: Second Parallel Capability

    @spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
    @spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
    def main(pool,
             taskApiArgs,
             taskTimeout):
        # Initialize 'stage0'.
        __pc.stage0.init.main(typedef = ...);
        hdl = __pc.stage0.payload.tie();
        # Populate the template task
        hdl.spm.meta.label = '***'; # Not interested.
        hdl.spm.meta.apiArgs = taskApiArgs;
        hdl.spm.meta.timeout = taskTimeout;
        # Invoke the pmanager
        __pc.stage0.event.manage(pool = pool,
                                 nSpokesMin = ...
                                 nSpokesMax = ...
                                 timeoutWaitForSpokes = ...
                                 timeoutExecution = ...);
        return;

Population + Invocation of Pclosure

Page 48:

Partition/OpenMPI: Second Parallel Capability

    r"""
    task<template> ::struct {
        # SPM component ...
        spm ::struct {
            meta ::struct {
                label   ::scalar<stringSnippet> = deferred;
                apiArgs ::dict<string,mixed>    = deferred;
                timeout ::scalar<timeout>       = deferred;
            };
            core ::struct {
                relaunchPre  ::scalar<bool> = None;
                relaunchPost ::scalar<bool> = None;
                nameHost     ::scalar<auto> = None;
                whoAmI       ::scalar<auto> = None;
            };
            stat ::struct {
                exception   ::scalar<auto>   = None;
                returnValue ::scalar<record> = None;
            };
        };
        # non-SPM component ...
    };
    """

Typedef for Template Task

Page 49:

Partition/OpenMPI: Second Parallel Capability

    @spm.util.dassert(predicateCb = spm.sys.sstat.amOnline)
    @spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
    def __taskStat(pc):
        try:
            hdl = pc.stage1.payload.tie();
            returnValue = hdl.spm.stat.returnValue;
            # Report whichever output streams the task captured.
            if (returnValue.Has(attr = 'stdOut')):
                print("\tstdOut : %s" % returnValue.stdOut);
            if (returnValue.Has(attr = 'stdErr')):
                print("\tstdErr : %s" % returnValue.stdErr);
            if (returnValue.Has(attr = 'stdOutErr')):
                print("\tstdOutErr: %s" % returnValue.stdOutErr);
        except (SPMTaskDropped,
                SPMTaskLoad,
                SPMTaskEval,), (hdl,):
            pass;
        return (pc.stage1.event.done(),
                None,)[-1];

Callback for Status Reports

Page 50:

Partition/OpenMPI: SPM.Python Session

    GNU/Linux [] spm.3.111116.trial.A.python
    (Trial Edition)
    Spm.Python 3.111116 / Python 2.4.6
    [GCC 4.4.3 (64 bit) on linux2]

    NOTE
    >>>> Trial period ends at <<<<
    >>>> 24:00 hrs (Pacific Standard Time) <<<<
    >>>> December 29, 2011 <<<<

    Type "help", "copyright", "credits", "license" or "spm.Api()" for more information.
    Type "spm.DemoExtract(dirname = ...)" to extract demo scripts.

    Please visit www.mbasciences.com for the latest and growing
    collection of scripts and technical briefs classified in terms of
    parallel management patterns.

    >>> import pool
    >>> import demo
    >>> import os;
    >>> taskApiArgs = \
        dict(app = os.getcwd() + '/hello_world',
             appOptions = "-prefix='app'",
             );
    >>> taskTimeout = spm.util.timeout.after(seconds = 10);
    >>> demo.main(pool = pool.intraAll(),
                  taskApiArgs = taskApiArgs,
                  taskTimeout = taskTimeout)
    #: MetaStatus (hub): Waiting - ForSpokes ...
    #: MetaStatus (hub): Tasks - Eval
    app => 0
    app => 1
    #: MetaStatus (hub): Tasks - EvalDone
    >>> demo.main(pool = pool.intraOnePerServer(),
                  taskApiArgs = taskApiArgs,
                  taskTimeout = taskTimeout)
    #: MetaStatus (hub): Waiting - ForSpokes ...
    #: MetaStatus (hub): Tasks - Eval
    #: MetaStatus (hub): Tasks - EvalDone
    >>> exit()
    GNU/Linux []

Page 52:

Partition/HybridFlow: Basic Template

    while (not done):
        try:
            for work in pc.generate(...):
                eval(work);                   # Local Python/C/C++/GPU computation
                pc.counter.async += 1;        # Update parallel data structure(s)
                if (some condition):
                    raise pc.exception(...);  # Parallel exception
                if (some condition):
                    pc.emit(...);             # Emit work/report
            done = True;
        except (pc.exception,), (val,):
            if (some condition):
                continue;                     # Repeat with new consensus ('val')
            done = True;
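The control flow of the template can be tried out with a single-process mock in plain Python 3. Everything here is hypothetical and for illustration only: `MockPc`, `Consensus`, and `find_smallest` are not SPM.Python names, the "parallel exception" is an ordinary exception, and the counter is a plain integer (the attribute name `async` from the slide is a reserved word from Python 3.7 onward). The toy problem is finding the smallest item: each time a better candidate appears, the pass is abandoned and repeated with the new value.

```python
class Consensus(Exception):
    """Stand-in for the template's parallel exception; carries a value."""

    def __init__(self, val):
        super().__init__(val)
        self.val = val


class MockPc:
    """Hypothetical single-process stand-in for the 'pc' object."""

    def __init__(self, items):
        self.items = items
        self.counter = 0
        self.emitted = []

    def generate(self, best):
        # Only work that could still improve on 'best' is generated.
        return [x for x in self.items if x < best]

    def emit(self, report):
        self.emitted.append(report)


def find_smallest(pc):
    best = float("inf")
    done = False
    while not done:
        try:
            for work in pc.generate(best):
                pc.counter += 1            # update shared bookkeeping
                if work < best:
                    raise Consensus(work)  # abandon this pass
            pc.emit(best)                  # nothing better left: report
            done = True
        except Consensus as exc:
            best = exc.val                 # repeat with the new consensus
    return best


pc = MockPc([9, 4, 7, 1])
print(find_smallest(pc), pc.emitted, pc.counter)  # 1 [1] 3
```

In a real parallel setting the exception would presumably be raised by one participant and observed by all of them; the mock only shows why the loop is wrapped in `while`/`try` rather than being a single pass.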

Page 56: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,

Partition/HybridFlow: Basic Template

while (not done):

try:

for work in pc.generate(...):

eval(work); # Local Python/C/C++/GPU computation

pc.counter.async += 1; # Update parallel data structure(s)

if (some condition):

raise pc.exception(...); # Parallel exception

if (some condition):

pc.emit(...); # Emit work/report

done = True;

except (pc.exception,) (val,):

if (some condition):

continue; # Repeat with new consensus (’val’)

done = True;

Partition/HybridFlow: Suite of Parallel Primitives

while (not done):
    try:
        for work in pc.generate(...):
            eval(work);
            pc.counter.async += 1;
            if (some condition):
                raise pc.exception(...);
            if (some condition):
                pc.emit(...);
        done = True;
    except (pc.exception,) (val,):
        if (some condition):
            continue;
        done = True;

pc.generator(...);
pc.emit(...);
pc.exception(...);
pc.counter.async;

BAP BSPSpeculative BSP DAG

● ● ●
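Assuming BSP in the list above stands for the bulk-synchronous-parallel model (the slide does not expand the acronym), one superstep-based computation can be sketched with the standard `threading` module. The function `bsp_max` and its ring-neighbour rule are hypothetical illustrations, not part of SPM.Python: every worker computes from the previous superstep's values, then all workers meet at a barrier before the next superstep begins.

```python
import threading


def bsp_max(values, supersteps):
    """Propagate the maximum around a ring, one hop per superstep."""
    n = len(values)
    current = list(values)      # values visible during this superstep
    nxt = [None] * n            # values being written for the next superstep
    barrier = threading.Barrier(n)

    def worker(i):
        nonlocal current, nxt
        for _ in range(supersteps):
            # Local computation: read only the previous superstep's values.
            nxt[i] = max(current[i], current[(i - 1) % n])
            # Synchronize: wait until every worker has written its value.
            if barrier.wait() == 0:
                current, nxt = nxt, current   # exactly one worker swaps buffers
            # Second barrier: nobody reads until the swap is complete.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current


if __name__ == "__main__":
    print(bsp_max([3, 1, 4, 1, 5], 1))
    print(bsp_max([3, 1, 4, 1, 5], 4))
```

The two barriers per superstep keep reads and writes from overlapping, which is the property that separates a synchronous schedule from an asynchronous one.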

Conclusion

... exploiting parallelism using parallel primitives

frameworks

libraries

Clone
CloneRepeat
PartitionAggregate (Decentralized)
PartitionAggregate (Centralized)
PartitionList
PartitionDAG

● Single, self-contained parallel environment
● Patented Technology ...

Partition/OpenMPI

Enable any OpenMPI application to inherit support for:

● Fault tolerance
● Timeout
● Detection of deadlocks

Partition/HybridFlow

Suites of parallel primitives to process data and graphs in parallel

Conclusion (Cont’d)

http://www.mbasciences.com

● SPM.Python distribution
● Technical Briefs
● Parallel Management Patterns

Elementary Parallel Primitives

● Clone: Once, Repeat
● Partition: DAG, List
● PartitionAggregate: Centralized, Decentralized

HPC Parallel Primitives (In Limited Beta)

● Partition: Grid/OpenMPI

Data / Graph Parallel Primitives (Limited Beta: Jan 24, 2012)

● Partition: Data Flow, Graph