
Advanced Parallel Primitives in SPM.Python for Inheriting Fault-Tolerance, and Scalable Processing of Data and Graphs

Minesh B. Amin
mamin@mbasciences.com
http://www.mbasciences.com

HPC Advisory Council / Stanford Workshop 2011
Stanford University, CA
Dec 7, 2011

© 2011 MBA Sciences, Inc. All rights reserved.


Problem Statement

... exploiting parallelism using parallel primitives, frameworks, and libraries

Parallel primitives (a single, self-contained parallel environment; patented technology ...):
● Clone
● Clone Repeat
● Partition Aggregate (Decentralized)
● Partition Aggregate (Centralized)
● Partition List
● Partition DAG
● Partition/OpenMPI ... enable any OpenMPI application to inherit support for fault tolerance, timeout, and detection of deadlocks
● Partition/HybridFlow ... suites of parallel primitives to process data and graphs in parallel


Terminology: "Exploiting Parallelism"

Parallelism: The management of a collection of serial tasks.

Management: The policies by which:
● tasks are scheduled,
● premature terminations are handled,
● preemptive support is provided,
● communication primitives are enabled/disabled, and
● resources are obtained and released.

Serial Tasks: Classified in terms of either:
● Coarse grain ... where tasks may not communicate prior to conclusion, or
● Fine grain ... where tasks may communicate prior to conclusion.

Management policies codify how serial tasks are to be managed ... independent of what they may be.

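A minimal sketch of what coarse-grain management looks like in practice, using only the Python standard library. This is a generic illustration of the policy categories above (scheduling, timeout, premature termination, resources), not the SPM.Python API; the helper names and parameters are invented for the example.

# Generic illustration of coarse-grain task management (not SPM.Python API):
# tasks never communicate; the manager owns scheduling, timeout, and the
# premature-termination policy.
from concurrent.futures import ProcessPoolExecutor, TimeoutError

def task(n):                       # a serial, coarse-grain task
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:       # resource policy
        futures = {pool.submit(task, n): n for n in (10, 10**6, 10**7)}
        for fut, n in futures.items():
            try:
                print(n, "->", fut.result(timeout=5))       # timeout policy
            except TimeoutError:
                print(n, "-> timed out (premature-termination policy)")
            except Exception as exc:
                print(n, "failed:", exc)                    # fault handling policy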

Terminology: "The Big Picture"

Question: Is exploiting parallelism { easy | hard }? What makes it so?

Supposition: The gap between the developer's intent and the API of the PET (parallel enabling technology) ...

Terminology: "Parallel Enabling Technologies"

Means to the end:

■ Bottom-up (OpenMPI, OpenMP, CUDA, OpenGL)
  ● Maximum flexibility
  ● Maximum headaches
  ● Must implement fault tolerance

■ Top-down (Hadoop, Goldenorb, GraphLab)
  ● Limited flexibility
  ● Fewer headaches
  ● Fault tolerance is inherited
  N environments/installations for N frameworks

■ Self-contained environment (SPM.Python)
  ● Maximum flexibility
  ● Fewest headaches
  ● Fault tolerance is inherited
  One environment/installation, N suites of pclosures:

  >>> createVirtualCloud -async
  >>> cmdA        >>> cmdA -parallel
  >>> cmdB        >>> cmdB -parallel
  >>> cmdC        >>> cmdC -parallel
  >>> cmdD        >>> cmdD -parallel


SPM.Python: Typical Flow

Application domains: Visualization, Life Sciences, Finance, IT/Software Development, EDA, Analytics

Gap between intent and API of parallel primitives

Architectural
● Scalable vocabulary

Developer
● Correct-by-construction: fault tolerance, self-cleaning
● Construct-by-correction: rapid prototyping

IT
● No certification (!)


Fundamental Prerequisite

Ability to express parallelism in terms of parallel primitives (pclosures)

Problem Statement (recap): ... exploiting parallelism using parallel primitives. Next: Partition/OpenMPI.

Partition/OpenMPI: Prologue

GNU/Linux [] mpirun ... ./hello_world -prefix "api"

Typical OpenMPI application ... lacks support for:
● fault tolerance
● timeout
● detection of deadlocks

⇒ Prototyping is (deeply)^∞ frustrating

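For concreteness, a "typical OpenMPI application" of the kind referenced above can be as small as the sketch below. It uses mpi4py as an assumed stand-in for the slides' ./hello_world C binary; nothing in it handles a failed rank, a hang, or a runaway run, which is exactly the gap Partition/OpenMPI is meant to close.

# hello_world.py: a minimal MPI program (mpi4py stand-in for ./hello_world).
# Run with, e.g.: mpirun -np 4 python hello_world.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's id within the job
size = comm.Get_size()          # total number of ranks

print("hello from rank %d of %d" % (rank, size))

# A blocking collective: if any rank dies or hangs here, every other rank
# waits forever; plain OpenMPI provides no timeout or deadlock detection.
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print("sum of ranks:", total)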

Partition/OpenMPI: Problem Statement

Prototyping should be frictionless.

Must use the original OpenMPI application:
■ original source code
■ original binary

The original OpenMPI application must inherit support for:
■ fault tolerance
■ timeout
■ detecting deadlocks

GNU/Linux [] spm.python ... mpirun ... ./hello_world -prefix "api"


Partition/OpenMPI: Problem Statement (Cont'd)

GNU/Linux [] spm.python ... mpirun ... ./hello_world -prefix "api"

Exploiting two very different forms of parallelism:
■ using the same resources
■ at the same time

(A) Drop-in replacement for mpirun
(B) Multiple sessions of mpirun within a single session of spm.python

Can use the same resources for:
● checkpoint-based parallelism
● what-if analysis
● stress testing


Partition/OpenMPI: Anatomy - Timeline

GNU/Linux [] spm.python ... mpirun ./hello_world -prefix "api"


Partition/OpenMPI: Anatomy - Timeline (Cont'd)

Sequence diagram: Hub, mpirun, Spoke, orted, wrapper, Application; each stage eventually calls exit() as the run concludes.

● Launch: mpirun          Monitor: mpirun, Spokes
● Launch: orted           Monitor: orted, wrapper
● Launch: Application     Monitor/Timeout: Application

Normal execution:
● Establish a nervous system over the OpenMPI application.
● Populate the nervous system with streams of time-series data.
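The launch/monitor/timeout role played at each stage of that chain can be pictured with a small supervisor loop. The sketch below is a generic, single-process illustration using the Python standard library; it is not the SPM.Python Hub/Spoke/wrapper implementation, and the supervised command is simply the mpirun invocation from the slides.

# Generic sketch of the "launch, monitor, timeout" role played by each stage
# (Hub over mpirun, Spoke over orted, wrapper over the application). This is
# an illustration with the standard library, not SPM.Python internals.
import subprocess
import time

def supervise(cmd, timeout_s):
    """Launch cmd, watch it, and report normal exit, failure, or timeout."""
    child = subprocess.Popen(cmd)
    deadline = time.time() + timeout_s
    while True:
        rc = child.poll()                      # monitor: has the child exited?
        if rc is not None:
            return "ok" if rc == 0 else "failed(rc=%d)" % rc
        if time.time() > deadline:             # timeout policy
            child.terminate()                  # premature termination
            child.wait()
            return "timeout"
        time.sleep(0.25)                       # heartbeat interval

if __name__ == "__main__":
    status = supervise(["mpirun", "-np", "2", "./hello_world", "-prefix", "api"],
                       timeout_s=10)
    print("application status:", status)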

Partition/OpenMPI: Anatomy - Breakdown

(Same Hub / mpirun / Spoke / orted / wrapper / Application chain as above.)

Built-in Package Management System
● Selectively change the default OpenMPI environment

Redirection of library calls
● Augment libmpi.so, libc.so ... with libSPM.so

Second Parallel Capability
● ~60-line Python script
● Authored by the developer
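For readers unfamiliar with this style of library-call redirection: one common mechanism on GNU/Linux for interposing an auxiliary shared object such as libSPM.so in front of libmpi.so/libc.so is dynamic-linker preloading. The sketch below is a generic illustration of that mechanism only; whether SPM.Python uses LD_PRELOAD, and the library path shown, are assumptions.

# Sketch only: launch a child with an extra shared object preloaded so that
# its calls into libmpi.so / libc.so can be intercepted. LD_PRELOAD is a
# standard GNU/Linux dynamic-linker feature; the path below is hypothetical.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/opt/spm/lib/libSPM.so"   # hypothetical path to the shim

subprocess.check_call(["./hello_world", "-prefix", "api"], env=env)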

Partition/OpenMPI: Second Parallel Capability

# Declaration + definition of the pclosure. The dassert decorators are runtime
# assertions on session state (offline) and role (Hub); the pclosure is built
# from the OpenMPI 'papply' template, with __taskStat registered as the
# stage-1 callback (status reports).
@spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def __init():
    return spm.pclosure.macro.papply.template.openMPI.\
               policyA.defun(signature = 'signature::Hub',
                             stage1Cb  = __taskStat,);

__pc = __init();

Declaration + Definition of Pclosure

Partition/OpenMPI: Second Parallel Capability

@spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def main(pool,
         taskApiArgs,
         taskTimeout):
    # Initialize 'stage0'.
    __pc.stage0.init.main(typedef = ...);
    hdl = __pc.stage0.payload.tie();
    # Populate the template task.
    hdl.spm.meta.label   = '***';   # Not interested.
    hdl.spm.meta.apiArgs = taskApiArgs;
    hdl.spm.meta.timeout = taskTimeout;
    # Invoke the pmanager.
    __pc.stage0.event.manage(pool                 = pool,
                             nSpokesMin           = ...,
                             nSpokesMax           = ...,
                             timeoutWaitForSpokes = ...,
                             timeoutExecution     = ...);
    return;

Population + Invocation of Pclosure

Partition/OpenMPI: Second Parallel Capability

r"""task<template> ::struct {
      # SPM component ...
      spm ::struct {
        meta ::struct {
          label   ::scalar<stringSnippet> = deferred;
          apiArgs ::dict<string,mixed>    = deferred;
          timeout ::scalar<timeout>       = deferred;
        };
        core ::struct {
          relaunchPre  ::scalar<bool> = None;
          relaunchPost ::scalar<bool> = None;
          nameHost     ::scalar<auto> = None;
          whoAmI       ::scalar<auto> = None;
        };
        stat ::struct {
          exception   ::scalar<auto>   = None;
          returnValue ::scalar<record> = None;
        };
      };
      # non-SPM component ...
    };"""

Typedef for Template Task

Partition/OpenMPI: Second Parallel Capability

@spm.util.dassert(predicateCb = spm.sys.sstat.amOnline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def __taskStat(pc):
    try:
        hdl = pc.stage1.payload.tie();
        returnValue = hdl.spm.stat.returnValue;
        if (returnValue.Has(attr = 'stdOut')):
            print("\tstdOut   : %s" % returnValue.stdOut);
        if (returnValue.Has(attr = 'stdErr')):
            print("\tstdErr   : %s" % returnValue.stdErr);
        if (returnValue.Has(attr = 'stdOutErr')):
            print("\tstdOutErr: %s" % returnValue.stdOutErr);
    except (SPMTaskDropped,
            SPMTaskLoad,
            SPMTaskEval,), (hdl,):
        pass;

    return (pc.stage1.event.done(), None,)[-1];

Callback for Status Reports

Partition/OpenMPI: SPM.Python Session

GNU/Linux [] spm.3.111116.trial.A.python
(Trial Edition)
Spm.Python 3.111116 / Python 2.4.6
[GCC 4.4.3 (64 bit) on linux2]

NOTE
>>>> Trial period ends at                     <<<<
>>>> 24:00 hrs (Pacific Standard Time)        <<<<
>>>> December 29, 2011                        <<<<

Type "help", "copyright", "credits", "license" or "spm.Api()" for more information.
Type "spm.DemoExtract(dirname = ...)" to extract demo scripts.

Please visit www.mbasciences.com for the latest and growing collection of
scripts and technical briefs classified in terms of parallel management patterns.

>>> import pool
>>> import demo
>>> import os;
>>> taskApiArgs = \
...     dict(app        = os.getcwd() + '/hello_world',
...          appOptions = "-prefix='app'",
...          );
>>> taskTimeout = spm.util.timeout.after(seconds = 10);
>>> demo.main(pool        = pool.intraAll(),
...           taskApiArgs = taskApiArgs,
...           taskTimeout = taskTimeout)
#: MetaStatus (hub): Waiting - ForSpokes ...
#: MetaStatus (hub): Tasks - Eval
app => 0
app => 1
#: MetaStatus (hub): Tasks - EvalDone
>>> demo.main(pool        = pool.intraOnePerServer(),
...           taskApiArgs = taskApiArgs,
...           taskTimeout = taskTimeout)
#: MetaStatus (hub): Waiting - ForSpokes ...
#: MetaStatus (hub): Tasks - Eval
#: MetaStatus (hub): Tasks - EvalDone
>>> exit()
GNU/Linux []

Problem Statement (recap): ... exploiting parallelism using parallel primitives. Next: Partition/HybridFlow.

Partition/HybridFlow: Basic Template

while (not done):
    try:
        for work in pc.generate(...):
            eval(work);                  # Local Python/C/C++/GPU computation
            pc.counter.async += 1;       # Update parallel data structure(s)
            if (some condition):
                raise pc.exception(...); # Parallel exception
            if (some condition):
                pc.emit(...);            # Emit work/report
        done = True;
    except (pc.exception,), (val,):
        if (some condition):
            continue;                    # Repeat with new consensus ('val')
        done = True;

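To make the shape of that loop concrete, here is a minimal, single-machine analogue using only the Python standard library. It is not the SPM.Python API: the queue stands in for pc.generate(...)/pc.emit(...), the locked counter for pc.counter.async, and the cluster-wide "parallel exception / new consensus" step has no simple standard-library equivalent, which is precisely what the pclosure supplies.

# Single-machine analogue of the hybrid-flow loop (standard library only; not
# the SPM.Python pclosure API). A queue plays the role of generate()/emit();
# a locked integer plays the role of the asynchronous parallel counter.
import queue
import threading

work_q  = queue.Queue()
counter = 0
counter_lock = threading.Lock()

def worker():
    global counter
    while True:
        try:
            work = work_q.get(timeout=0.5)   # "for work in pc.generate(...)"
        except queue.Empty:
            return                           # no more work: this task concludes
        result = work * work                 # "eval(work)": local computation
        with counter_lock:
            counter += 1                     # "pc.counter.async += 1"
        if result > 25:                      # "if (some condition): pc.emit(...)"
            work_q.put(result // 10)         # emit derived work back into the flow
        work_q.task_done()

for item in range(8):                        # seed the initial work
    work_q.put(item)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("units of work evaluated:", counter)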


Partition/HybridFlow: Suite of Parallel Primitives

The basic template above is built from four parallel primitives:
● pc.generate(...)
● pc.emit(...)
● pc.exception(...)
● pc.counter.async

The same template, under different management policies, covers:
BAP | Speculative BSP | BSP | DAG | ● ● ●
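As a point of contrast with the asynchronous loop sketched earlier, the BSP member of that list adds a synchronization point at the end of every superstep. The sketch below is again a generic standard-library illustration (threading.Barrier as the superstep boundary), not the SPM.Python primitive.

# Generic BSP-style supersteps (standard library only; not the SPM.Python BSP
# primitive): every worker computes locally, then all block at a barrier
# before any of them may start the next superstep.
import threading

N_WORKERS    = 4
N_SUPERSTEPS = 3
barrier = threading.Barrier(N_WORKERS)
results = [0] * N_WORKERS

def worker(rank):
    local = rank
    for step in range(N_SUPERSTEPS):
        local = local * 2 + step   # local computation for this superstep
        barrier.wait()             # superstep boundary: all workers synchronize
    results[rank] = local

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("per-rank results after %d supersteps: %s" % (N_SUPERSTEPS, results))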

Conclusion

... exploiting parallelism using parallel primitives (Clone, Clone Repeat, Partition Aggregate Centralized/Decentralized, Partition List, Partition DAG) within a single, self-contained parallel environment (patented technology):
● Partition/OpenMPI ... enable any OpenMPI application to inherit support for fault tolerance, timeout, and detection of deadlocks
● Partition/HybridFlow ... suites of parallel primitives to process data and graphs in parallel

Conclusion (Cont'd)

http://www.mbasciences.com
● SPM.Python distribution
● Technical Briefs
● Parallel Management Patterns

Elementary Parallel Primitives:
● Clone (Once, Repeat)
● Partition (DAG, List)
● Partition Aggregate (Centralized, Decentralized)

HPC Parallel Primitives (in Limited Beta):
● Partition Grid/OpenMPI

Data / Graph Parallel Primitives (Limited Beta: Jan 24, 2012):
● Partition Data Flow Graph