March 17, 2006 Zhiyi’s RSL 1

VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing

Dr Zhiyi Huang
Dept of Computer Science
University of Otago
New Zealand

Motivation

- DSM applications are not as efficient as MPI on cluster computers

[Figure: chart comparing TMK and MPI for 2-p, 4-p, 8-p, 16-p and 32-p; vertical axis from 0 to 25]

VOPP

- VODCA is a system supporting View-Oriented Parallel Programming (VOPP)
- Why a new programming style?
  - Improve the performance of DSM applications on cluster computers
  - Provide a programming style better than MPI
  - Message passing is notoriously known as a difficult programming style

What is a view?

- Suppose $M$ is the set of data objects in shared memory
- A view is a group of data objects from the shared memory
  - $\forall V,\ V \subseteq M$
- Views must not overlap each other
  - $\forall V_i, V_j,\ i \neq j:\ V_i \cap V_j = \emptyset$
- Suppose there are $n$ views in shared memory
  - $\bigcup_{i=1}^{n} V_i = M$
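The three conditions above say that the views partition the shared memory. As a minimal sketch, assuming data objects are numbered 0 to m-1 and each view is given as a list of object indices, the check could look as follows. The names `view_t` and `views_partition_memory` are hypothetical and are not part of VODCA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical representation: a view lists the indices (0..m-1) of the
 * data objects of M that belong to it. */
typedef struct {
    const size_t *objects;  /* indices of the objects in this view */
    size_t count;           /* number of objects in this view */
} view_t;

/* Returns true when the n views are pairwise disjoint and together cover
 * all m objects, i.e. when they partition M. An index listed twice, even
 * within one view, is treated as an overlap. */
bool views_partition_memory(const view_t *views, size_t n, size_t m)
{
    assert(views != NULL || n == 0);

    /* seen[obj] is set once object obj has been claimed by some view. */
    unsigned char *seen = calloc(m ? m : 1, sizeof *seen);
    if (seen == NULL)
        return false;  /* out of memory: cannot verify */

    bool ok = true;
    size_t covered = 0;
    for (size_t i = 0; ok && i < n; i++) {
        for (size_t k = 0; ok && k < views[i].count; k++) {
            size_t obj = views[i].objects[k];
            if (obj >= m || seen[obj]) {
                ok = false;       /* outside M, or already in a view */
            } else {
                seen[obj] = 1;
                covered++;
            }
        }
    }
    free(seen);
    return ok && covered == m;    /* every object is in exactly one view */
}
```

For example, with four objects, the views {0, 1} and {2, 3} pass the check, while {0, 1} and {1, 2} fail it because object 1 appears in both views and object 3 in neither.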

VOPP Requirements

- The programmer should divide the shared data into a number of views according to the data flow of the parallel algorithm.
- A view should consist of data objects that are always processed as an atomic set in a program.
- Views can be created and destroyed at any time.
- Each view has a unique view identifier

VOPP Requirements (cont.)

- View primitives such as acquire_view and release_view must be used when a view is accessed.

    acquire_view(View_A);
    A = A + 1;
    release_view(View_A);

- acquire_Rview and release_Rview can be used when a view is only read by a processor.
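The difference between the two pairs of primitives can be modelled as an admission rule on a view. The sketch below is a single-process model, not VODCA's implementation: it assumes that acquire_view grants exclusive access and that several processors may hold the same view through acquire_Rview at once, which the slides do not state explicitly. All `sketch_` names and `view_state_t` are hypothetical, and the functions return false where a real primitive would presumably block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one view; not VODCA's data structure. */
typedef struct {
    int readers;   /* processors currently holding the view read-only */
    bool writer;   /* true while one processor holds it exclusively */
} view_state_t;

/* Models acquire_view: granted only when nobody else holds the view. */
bool sketch_try_acquire_view(view_state_t *v)
{
    if (v->writer || v->readers > 0)
        return false;
    v->writer = true;
    return true;
}

/* Models release_view: gives up exclusive access. */
void sketch_release_view(view_state_t *v)
{
    assert(v->writer);
    v->writer = false;
}

/* Models acquire_Rview: granted alongside other readers, never a writer. */
bool sketch_try_acquire_Rview(view_state_t *v)
{
    if (v->writer)
        return false;
    v->readers++;
    return true;
}

/* Models release_Rview: one reader leaves the view. */
void sketch_release_Rview(view_state_t *v)
{
    assert(v->readers > 0);
    v->readers--;
}
```

Under these assumptions, read-only acquisition lets consumers of the same view proceed concurrently, while any update still excludes every other access.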

Example

A VOPP program for a producer/consumer problem:

    if (prod_id == 0) {
        /* only processor 0 produces x, with exclusive access to view 1 */
        acquire_view(1);
        produce(x);
        release_view(1);
    }
    barrier(0);          /* wait until x has been produced */
    acquire_Rview(1);    /* read-only access to view 1 */
    consume(x);
    release_Rview(1);

Advantages of VOPP

- Keep the convenience of shared memory programming
- Focus on data partitioning and data access instead of data races and mutual exclusion
  - View primitives automatically achieve mutual exclusion
  - View primitives are not an extra burden
- The programmer can finely tune the parallel algorithm by careful view partitioning

Philosophy of VOPP

- Shared memory is a critical resource that needs to be used with care
  - If there is no need to use shared memory, don’t use it
  - Justification is wanted before a view is created

VOPP vs. MPI

- Easier for programmers than MPI
  - For problems like a task queue, programming with MPI is horrific.
- Can mimic any finely-tuned MPI program
  - Shared message → view
  - Send/recv → acquire_view
- Essential differences
  - View is location transparent
  - More barriers in VOPP

Implementation

- VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing
- VODCA version 1.0
  - Released as open-source software
  - A library that runs in user space
  - Based on View-based Consistency
  - Uses an efficient consistency protocol, VOUPID

View-based Consistency

- Condition for View-based Consistency
  - Before a processor Pi is allowed to access a view by calling acquire_view or acquire_Rview, all previous write accesses to data objects of the view must be performed with respect to Pi according to their causal order.
- In VOPP, barriers are only used for synchronization and have nothing to do with consistency maintenance for DSM.

Consistency protocols

- They are page based
- Update protocol
  - Modify immediately
- Invalidation protocol
  - Use a write notice to invalidate a page
  - When the page is accessed, a page fault causes the fetch of diffs which are applied to the page
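The slides do not say how a diff is computed. One common technique, used by page-based DSM systems such as TreadMarks, is to keep an untouched copy of the page (a "twin") and compare it with the modified page. The sketch below illustrates that idea at byte granularity; `SKETCH_PAGE_SIZE`, `diff_t` and the function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 256  /* illustrative; real pages are usually larger */

typedef struct {
    size_t offset;        /* byte offset within the page */
    unsigned char value;  /* new value at that offset */
} diff_entry_t;

typedef struct {
    diff_entry_t entries[SKETCH_PAGE_SIZE];
    size_t count;         /* number of valid entries */
} diff_t;

/* Save a pristine copy (the "twin") before the first local write. */
void twin_create(unsigned char *twin, const unsigned char *page)
{
    memcpy(twin, page, SKETCH_PAGE_SIZE);
}

/* Record every byte that differs between the twin and the modified page. */
void diff_create(const unsigned char *twin, const unsigned char *page,
                 diff_t *out)
{
    out->count = 0;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (twin[i] != page[i]) {
            out->entries[out->count].offset = i;
            out->entries[out->count].value = page[i];
            out->count++;
        }
    }
}

/* Bring another copy of the page up to date by replaying the diff. */
void diff_apply(unsigned char *page, const diff_t *d)
{
    for (size_t k = 0; k < d->count; k++) {
        assert(d->entries[k].offset < SKETCH_PAGE_SIZE);
        page[d->entries[k].offset] = d->entries[k].value;
    }
}
```

In an invalidation protocol, the faulting processor would fetch such diffs from the writers and call something like `diff_apply` once per diff, which is why many diffs for one page can accumulate.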

Consistency protocols (cont.)

- Home-based protocol
  - Based on the invalidation protocol, but
  - For each page, use a copy as its home
  - When a diff is created, it is applied to the home copy immediately
  - When the page is accessed, a page fault causes the fetch of the home copy (Pros: resolves the diff accumulation problem)

The VOUPID protocol

- View-Oriented Update Protocol with Integrated Diff
  - Based on the update protocol
  - Diffs of a page of a view are merged into a single diff
  - The single diff is used to update the page when the view is acquired
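The merging step can be illustrated with a small sketch. It assumes a diff is stored as a per-byte "modified" mask plus the new values, and that a later diff overrides an earlier one where they touch the same byte. The slides do not describe VOUPID's actual diff format, so `page_diff_t`, `SKETCH_PAGE_SIZE` and the function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 256  /* illustrative page size */

/* Hypothetical diff format: which bytes changed, and their new values. */
typedef struct {
    bool modified[SKETCH_PAGE_SIZE];
    unsigned char value[SKETCH_PAGE_SIZE];
} page_diff_t;

/* Reset a diff so that it records no modifications. */
void page_diff_clear(page_diff_t *d)
{
    memset(d, 0, sizeof *d);
}

/* Note that the byte at offset now holds value. */
void page_diff_record(page_diff_t *d, size_t offset, unsigned char value)
{
    assert(offset < SKETCH_PAGE_SIZE);
    d->modified[offset] = true;
    d->value[offset] = value;
}

/* Fold a newer diff into the integrated diff; newer writes win. */
void page_diff_merge(page_diff_t *integrated, const page_diff_t *newer)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (newer->modified[i]) {
            integrated->modified[i] = true;
            integrated->value[i] = newer->value[i];
        }
    }
}

/* Update a page copy with the bytes recorded in the diff. */
void page_diff_apply(unsigned char *page, const page_diff_t *d)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (d->modified[i])
            page[i] = d->value[i];
    }
}
```

Applying the integrated diff once gives the same page contents as applying the individual diffs in order, so under these assumptions the acquiring processor needs only one diff per page of the view.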

Experiment

- Use a cluster computer
  - The cluster computer, at Tsinghua Univ., consists of 128 Itanium 2 nodes running Linux 2.4, connected by InfiniBand. Each node has two 1.3 GHz processors and 4 Gbytes RAM. We run two processes on each node.
- We used four applications: Integer Sort (IS), Gauss, Successive Over-Relaxation (SOR), and Neural Network (NN).

Related systems

- TreadMarks (TMK) is a state-of-the-art Distributed Shared Memory system based on traditional parallel programming.
- Message Passing Interface (MPI) is a standard for message passing-based parallel programming. We used LAM/MPI.

Performance of NN

[Figure: chart comparing VODCA, TMK and MPI on NN for 2-p, 4-p, 8-p, 16-p and 32-p; vertical axis from 0 to 35]

Performance of IS

[Figure: chart comparing VODCA, TMK and MPI on IS for 2-p, 4-p, 8-p, 16-p and 32-p; vertical axis from 0 to 25]

Performance of SOR

[Figure: chart comparing VODCA, TMK and MPI on SOR for 2-p, 4-p, 8-p, 16-p and 32-p; vertical axis from 0 to 16]

Performance of Gauss

[Figure: chart comparing VODCA, TMK and MPI on Gauss for 2-p, 4-p, 8-p, 16-p and 32-p; vertical axis from 0 to 25]

Future work on VOPP

- More benchmarks/applications
- Performance evaluation on larger clusters
- Optimized implementation of barriers for VOPP
- More auxiliary utilities for VOPP programmers
- A view-based debugger for VOPP
- A fault-tolerant system for VODCA

Questions?