CSE5304—Project Proposal Parallel Matrix Multiplication

Post on 03-Jan-2016


CSE5304—Project Proposal

Parallel Matrix Multiplication

Tian Mi

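A naive version with MPI

[Figure: the data is divided among processors P1, P2, ..., Pi, ..., PN, whose outputs together form the result]

[Figure: the work of a single processor Pi]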

Processor 0 reads the input file

Processor 0 distributes one matrix (MPI_Scatter)

Processor 0 broadcasts the other matrix (MPI_Bcast)

All processors, in parallel, do the multiplication of their piece of the data

Processor 0 gathers the result (MPI_Gather)

Processor 0 writes the result to the output file
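The scatter, broadcast, multiply, and gather steps can be illustrated without MPI. The following is a minimal single-process sketch in Python, not the implementation from the slides: plain lists stand in for the messages, and the function names (split_rows, multiply_block, naive_parallel_matmul) are hypothetical. It assumes the scattered matrix is split by rows.

```python
def split_rows(matrix, n_procs):
    """Split the rows into n_procs contiguous blocks (stand-in for the scatter).

    MPI_Scatter itself sends equal-sized pieces; the uneven split allowed
    here would correspond to MPI_Scatterv in a real MPI program.
    """
    base, extra = divmod(len(matrix), n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(matrix[start:start + size])
        start += size
    return blocks


def multiply_block(a_rows, b):
    """Multiply a block of rows of A by the whole matrix B."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def naive_parallel_matmul(a, b, n_procs):
    """Simulate the naive scheme: each 'processor' computes its rows of A*B."""
    blocks = split_rows(a, n_procs)                  # scatter one matrix
    b_copies = [b for _ in range(n_procs)]           # broadcast the other
    partial = [multiply_block(blk, b_copy)           # local multiplications
               for blk, b_copy in zip(blocks, b_copies)]
    return [row for part in partial for row in part]  # gather in rank order


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
print(naive_parallel_matmul(a, b, 2))  # prints [[25, 28], [57, 64], [89, 100]]
```

In this sketch the local multiplications run one after another; in an MPI program each rank would run multiply_block on its own block at the same time.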

Data generation

Data generation in R with package “igraph”

Integers in the range [-1000, 1000]

Matrix size:

Matrix      512*512   1024*1024   2048*2048   4096*4096
File size   2.69 MB   10.7 MB     43.1 MB     172 MB
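The slides generate the data in R. As a rough equivalent, the sketch below draws a square matrix of integers in [-1000, 1000] with Python's standard library. It is an assumption-based stand-in, not the author's script: the function names are hypothetical, and the whitespace-separated text layout is a guess, since the slides do not describe the file format.

```python
import random


def random_int_matrix(n, low=-1000, high=1000, seed=None):
    """Return an n*n list of lists with integers drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    return [[rng.randint(low, high) for _ in range(n)] for _ in range(n)]


def matrix_to_text(matrix):
    """Render the matrix one row per line, values separated by spaces (assumed format)."""
    return "\n".join(" ".join(str(v) for v in row) for row in matrix) + "\n"


small = random_int_matrix(4, seed=0)
print(matrix_to_text(small), end="")
```

Passing a fixed seed makes the generated input reproducible, which helps when comparing runs with different processor counts.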

Result

Data size: 1024*1024

# Processors   Experiments (seconds)   Average (s)   Speedup
1              44 41 45 37 42          41.8          1
2              23 20 21 19 22          21            1.99
4              11 10 19 18 16          14.8          2.82
8              10 9 8 9 10             9.2           4.54
16             9 9 11 9 6              8.8           4.75
32             8 10 8 7 7              8             5.23
64             8 8 8 8 8               8             5.23
128            10 9 6 8 9              8.4           4.98
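The Average and Speedup columns follow from the raw timings: the average is the mean of the five runs, and the speedup is the one-processor average divided by the p-processor average. A short Python sketch that recomputes both from the numbers in the table (the variable and function names are illustrative):

```python
# Five timed runs (seconds) per processor count, copied from the 1024*1024 table.
runs_1024 = {
    1: [44, 41, 45, 37, 42],
    2: [23, 20, 21, 19, 22],
    4: [11, 10, 19, 18, 16],
    8: [10, 9, 8, 9, 10],
    16: [9, 9, 11, 9, 6],
    32: [8, 10, 8, 7, 7],
    64: [8, 8, 8, 8, 8],
    128: [10, 9, 6, 8, 9],
}


def average(values):
    """Arithmetic mean of a list of timings."""
    return sum(values) / len(values)


def speedups(time_by_procs):
    """Speedup relative to the one-processor time: T(1) / T(p)."""
    base = time_by_procs[1]
    return {p: base / t for p, t in time_by_procs.items()}


averages = {p: average(r) for p, r in runs_1024.items()}
speedup = speedups(averages)

for p in sorted(averages):
    print(f"{p:>4} {averages[p]:6.1f} {speedup[p]:6.2f}")
```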

Result

Data size: 1024*1024

[Plot: time (s) versus # processors (1, 2, 4, 8, 16, 32, 64, 128)]

Result

Data size: 1024*1024

[Plot: speedup versus # processors (1, 2, 4, 8, 16, 32, 64, 128)]

Result

Data size: 2048*2048

# Processors   Time (s)   Speedup
1              751        1
2              498        1.508032
4              258        2.910853
8              127        5.913386
16             84         8.940476
32             51         14.72549
64             55         13.65455
128            48         15.64583
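A related figure, not reported in the slides, is parallel efficiency: speedup divided by the number of processors. The sketch below derives it from the 2048*2048 timings as a reading aid only; the names are illustrative.

```python
# Time in seconds per processor count, copied from the 2048*2048 table.
times_2048 = {1: 751, 2: 498, 4: 258, 8: 127, 16: 84, 32: 51, 64: 55, 128: 48}


def efficiency(time_by_procs):
    """Parallel efficiency T(1) / (p * T(p)); 1.0 would be ideal linear scaling."""
    base = time_by_procs[1]
    return {p: base / (p * t) for p, t in time_by_procs.items()}


eff_2048 = efficiency(times_2048)
for p in sorted(eff_2048):
    print(f"{p:>4} {eff_2048[p]:.3f}")
```

For these timings the efficiency falls from about 0.75 on 2 processors to about 0.12 on 128, which matches the flattening of the speedup beyond 32 processors.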

Result

Data size: 2048*2048

[Plot: time (s) versus # processors (1, 2, 4, 8, 16, 32, 64, 128)]

Result

Data size: 2048*2048

[Plot: speedup versus # processors (1, 2, 4, 8, 16, 32, 64, 128)]

Result

Data size: 4096*4096

# Processors   Time (s)   Speedup
1              5920       1
2              3630       1.630854
4              2813       2.104515
8              925        6.4
16             745        7.946309
32             576        10.27778
64             n/a        n/a
128            n/a        n/a

(No timing is reported for 64 and 128 processors.)

Analysis

To see superlinear speedup, increase the computation, which is not dominant enough: use a larger matrix and larger integers.

However, a larger matrix or longer integers will also increase the communication time (broadcast, scatter, gather).
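One rough way to look at this trade-off is to count, for a single non-root processor in the naive scheme, the multiply-add operations against the matrix elements it receives and sends. The sketch below is an illustration under stated assumptions, not a measurement: it assumes an n*n matrix whose rows divide evenly among p processors, counts elements rather than seconds, and ignores how the MPI library implements the collectives. The function name is hypothetical.

```python
def per_processor_costs(n, p):
    """Return (multiply_adds, elements_moved) for one non-root processor.

    Assumes the processor receives n/p rows of one matrix and the whole
    other matrix, then sends back n/p rows of the result.
    """
    if n % p != 0:
        raise ValueError("this sketch assumes n is divisible by p")
    rows = n // p
    multiply_adds = rows * n * n            # rows*n outputs, n multiply-adds each
    elements_moved = rows * n + n * n + rows * n  # scatter + broadcast + gather
    return multiply_adds, elements_moved


for n in (1024, 2048, 4096):
    for p in (2, 32, 128):
        work, moved = per_processor_costs(n, p)
        print(f"n={n} p={p} multiply-adds per element moved: {work / moved:.1f}")
```

Under these assumptions the ratio works out to n / (p + 2): it grows with the matrix size and shrinks as processors are added, so a larger matrix raises both costs but raises the computation faster.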

Cannon's algorithm--Example

http://www.vampire.vanderbilt.edu/education-outreach/me343_fall2008/notes/parallelMM_10_09.pdf

Cannon's algorithm

Still implementing and debugging

No result to share at present
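For orientation, here is a single-process Python sketch of the textbook form of Cannon's algorithm on a q*q grid of blocks. It is not the implementation referred to above, which the slides say was still in progress. It only simulates the block shifts with list indexing, assumes the matrix dimension is divisible by q, and uses hypothetical function names.

```python
def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    y_cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def to_blocks(m, q):
    """Cut an n*n matrix into a q*q grid of (n/q)*(n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def from_blocks(blocks):
    """Reassemble a grid of blocks into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for blk in block_row for v in blk[r]])
    return out


def cannon(a, b, q):
    """Multiply a by b with Cannon's algorithm on a simulated q*q processor grid."""
    n = len(a)
    if n % q != 0:
        raise ValueError("this sketch assumes n is divisible by q")
    s = n // q
    a_blk, b_blk = to_blocks(a, q), to_blocks(b, q)
    # Initial alignment: shift row i of A left by i, column j of B up by j.
    a_blk = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_blk = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c_blk = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every grid position multiplies its current pair of blocks.
        for i in range(q):
            for j in range(q):
                prod = matmul(a_blk[i][j], b_blk[i][j])
                for c_row, p_row in zip(c_blk[i][j], prod):
                    for k, v in enumerate(p_row):
                        c_row[k] += v
        # Then A blocks move one step left and B blocks one step up (cyclically).
        a_blk = [[a_blk[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_blk = [[b_blk[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return from_blocks(c_blk)


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
print(cannon(a, b, 2) == matmul(a, b))  # prints True
```

In an MPI version each grid position would be a separate rank and the shifts would be point-to-point exchanges with neighbouring ranks, so no processor ever holds a whole input matrix.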

Thank you

Questions & Comments?