Stefan Amberger - Johannes Kepler University Linz

28
A Parallel, In-Place, Rectangular Matrix Transpose Algorithm Description of Algorithm and Correctness Proof Stefan Amberger ICA & RISC [email protected]

Transcript of Stefan Amberger - Johannes Kepler University Linz

Page 1: Stefan Amberger - Johannes Kepler University Linz

A Parallel, In-Place, Rectangular Matrix Transpose Algorithm

Description of Algorithm and Correctness Proof

Stefan Amberger

ICA & [email protected]

Page 2: Stefan Amberger - Johannes Kepler University Linz

Table of Contents

1. Introduction

2. Description of Transpose Algorithm

3. Proof of Correctness

2

Page 3: Stefan Amberger - Johannes Kepler University Linz

Introduction

3

Page 4: Stefan Amberger - Johannes Kepler University Linz

Rectangular Matrices

Large rectangular matrices are abundant

● Discrete Fourier transforms

● Finite element method

● Raster images in earth observation

● Computer graphics (e.g. radiosity equation)

● etc.

4

Page 5: Stefan Amberger - Johannes Kepler University Linz

Moore’s Law

● Number of transistors on a chip doubles every

two years

● Maximum clock frequencies reached in 2005

● Maximum power density reached

→ multiple cores on CPUs

5

Current Situation in Computing

Memory

often the limiting factor

● medium-sized problems on mobile /

embedded device

● large problem on computer

Example:

100.000 x 100.000 matrix: 75 GB

Need parallel, in-place algorithms

Page 6: Stefan Amberger - Johannes Kepler University Linz

Concept of Transpose

two-dimensional

Implementation on Computer

one-dimensional

6

Rectangular Matrix TransposeMathematical Concept vs Implementation

Page 7: Stefan Amberger - Johannes Kepler University Linz

In-Place Transpose of Square Matrix

using one temporary variable

M x (M-1)/2 permutation cycles

In-Place Transpose of Rectangular

Matrix

one-dimensional

, like every permutation, can be decomposed into

disjoint, independent cycles

7

Rectangular Matrix TransposeIn-Place Transpose

Page 8: Stefan Amberger - Johannes Kepler University Linz

Common Approach

Independence of Permutation Cycles

● Limited Parallelism

● Problem-dependent parallelism

● Permutation cycles are inherently serial

8

Rectangular Matrix TransposeParallel In-Place Transpose

Our Approach

Divide and conquer

Transpose of Rectangular matrices, In-place and in

Parallel (TRIP)

● Highly parallel for all problem-sizes (see

presentation 2)

● In-place

● Recursive

Page 9: Stefan Amberger - Johannes Kepler University Linz

9

Description of Transpose Algorithm

Page 10: Stefan Amberger - Johannes Kepler University Linz

TRIP Algorithm

If matrix is rectangular TRIP transposes sub-matrices, then combines the result with

merge or split

10

Page 11: Stefan Amberger - Johannes Kepler University Linz

original matrix is tall → it is divided by TRIP

and the sub-matrices are in-place transposed

11

TRIP ExampleTranspose of a Tall Matrix

Page 12: Stefan Amberger - Johannes Kepler University Linz

TRIP Example the transposed sub-matrices are

combined by merge

12

Transpose of a Tall Matrix

Page 13: Stefan Amberger - Johannes Kepler University Linz

Transpose of a Tall Matrix

TRIP Example the merged result can be reinterpreted

as the transpose of the original matrix

13

Page 14: Stefan Amberger - Johannes Kepler University Linz

merge combines the transposes of sub-matrices of tall matrices

merge first rotates the middle part of the array, then recursively merges the left and right

parts of the array

rol(arr, k) … left rotation (circular shift) of array arr by k elements14

merge Algorithm

Page 15: Stefan Amberger - Johannes Kepler University Linz

split combines the transposes of sub-matrices of wide matrices

split first recursively splits the left and right parts of the array, then rotates the middle

part of the array

split and merge are inverse to each other15

split Algorithm

Page 16: Stefan Amberger - Johannes Kepler University Linz

Correctness Proof

16

Page 17: Stefan Amberger - Johannes Kepler University Linz

Correctness of merge

Matrix is split into two parts → transpose of Matrix is split into two parts

17

Structure of Matrix and Transpose

Page 18: Stefan Amberger - Johannes Kepler University Linz

Correctness of merge

In-place transposition of sub-matrices results in reshaped transposes of sub-matrices

18

Structure of Matrix after In-Place Transposition of Sub-Matrices

Page 19: Stefan Amberger - Johannes Kepler University Linz

Correctness of merge

Prove by induction: merge transforms T into the transpose of A

19

Proof Sketch

merge

Page 20: Stefan Amberger - Johannes Kepler University Linz

20

Correctness of mergeLemma (merge)

Page 21: Stefan Amberger - Johannes Kepler University Linz

Base Case (k=1)

21

Correctness of mergeProof of Lemma (merge) !

Induction Hypothesis (k0 a.b.f)

Page 22: Stefan Amberger - Johannes Kepler University Linz

Induction Step (k0 → k0+1)

22

Correctness of mergeProof of Lemma (merge) !

merge matches recursive case

rol transforms T to

Page 23: Stefan Amberger - Johannes Kepler University Linz

finally: recursive merge calls on sub-arrays

23

Correctness of mergeProof of Lemma (merge) !

I.H.

Page 24: Stefan Amberger - Johannes Kepler University Linz

analogous, by induction

24

Correctness of split

Page 25: Stefan Amberger - Johannes Kepler University Linz

Induction on number of elements of matrix

Base Case (E=1):

25

Correctness of TRIPProof by Induction

Induction Hypothesis (E0 a.b.f.):

Page 26: Stefan Amberger - Johannes Kepler University Linz

Induction Step (E0 → E0+1):

26

Correctness of TRIPProof by Induction

Matrix is divided in two sub-matrices of dimension and

with and

Induction hypothesis applies, merge combines result.

Matrix is divided in two sub-matrices of dimension and

with and

Induction hypothesis applies, split combines result.

Page 27: Stefan Amberger - Johannes Kepler University Linz

Conclusions

27

Page 28: Stefan Amberger - Johannes Kepler University Linz

Novel Algorithm TRIP transposes rectangular matrices

● correctly

● in-place

● in highly parallel manner (see next presentation)

28

Conclusions