In-class slides with activities

Parallel Algorithms

Sorting and more

Keep hardware in mind

• When considering ‘parallel’ algorithms:
  – We have to understand the hardware they will run on
  – With sequential algorithms, we are doing this implicitly

Creative use of processing power

• Lots of data = need for speed
• ~20 years: parallel processing
  – Studying how to use multiple processors together
  – Really large and complex computations
  – Parallel processing was an active sub-field of CS
• Since 2005: the era of multicore is here
  – All computers will have >1 processing unit

Traditional Computing Machine

• Von Neumann model:
  – The stored program computer
• What is this?
  – Abstractly, what does it look like?

New twist: multiple control units

• It’s difficult to make the CPU any faster
  – To increase potential speed, add more CPUs
  – These CPUs are called cores

• Abstractly, what might this look like in these new machines?

Shared memory model

• Multiple processors can access memory locations

• May not scale over time
  – As we increase the ‘cores’

Other ‘parallel’ configurations
• Clusters of computers
  – Network connects them

Other ‘parallel’ configurations
• Massive data centers

Clusters and data centers
• Distributed memory model

Algorithms
• We will use the term processor for the processing unit that executes instructions
• When considering how to design algorithms for these architectures:
  – Useful to start with a base theoretical model
  – Revise when implementing on different hardware with software packages
    • Parallel computing course
  – Also consider:
    • Memory location access by ‘competing’/‘cooperating’ processors
    • Theoretical arrangement of the processors

PRAM model

• Parallel Random Access Machine
• Theoretical
• Abstractly, what does it look like?
• How do processors access memory in this PRAM model?

PRAM model

• Why is using the PRAM model useful when studying algorithms?

PRAM model

• Processors working in parallel
  – Each trying to access memory values
  – Memory value: what do we mean by this?

• When designing algorithms, we need to consider what type of memory access that algorithm requires

• How might our theoretical computer work when many reads and writes are happening at the same time?

Designing algorithms

• With many algorithms, we’re moving data around
  – E.g., sorting. Others?
• Concurrent reads (CR) by multiple processors
  – Memory not changed, so no ‘conflicts’
• Exclusive writes (EW)
  – Design pseudocode so that, at any step, only one processor writes a data value into a given memory location

Designing Algorithms
• Arranging the processors
  – Helpful for design of algorithm
    • We can envision how it works
    • We can envision the data access pattern needed
      – EREW (exclusive read, exclusive write), CREW (concurrent read, exclusive write), (CRCW: concurrent read, concurrent write)
  – Not necessarily how processors are arranged in practice
    • Although some machines have been arranged this way
  – What are some possible arrangements?
  – Why might these arrangements prove useful for design?

Arrangements

Sorting in Parallel

Emphasis: merge sort

Sequential merge sort

• Recursive
  – Can envision a recursion tree

function mergesort(m)
    var list left, right
    if length(m) ≤ 1
        return m
    else
        middle = length(m) / 2
        for each x in m up to middle
            add x to left
        for each x in m after middle
            add x to right
        left = mergesort(left)
        right = mergesort(right)
        result = merge(left, right)
        return result
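The pseudocode above calls a merge helper without defining it. A minimal Python sketch of the same recursive scheme follows; the body of `merge` is an assumption (the standard two-pointer merge of two sorted lists), not something given on the slide.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list (assumed helper)."""
    result = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def mergesort(m):
    """Recursive merge sort, following the slide's pseudocode."""
    if len(m) <= 1:
        return m
    middle = len(m) // 2            # integer division
    left = mergesort(m[:middle])    # elements up to middle
    right = mergesort(m[middle:])   # elements after middle
    return merge(left, right)
```

Each level of the recursion tree does independent work on separate sublists, which is what makes merge sort a natural candidate for parallel execution.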

Parallel merge sort
• Shared data: 2 lists in memory
• Sort pairs once in parallel
• The processes merge concurrently

How might we write the pseudocode?


Numbering of processors starts with 0

s = 2
while s <= N
    do in parallel N/s steps for proc i
        merge values from i*s to (s*i)+s-1
    s = s*2
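To trace the rounds of the pseudocode above, here is a sequential Python simulation (a sketch, not a real parallel implementation). The inner `for` loop stands in for the processors that would run at the same time; the names `merge_range` and `parallel_merge_sort` are hypothetical. One deviation is assumed: the loop runs while `s < 2 * n` and segment bounds are clamped, so that N need not be a power of 2.

```python
def merge_range(data, lo, mid, hi):
    """Merge the sorted runs data[lo:mid] and data[mid:hi] in place."""
    left, right = data[lo:mid], data[mid:hi]
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    data[lo:hi] = out  # writes only inside this processor's own segment


def parallel_merge_sort(values):
    """Simulate the bottom-up rounds; processor numbering starts with 0."""
    data = list(values)
    n = len(data)
    s = 2
    while s < 2 * n:  # assumption: generalizes 's <= N' to any N
        num_procs = (n + s - 1) // s  # ceil(n / s) processors this round
        for i in range(num_procs):    # conceptually "do in parallel"
            lo = i * s
            mid = min(lo + s // 2, n)
            hi = min(lo + s, n)
            merge_range(data, lo, mid, hi)
        s *= 2
    return data
```

Within a round, processor i touches only positions i*s to i*s+s-1, so no two processors read or write the same location in that round.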

Parallel Merge Sort

• Work through pseudocode with larger N

• Processor arrangement: binary tree
• Memory access: EREW

• What was the more practical implementation?

Let’s try others

Different from sorting

Activity: Sum N integers

• Suppose we have an array of N integers in memory

• We wish to sum them
  – Variant: create a running sum in a new array
• Devise a parallel algorithm for this
  – Assume PRAM to start
  – What processor arrangement did you use?
  – What memory access is required?
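One possible answer to this activity, offered as a sketch rather than the intended solution: a binary-tree reduction for the sum, and a doubling-stride scan for the running sum. Both are simulated sequentially in Python; each inner `for` loop represents processors acting in the same parallel step. The function names `tree_sum` and `running_sum` are hypothetical.

```python
def tree_sum(values):
    """Sum by pairwise combination: about log2(N) parallel steps."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # Each i is a separate processor; it reads data[i + stride]
        # and writes data[i]. No cell is shared within a step.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0] if data else 0


def running_sum(values):
    """Running (prefix) sum using strides 1, 2, 4, ..."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        prev = list(data)  # snapshot: all reads happen before any write
        for i in range(stride, n):  # processor i updates cell i
            data[i] = prev[i] + prev[i - stride]
        stride *= 2
    return data
```

As written, `tree_sum` uses a binary-tree arrangement with EREW access. In `running_sum`, a cell can be read by two processors in the same step, so it needs concurrent reads (CREW) unless the reads are split into two phases.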

Next Activity
• Now suppose you need an algorithm for multiplying a matrix by a vector

[Figure: Matrix A × Vector X = Result Vector]

• Devise a parallel algorithm for this
  – Assume PRAM to start
  – Think about what each process will compute; there are options
  – What processor arrangement did you use?
  – What memory access is required?

Matrix-Vector Multiplication
• The matrix is assumed to be M x N. In other words:
  – The matrix has M rows.
  – The matrix has N columns.
  – For example, a 3 x 2 matrix has 3 rows and 2 columns.
• In matrix-vector multiplication, if the matrix is M x N, then the vector must have dimension N.
  – In other words, the vector will have N entries.
  – If the matrix is 3 x 2, then the vector must be 2-dimensional.
  – This is usually stated by saying the matrix and vector must be conformable.
• If the matrix and vector are conformable, their product is a resultant vector of dimension M.
  – (So, the result could be a different size than the original vector!)
  – For example, if the matrix is 3 x 2 and the vector is 2-dimensional, the result of the multiplication is a vector of 3 dimensions.
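A small worked example may help fix the dimensions. The numbers are made up for illustration, and the symbols a_{ij}, x_j, y_i are assumed names for the entries of the matrix, the vector, and the result. For an M x N matrix, each of the M result entries is a sum over the N columns:

```latex
y_i = \sum_{j=1}^{N} a_{ij}\, x_j , \qquad i = 1, \dots, M

% A 3 x 2 matrix times a 2-dimensional vector gives a 3-dimensional vector:
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}
\begin{pmatrix} 7 \\ 8 \end{pmatrix}
=
\begin{pmatrix} 1\cdot 7 + 2\cdot 8 \\ 3\cdot 7 + 4\cdot 8 \\ 5\cdot 7 + 6\cdot 8 \end{pmatrix}
=
\begin{pmatrix} 23 \\ 53 \\ 83 \end{pmatrix}
```

Each y_i depends on only one row of the matrix plus the whole vector, so the M entries can be computed independently of one another.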

Matrix-Vector Multiplication

• Ways to do a parallel algorithm:
  – One row of matrix per processor
  – One element of matrix per processor
    • There is additional overhead involved. Why?

• What if number of rows M is larger than number of processors?

• Emerging theme: how to partition the data
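As one way to picture the row-per-processor option and the case where M exceeds the number of processors, the sketch below gives each worker a contiguous block of rows. It uses Python's standard `concurrent.futures.ThreadPoolExecutor` only to make the structure concrete; the helper names `row_blocks`, `matvec_block`, and `parallel_matvec` are hypothetical, and in CPython threads will not give a real speedup for this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(m, p):
    """Split row indices 0..m-1 into at most p contiguous, near-equal blocks."""
    base, extra = divmod(m, p)
    blocks, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)  # first 'extra' blocks get one more row
        if size:
            blocks.append(range(start, start + size))
        start += size
    return blocks


def matvec_block(A, x, rows):
    """Compute the result entries for one block of rows."""
    return [sum(a * b for a, b in zip(A[r], x)) for r in rows]


def parallel_matvec(A, x, p):
    """Multiply matrix A (list of rows) by vector x using p workers."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("matrix and vector are not conformable")
    blocks = row_blocks(len(A), p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        # map() yields results in block order, so the pieces concatenate correctly.
        parts = pool.map(lambda rows: matvec_block(A, x, rows), blocks)
        return [y for part in parts for y in part]
```

For example, `parallel_matvec([[1, 2], [3, 4], [5, 6]], [7, 8], 2)` assigns rows 0 and 1 to one worker and row 2 to the other. Every worker reads the whole vector but writes only its own result entries, which corresponds to CREW access in PRAM terms.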

Expand on previous example

• Matrix-matrix multiplication

[Figure: Matrix × Matrix = ?]
