In-class slides with activities
Uploaded by serc-at-carleton-college
Parallel Algorithms
Sorting and more
Keep hardware in mind
• When considering ‘parallel’ algorithms,
  – We have to have an understanding of the hardware they will run on
  – Sequential algorithms: we are doing this implicitly
Creative use of processing power
• Lots of data = need for speed
• ~20 years: parallel processing
  – Studying how to use multiple processors together
  – Really large and complex computations
  – Parallel processing was an active sub-field of CS
• Since 2005: the era of multicore is here
  – All computers will have >1 processing unit
Traditional Computing Machine
• Von Neumann model:
  – The stored program computer
• What is this?
  – Abstractly, what does it look like?
New twist: multiple control units
• It’s difficult to make the CPU any faster
  – To increase potential speed, add more CPUs
  – These CPUs are called cores
• Abstractly, what might this look like in these new machines?
Shared memory model
• Multiple processors can access memory locations
• May not scale over time
  – As we increase the ‘cores’
Other ‘parallel’ configurations:
• Clusters of computers
  – Network connects them
Other ‘parallel’ configurations
• Massive data centers
Clusters and data centers
• Distributed memory model
Algorithms
• We will use the term processor for the processing unit that executes instructions
• When considering how to design algorithms for these architectures
  – Useful to start with a base theoretical model
  – Revise when implementing on different hardware with software packages
    • Parallel computing course
  – Also consider:
    • Memory location access by ‘competing’/‘cooperating’ processors
    • Theoretical arrangement of the processors
PRAM model
• Parallel Random Access Machine
• Theoretical
• Abstractly, what does it look like?
• How do processors access memory in this PRAM model?
PRAM model
• Why is using the PRAM model useful when studying algorithms?
PRAM model
• Processors working in parallel
  – Each trying to access memory values
  – Memory value: what do we mean by this?
• When designing algorithms, we need to consider what type of memory access that algorithm requires
• How might our theoretical computer work when many reads and writes are happening at the same time?
Designing algorithms
• With many algorithms, we’re moving data around
  – Sort, e.g. Others?
• Concurrent reads by multiple processors
  – Memory not changed, so no ‘conflicts’
• Exclusive writes (EW)
  – Design pseudocode so that any processor is exclusively writing a data value into a memory location
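The read/write discipline described above can be checked mechanically for a single step of an algorithm. The sketch below is only an illustration: `classify_step` is a hypothetical helper (not from the slides) that takes the memory addresses read and written by all processors in one step and returns the standard PRAM access name (EREW, CREW, CRCW, ...). It only counts repeated addresses and ignores a read and a write hitting the same cell.

```python
from collections import Counter


def classify_step(reads, writes):
    """Name the memory-access pattern of one parallel step.

    reads / writes: lists of memory addresses, one entry per access made
    by some processor during the step (hypothetical representation).
    """
    # A concurrent access means some address appears more than once.
    concurrent_read = any(count > 1 for count in Counter(reads).values())
    concurrent_write = any(count > 1 for count in Counter(writes).values())
    read_part = "CR" if concurrent_read else "ER"
    write_part = "CW" if concurrent_write else "EW"
    return read_part + write_part


# Three processors read distinct cells and write distinct cells.
print(classify_step([0, 1, 2], [3, 4, 5]))   # EREW
# Two processors read cell 0 in the same step; writes stay exclusive.
print(classify_step([0, 0, 1], [3, 4, 5]))   # CREW
```

An algorithm designed for exclusive writes should never produce a step that this check labels with "CW".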
Designing Algorithms
• Arranging the processors
  – Helpful for design of algorithm
    • We can envision how it works
    • We can envision the data access pattern needed
      – EREW, CREW (CRCW)
  – Not how processors are necessarily arranged in practice
    • Although some machines have been
  – What are some possible arrangements?
  – Why might these arrangements prove useful for design?
Arrangements
Sorting in Parallel
Emphasis: merge sort
Sequential merge sort
• Recursive
  – Can envision a recursion tree
function mergesort(m)
    var list left, right
    if length(m) ≤ 1
        return m
    else
        middle = length(m) / 2
        for each x in m up to middle
            add x to left
        for each x in m after middle
            add x to right
        left = mergesort(left)
        right = mergesort(right)
        result = merge(left, right)
        return result
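For anyone who wants to run the sequential version, here is a minimal Python sketch of the same recursion. The `merge` helper is not spelled out on the slide; the version here is one straightforward way to write it and is an assumption, not the slide author's code.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def mergesort(m):
    """Recursive merge sort following the slide's pseudocode."""
    if len(m) <= 1:
        return list(m)
    middle = len(m) // 2          # integer division, unlike the pseudocode's '/'
    left = mergesort(m[:middle])  # elements up to middle
    right = mergesort(m[middle:]) # elements after middle
    return merge(left, right)


print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The two recursive calls work on disjoint halves, which is what makes the recursion tree a natural starting point for a parallel version.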
Parallel merge sort
• Shared data: 2 lists in memory
• Sort pairs once in parallel
• The processes merge concurrently
How might we write the pseudocode?
Numbering of processors starts with 0
s = 2
while s <= N
    do in parallel N/s steps for proc i
        merge values from i*s to (i*s)+s-1
    s = s*2
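The pseudocode above can be tried out with a small Python sketch. Everything named here (`merge_range`, `parallel_merge_sort`) is hypothetical, and a thread pool stands in for the PRAM processors, so it illustrates the structure of the rounds rather than promising a speedup. One deliberate change, flagged as an assumption: the loop runs while `s < 2*N` instead of `s <= N`, so that N need not be a power of two.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_range(data, lo, mid, hi):
    """Return the sorted merge of data[lo:mid] and data[mid:hi] (each already sorted)."""
    left, right = data[lo:mid], data[mid:hi]
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def parallel_merge_sort(values):
    data = list(values)
    n = len(data)
    s = 2  # size of the blocks produced in the current round
    with ThreadPoolExecutor() as pool:
        while s < 2 * n:  # assumption: generalizes 's <= N' to any N
            starts = range(0, n, s)  # proc i owns the block starting at i*s

            def task(lo, s=s):
                # Merge the two sorted halves of this processor's block.
                return merge_range(data, lo, min(lo + s // 2, n), min(lo + s, n))

            merged = list(pool.map(task, starts))  # one round, all blocks at once
            # Blocks are disjoint, so each cell is written by exactly one task.
            for lo, block in zip(starts, merged):
                data[lo:lo + len(block)] = block
            s *= 2
    return data


print(parallel_merge_sort([8, 3, 5, 1, 7, 2, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

With N = 8 the loop runs three rounds (s = 2, 4, 8), using 4, 2 and then 1 task, which mirrors a binary tree of processors.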
Parallel Merge Sort
• Work through pseudocode with larger N
• Processor Arrangement: binary tree
• Memory access: EREW
• What was the more practical implementation?
Let’s try others
Different from sorting
Activity: Sum N integers
• Suppose we have an array of N integers in memory
• We wish to sum them
  – Variant: create a running sum in a new array
• Devise a parallel algorithm for this
  – Assume PRAM to start
  – What processor arrangement did you use?
  – What memory access is required?
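One possible answer to this activity, sketched in Python under stated assumptions: the inner loops below run sequentially but stand for work that separate PRAM processors would do in the same round. `parallel_sum` is a tree-style reduction and `parallel_running_sum` is a doubling-style running sum; both names and both designs are illustrative choices, not the slide author's solution.

```python
def parallel_sum(values):
    """Tree reduction: about log2(N) rounds, each round halves the active cells."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # One round. Each i is a separate processor: it reads cells i and
        # i + stride and writes only cell i, so no two processors share a cell.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0] if n else 0


def parallel_running_sum(values):
    """Running (prefix) sum in a new array, doubling the look-back distance each round."""
    prev = list(values)
    n = len(prev)
    d = 1
    while d < n:
        # One round. Processor i builds its new cell from the previous round's
        # array, so writes are exclusive; cell i is read by processors i and i + d.
        prev = [prev[i] + prev[i - d] if i >= d else prev[i] for i in range(n)]
        d *= 2
    return prev


print(parallel_sum([1, 2, 3, 4, 5]))          # 15
print(parallel_running_sum([1, 2, 3, 4]))     # [1, 3, 6, 10]
```

The reduction uses a binary-tree arrangement with exclusive reads and writes. In the running sum each cell is read by two processors per round, which is worth weighing when deciding what memory access the algorithm requires.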
Next Activity
• Now suppose you need an algorithm for multiplying a matrix by a vector
[Figure: Matrix A × Vector X = Result Vector]
• Devise a parallel algorithm for this
  – Assume PRAM to start
  – Think about what each process will compute (there are options)
  – What processor arrangement did you use?
  – What memory access is required?
Matrix-Vector Multiplication
• The matrix is assumed to be M x N. In other words:
  – The matrix has M rows.
  – The matrix has N columns.
  – For example, a 3 x 2 matrix has 3 rows and 2 columns.
• In matrix-vector multiplication, if the matrix is M x N, then the vector must have dimension N.
  – In other words, the vector will have N entries.
  – If the matrix is 3 x 2, then the vector must be 2-dimensional.
  – This is usually stated by saying the matrix and vector must be conformable.
• Then, if the matrix and vector are conformable, the product of the matrix and the vector is a resultant vector that has dimension M.
  – (So, the result could be a different size than the original vector!)
  – For example, if the matrix is 3 x 2 and the vector is 2-dimensional, the result of the multiplication is a vector of 3 dimensions.
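A small sequential sketch makes the dimension rule concrete: an M x N matrix times a vector with N entries gives a vector with M entries. The function name `mat_vec` and the list-of-rows representation are assumptions chosen for illustration.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of M rows, each with N entries) by vector x (N entries)."""
    n = len(x)
    # Conformability check: every row needs exactly as many entries as x.
    if any(len(row) != n for row in A):
        raise ValueError("matrix and vector are not conformable")
    # Entry r of the result is the dot product of row r with x.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


A = [[1, 2],
     [3, 4],
     [5, 6]]      # 3 x 2: M = 3 rows, N = 2 columns
x = [10, 1]       # N = 2 entries
print(mat_vec(A, x))  # [12, 34, 56], a vector with M = 3 entries
```

Each entry of the result depends on one row of the matrix and the whole vector, and on nothing else in the result.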
Matrix-Vector Multiplication
• Ways to do a parallel algorithm:
  – One row of matrix per processor
  – One element of matrix per processor
    • There is additional overhead involved. Why?
• What if number of rows M is larger than number of processors?
• Emerging theme: how to partition the data
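One way to handle more rows than processors is to give each processor a contiguous block of rows. The sketch below assumes that partitioning scheme; `row_blocks` and `parallel_mat_vec` are hypothetical names, and the thread pool stands in for the processors to show the data partitioning rather than to promise a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(m, p):
    """Split row indices 0..m-1 into at most p contiguous blocks of near-equal size."""
    base, extra = divmod(m, p)
    blocks, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)  # first 'extra' blocks get one more row
        if size:
            blocks.append(range(start, start + size))
        start += size
    return blocks


def parallel_mat_vec(A, x, num_workers=2):
    """Row-partitioned matrix-vector product: each worker owns a block of rows."""
    y = [0] * len(A)

    def work(rows):
        for r in rows:
            # All workers read x (concurrent reads); only this worker writes y[r].
            y[r] = sum(a * b for a, b in zip(A[r], x))

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(work, row_blocks(len(A), num_workers)))
    return y


A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]   # M = 5 rows, 2 workers
print([list(b) for b in row_blocks(5, 2)])      # [[0, 1, 2], [3, 4]]
print(parallel_mat_vec(A, [10, 1]))             # [12, 34, 56, 78, 100]
```

With one element per processor instead, the partial products for a row would still have to be combined into a single result entry, which is one place the extra overhead can come from.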
Expand on previous example
• Matrix-matrix multiplication
[Figure: Matrix × Matrix = ?]
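As a starting point for this extension, here is a hedged sequential sketch in which each row of the result is computed independently, so each row could be handed to a different processor. The name `mat_mat` and the one-row-per-processor choice are assumptions; other partitions (by column, by element, by block) are equally valid answers.

```python
def mat_mat(A, B):
    """Multiply A (M x K) by B (K x N), both given as lists of rows."""
    inner = len(B)
    # Conformability: each row of A needs as many entries as B has rows.
    if any(len(row) != inner for row in A):
        raise ValueError("matrices are not conformable")
    cols = len(B[0]) if B else 0

    def result_row(i):
        # Row i of the result uses only row i of A and all of B,
        # so different rows can be computed by different processors.
        return [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]

    return [result_row(i) for i in range(len(A))]


print(mat_mat([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Seen this way, the product is one matrix-vector style computation per result row, with B read by every processor.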