Parallel and Multiprocessor Architectures

Chapter 9.4 By Eric Neto


Transcript of Parallel and Multiprocessor Architectures

Page 1: Parallel and Multiprocessor Architectures

Chapter 9.4

By Eric Neto

Page 2: Parallel and Multiprocessor Architectures

Parallel & Multiprocessor Architecture

In making processors faster, we run into certain limitations:

Physical

Economic

Solution: When necessary, use more processors, working in sync.

Page 3: Parallel and Multiprocessor Architectures

Parallel & Multiprocessor Limitations

Though parallel processing can speed up performance, the gain is limited.

Intuitively, you'd expect N processors to do the work in 1/N time, but parts of a program must often run in sequence, so there will be some downtime while dormant processors wait for the active processor to finish.

Therefore, the more sequential a process is, the less cost-effective it is to implement parallelism.
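
The limit described above is Amdahl's law. A minimal Python sketch of the speedup formula (the function name and example numbers are illustrative, not from the slides):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: speedup = 1 / ((1 - P) + P / N), where P is the
    fraction of the work that can run in parallel on N processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# A 90%-parallel workload on 8 processors speeds up far less than 8x:
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
# Only a fully parallel workload reaches the intuitive 1/N time:
print(amdahl_speedup(1.0, 8))            # 8.0
```

Note how the serial fraction dominates: even with 1000 processors, a half-sequential program never doubles its speed.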

Page 4: Parallel and Multiprocessor Architectures

Parallel and Multiprocessing Architectures

Superscalar

VLIW

Vector

Interconnection Networks

Shared Memory

Distributed Computing

Page 5: Parallel and Multiprocessor Architectures

Superscalar Architecture

Allows multiple instructions to be executed simultaneously in each cycle.

Contains:

Execution units – each allows one instruction to execute.

Specialized instruction fetch unit – fetches multiple instructions at once and sends them to the decoding unit.

Decoding unit – determines whether the given instructions are independent of one another.
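
The decoding unit's independence test can be sketched in Python. This toy model (the tuple format and function name are hypothetical, not real hardware behavior) treats an instruction as a (dest, src1, src2) register triple:

```python
def independent(a, b):
    """Decide whether two (dest, src1, src2) instructions may issue in
    the same cycle: neither may read or write the other's destination."""
    dest_a, reads_a = a[0], {a[1], a[2]}
    dest_b, reads_b = b[0], {b[1], b[2]}
    return (dest_a not in reads_b      # no read-after-write on a's result
            and dest_b not in reads_a  # no read-after-write on b's result
            and dest_a != dest_b)      # no write-after-write conflict

# r3 = r1 + r2 and r6 = r4 + r5 share no registers: issue together.
print(independent(("r3", "r1", "r2"), ("r6", "r4", "r5")))  # True
# r4 = r3 + r1 reads r3 while the first writes it: must wait.
print(independent(("r3", "r1", "r2"), ("r4", "r3", "r1")))  # False
```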

Page 6: Parallel and Multiprocessor Architectures

VLIW Architecture

Similar to superscalar, but relies on the compiler rather than dedicated hardware.

Packs independent instructions into one "Very Long Instruction Word."

Advantages:

Simpler hardware

Disadvantages:

Instructions are fixed at compile time, so later modifications can affect how the instructions execute.
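
A toy sketch of the compiler's packing step, using the same hypothetical (dest, src1, src2) instruction format; a real VLIW scheduler is far more sophisticated than this greedy pass:

```python
def pack_vliw(instructions, width=2):
    """Greedily pack (dest, src1, src2) instructions into words of at
    most `width` slots, starting a new word whenever an instruction
    reads or rewrites a destination already written in the current word."""
    words, current, written = [], [], set()
    for dest, src1, src2 in instructions:
        conflict = {src1, src2, dest} & written
        if len(current) == width or conflict:
            words.append(current)
            current, written = [], set()
        current.append((dest, src1, src2))
        written.add(dest)
    if current:
        words.append(current)
    return words

program = [("r3", "r1", "r2"),   # r3 = r1 + r2
           ("r6", "r4", "r5"),   # r6 = r4 + r5 (independent of the first)
           ("r7", "r3", "r6")]   # r7 = r3 + r6 (depends on both)
print(pack_vliw(program))
# [[('r3', 'r1', 'r2'), ('r6', 'r4', 'r5')], [('r7', 'r3', 'r6')]]
```

The first two instructions share no registers and fill one long word; the third depends on both results, so it lands in the next word.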

Page 7: Parallel and Multiprocessor Architectures

Vector Processors

Use vector pipelines to store and perform operations on many values at once, as opposed to scalar processing, which only performs operations on individual values.

Since they use fewer instructions, there is less decoding, control-unit overhead, and memory-bandwidth usage.

Can be SIMD or MIMD.

Scalar (one add per element pair):

Xn = X1 + X2 ;
Yn = Y1 + Y2 ;
Zn = Z1 + Z2 ;
Wn = W1 + W2 ;

vs. vector (one vector add):

LDV V1, R1
LDV V2, R2
ADDV R3, V1, V2
STV R3, V3
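
The scalar-versus-vector contrast above can be demonstrated with NumPy, whose array operations have the same one-instruction-many-values flavor (NumPy is only an illustration here, not what a vector processor actually runs):

```python
import numpy as np

# Scalar processing: one add per element pair, as in the
# Xn = X1 + X2 ... sequence on the slide.
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
scalar_result = []
for a, b in zip(x, y):
    scalar_result.append(a + b)

# Vector processing: a single vector add over all elements at once,
# as in the LDV/ADDV/STV sequence.
v1, v2 = np.array(x), np.array(y)
vector_result = v1 + v2

print(scalar_result)           # [11, 22, 33, 44]
print(vector_result.tolist())  # [11, 22, 33, 44]
```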

Page 8: Parallel and Multiprocessor Architectures

Interconnection Networks

Each processor has its own memory, which can be accessed and shared by other processors through an interconnection network.

The efficiency of messages sent through the network is limited by:

Bandwidth

Message latency

Transport latency

Overhead

In general, the number of messages sent and the distances they must travel are minimized.
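
The factors above can be combined into a simple additive cost model. This formula and its parameter names are an illustrative assumption, not from the slides:

```python
def message_time(size_bytes, bandwidth_bytes_per_sec,
                 transport_latency_sec, overhead_sec):
    """A simple cost model for one message: fixed per-message overhead,
    plus transport latency across the network, plus serialization time
    (size / bandwidth)."""
    return (overhead_sec + transport_latency_sec
            + size_bytes / bandwidth_bytes_per_sec)

# 1 MB over a 1 GB/s link with 10 us transport latency and 5 us overhead:
t = message_time(1_000_000, 1_000_000_000, 10e-6, 5e-6)
print(t)  # about 0.001015 s -- bandwidth dominates for large messages
```

For small messages the fixed latency and overhead terms dominate instead, which is why minimizing the number of messages matters as much as minimizing their size.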

Page 9: Parallel and Multiprocessor Architectures

Topologies

Connections between processors can be either static or dynamic.

Different static configurations are more useful for different tasks.

Completely Connected

Ring

Star

Page 10: Parallel and Multiprocessor Architectures

More Topologies

Tree

Mesh

Hypercube
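
One concrete way to compare these topologies is by link count, which drives hardware cost. The per-topology formulas below are standard results, but the function itself is an illustrative sketch:

```python
import math

def link_count(topology, n):
    """Point-to-point links needed to connect n nodes in common static
    topologies (mesh assumes a square 2-D grid, hypercube a power-of-two n)."""
    if topology == "completely connected":
        return n * (n - 1) // 2            # every pair gets a link
    if topology == "ring":
        return n
    if topology in ("star", "tree"):       # both are trees: n - 1 links
        return n - 1
    if topology == "mesh":                 # square 2-D mesh, side = sqrt(n)
        side = math.isqrt(n)
        return 2 * side * (side - 1)
    if topology == "hypercube":            # each node has log2(n) neighbors
        return n // 2 * int(math.log2(n))
    raise ValueError(f"unknown topology: {topology}")

for t in ("completely connected", "ring", "star", "mesh", "hypercube"):
    print(t, link_count(t, 16))
# completely connected 120, ring 16, star 15, mesh 24, hypercube 32
```

The completely connected network's quadratic growth is why it is rarely built at scale, while the hypercube trades a modest link count for short worst-case paths.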

Page 11: Parallel and Multiprocessor Architectures

Dynamic Networks

Buses, crossbars, switches, and multistage connections.

As more processors are added, these become rapidly more expensive (a crossbar, for example, grows with the square of the processor count).

Page 12: Parallel and Multiprocessor Architectures

Dynamic Networking: Crossbar Network

• Efficient

• Direct

• Expensive

Page 13: Parallel and Multiprocessor Architectures

Dynamic Networking: Switch Network

• Complex

• Moderately Efficient

• Cheaper

Page 14: Parallel and Multiprocessor Architectures

Dynamic Networking: Bus

• Simple

• Slow

• Inefficient

• Cheap

Page 15: Parallel and Multiprocessor Architectures

Shared Memory Multiprocessors

Memory is shared either globally or locally, or in a combination of the two.

Page 16: Parallel and Multiprocessor Architectures

Shared Memory Access

Uniform Memory Access (UMA) systems use a shared memory pool, where all memory takes the same amount of time to access.

Quickly becomes expensive as more processors are added.

Page 17: Parallel and Multiprocessor Architectures

Shared Memory Access

Non-Uniform Memory Access (NUMA) systems have memory distributed across all the processors, and it takes less time for a processor to read from its own local memory than from non-local memory.

Prone to cache coherence problems, which occur when a local cache isn't in sync with non-local caches holding the same data.

Dealing with these problems requires extra mechanisms to ensure coherence.
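
One such mechanism is write-invalidate snooping: when one processor writes a line, every other cached copy of that line is discarded. A minimal Python sketch (the Cache and Bus classes are hypothetical stand-ins, not a real coherence protocol):

```python
class Cache:
    """One processor's local cache: address -> value."""
    def __init__(self):
        self.lines = {}

class Bus:
    """Write-invalidate snooping: every write is broadcast on the bus,
    and all other caches drop their (now stale) copy of that address."""
    def __init__(self, caches):
        self.caches = caches

    def write(self, writer, addr, value):
        writer.lines[addr] = value
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)  # invalidate the stale copy

c0, c1 = Cache(), Cache()
bus = Bus([c0, c1])
c0.lines[0x10] = 5
c1.lines[0x10] = 5              # both caches hold the same line
bus.write(c0, 0x10, 7)          # c0 updates; c1 must drop its copy
print(c0.lines.get(0x10), c1.lines.get(0x10))  # 7 None
```

After the write, any read of 0x10 by the second processor misses its cache and must fetch the current value, restoring coherence.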

Page 18: Parallel and Multiprocessor Architectures

Distributed Computing

Multi-computer processing: works on the same principle as multiprocessing, on a larger scale.

Uses a large network of computers to solve small parts of a very large problem.
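
A minimal Python illustration of the idea, using local worker processes to stand in for networked computers (the function names and chunking scheme are assumptions for the sketch):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Work done by one worker -- one 'computer' in the network."""
    return sum(chunk)

def distributed_sum(n, n_workers=4):
    """Split summing 1..n into n_workers small pieces (n is assumed
    divisible by n_workers), solve each piece in its own process,
    then combine the partial results."""
    chunk = n // n_workers
    pieces = [range(i * chunk + 1, (i + 1) * chunk + 1)
              for i in range(n_workers)]
    with Pool(n_workers) as pool:
        return sum(pool.map(partial_sum, pieces))

if __name__ == "__main__":
    print(distributed_sum(1_000_000))  # 500000500000
```

Real distributed systems add the complications the earlier slides describe: message latency, bandwidth limits, and coordination overhead between machines.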
