Parallel Computer Architectures 2 nd week

42
-1- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Parallel Computer Architectures 2 nd week References Flynn’s Taxonomy Classification of Parallel Computers Based on Architectures Trend in Parallel Computer Architectures

description

Parallel Computer Architectures 2 nd week. References Flynn’s Taxonomy Classification of Parallel Computers Based on Architectures Trend in Parallel Computer Architectures. REFERENCES. Scalable Parallel Computing Kai Hwang and Zhiwei Xu, McGraw-Hill Chapter.1 - PowerPoint PPT Presentation

Transcript of Parallel Computer Architectures 2 nd week

Page 1: Parallel Computer Architectures 2 nd  week

-1- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Parallel Computer Architectures

2nd weekReferencesFlynn’s Taxonomy Classification of Parallel Computers

Based on ArchitecturesTrend in Parallel Computer

Architectures

Page 2: Parallel Computer Architectures 2 nd  week

-2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

REFERENCES

Scalable Parallel Computing– Kai Hwang and Zhiwei Xu, McGraw-Hill

Chapter.1Parallel Computing: Theory and Practice

– Michael J. Quinn, McGraw-HillChapter 3

Parallel Processing Course– Yang-Suk Kee([email protected])

School of EECS, Seoul National University

Page 3: Parallel Computer Architectures 2 nd  week

-3- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

FLYNN’S TAXONOMY

A way to classify parallel computers (Flynn,1972)

Based on notions of instruction and data streams– SISD (Single Instruction stream over a Single Data stream )– SIMD (Single Instruction stream over Multiple Data streams )– MISD (Multiple Instruction streams over a Single Data

stream)– MIMD (Multiple Instruction streams over Multiple Data

stream) Popularity

– MIMD > SIMD > MISD

Page 4: Parallel Computer Architectures 2 nd  week

-4- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

SISD (Single Instruction Stream Over A Single Data

Stream )

CU PU MU

IS

IS DSI/O

IS : Instruction Stream DS : Data StreamCU : Control Unit PU : Processing UnitMU : Memory Unit

SISD – Conventional sequential machines

Page 5: Parallel Computer Architectures 2 nd  week

-5- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

SIMD (Single Instruction Stream Over Multiple Data

Streams )SIMD

– Vector computers– Special purpose computations

CU

PE1 LM1

PEn LMn

DS

DS

DS

DS

ISIS

Program loaded from host

Data sets loaded from host

SIMD architecture with distributed memory

PE : Processing Element LM : Local Memory

Page 6: Parallel Computer Architectures 2 nd  week

-6- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MISD – Processor arrays, systolic arrays– Special purpose computations

Is it different from pipeline computer?

MISD (Multiple Instruction Streams Over A Single Data

Streams)

Memory(Program,

Data) PU1 PU2 PUn

CU1 CU2 CUn

DS DS DSIS IS

IS IS

IS

DSI/O

MISD architecture (the systolic array)

Page 7: Parallel Computer Architectures 2 nd  week

-7- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MIMD (Multiple Instruction Streams Over Multiple Data

Stream)MIMD

– General purpose parallel computers

CU1 PU1

SharedMemory

IS

IS DSI/O

CUn PUnIS DSI/O

IS

MIMD architecture with shared memory

Page 8: Parallel Computer Architectures 2 nd  week

-8- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

CLASSIFICATION BASED ON ARCHITECTURES

Pipelined ComputersDataflow ArchitecturesData Parallel SystemsMultiprocessorsMulticomputers

Page 9: Parallel Computer Architectures 2 nd  week

-9- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Instructions are divided into a number of steps (segments, stages)

At the same time, several instructions can be loaded in the machine and be executed in different steps

PIPELINED COMPUTERS

Instruction i IF ID EX WB

IF ID EX WB

IF ID EX WB

IF ID EX WB

IF ID EX WB

Instruction i+1

Instruction i+2

Instruction i+3

Instruction i+4

Instruction # 1 2 3 4 5 6 7 8

Cycles

Page 10: Parallel Computer Architectures 2 nd  week

-10- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATAFLOW ARCHITECTURES

Data-driven model– A program is represented as a directed acyclic graph in

which a node represents an instruction and an edge represents the data dependency relationship between the connected nodes

– Firing rule: A node can be scheduled for execution if and only if its input data become valid for consumption

Dataflow languages– Id, SISAL, Silage, ...– Single assignment, applicative(functional) language, no

side effect– Explicit parallelism

Page 11: Parallel Computer Architectures 2 nd  week

-11- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATAFLOW GRAPH

z = (a + b) * c+

*

a

b

c

z

The dataflow representation of an arithmetic expression

Page 12: Parallel Computer Architectures 2 nd  week

-12- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATA-FLOW COMPUTER

Execution of instructions is driven by data availability

Basic components– Data are directly held inside instructions– Data availability check unit– Token matching unit– Chain reaction of asynchronous instruction

executionsWhat is the difference between this and

normal (control flow) computers?

Page 13: Parallel Computer Architectures 2 nd  week

-13- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATAFLOW COMPUTERS

Advantages– Very high potential for parallelism– High throughput – Free from side-effect

Disadvantages– Time lost waiting for unneeded arguments– High control overhead– Difficult in manipulating data structures

Page 14: Parallel Computer Architectures 2 nd  week

-14- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Dataflow Machine (Manchester Dataflow

Computer)

From host

To host

First actual hardware implementation

I/OSwitch

MatchingUnit

OverflowUnit

TokenQueue

InstructionStore

func1 funck

Network

Token<data, tag, dest, marker>

Match<tag, dest>

Page 15: Parallel Computer Architectures 2 nd  week

-15- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATAFLOW REPRESENTATION

input d,e,f c0 = 0

for i from 1 to 4 do begin ai := di / ei

bi := ai * fi

ci := bi + ci-1

end

output a, b, c

/

d1 e1

*

+

f1

/

d2 e2

*

+

f2

/

d3 e3

*

+

f3

/

d4 e4

*

+

f4

a1

b1

a2

b2

a3

b3

a4

b4

c0 c4

c1 c2 c3

Page 16: Parallel Computer Architectures 2 nd  week

-16- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

EXECUTION ON CONTROL FLOW MACHINES

a1 b1 c1 a2 b2 c2 a4 b4 c4

Sequential execution on a uniprocessor in 24 cycles

Assume all the external inputs are available before entering do loop+ : 1 cycle, * : 2 cycles, / : 3 cycles,

How long will it take to execute this program on a dataflow computer with 4 processors?

Page 17: Parallel Computer Architectures 2 nd  week

-17- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

EXECUTION ON A DATAFLOW MACHINE

c1 c2 c3 c4a1

a2

a3

a4

b1

b2

b4

b3

Data-driven execution on a 4-processor dataflow computer in 9 cycles

Can we further reduce the execution time of this program ?

Page 18: Parallel Computer Architectures 2 nd  week

-18- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

PROBLEMS OF DATAFLOW COMPUTERS

Excessive copying of large data structures in dataflow operations– I-structure : a tagged memory unit for

overlapped usage by the producer and consumer

Retreat from pure dataflow approach (shared memory)

Handling complex data structures

Chain reaction control is difficult to implement– Complexity of matching store and memory

units Expose too much parallelism (?)

Page 19: Parallel Computer Architectures 2 nd  week

-19- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATA PARALLEL SYSTEMS

Programming model – Operations performed in parallel

on each element of data structure– Logically single thread of control,

performs sequential or parallel steps

– Conceptually, a processor associated with each data element

Page 20: Parallel Computer Architectures 2 nd  week

-20- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DATA PARALLEL SYSTEMS (cont’d)

SIMD Architectural model– Array of many simple, cheap processors

with little memory eachProcessors don’t sequence through

instructions– Attached to a control processor that

issues instructions– Specialized and general communication,

cheap global synchronization

Page 21: Parallel Computer Architectures 2 nd  week

-21- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

VECTOR PROCESSOR

Instruction set includes operation on vectors as well as scalars

2 types of vector computers– Processor arrays– Pipelined vector processors

Page 22: Parallel Computer Architectures 2 nd  week

-22- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

PROCESSOR ARRAY

A sequential computer connected with a set of identical processing elements simultaneouls doing the same operation on different data. Eg CM-200

Data path

Processing element Data

memory

Inte

rcon

nect

ion

netw

ork

Processing element Data

memory

Processing element Data

memory

I/0

…Processor array

Instruction path

Program and Data Memory

CPU

I/O processor

Front-end computer

I/0

Page 23: Parallel Computer Architectures 2 nd  week

-23- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

PIPELINED VECTOR PROCESSOR

Stream vector from memory to the CPU

Use pipelined arithmetic units to manipulate data

Eg: Cray-1, Cyber-205

Page 24: Parallel Computer Architectures 2 nd  week

-24- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MULTIPROCESSOR

Consists of many fully programmable processors each capable of executing its own program

Shared Address Space ArchitectureClassified into 2 types

– Uniform Memory Access (UMA) Multiprocessors

– Non-Uniform Memory Access (NUMA) Multiprocessors

Page 25: Parallel Computer Architectures 2 nd  week

-25- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

SHARED ADDRESS SPACE ARCHITECTURE

Virtual address spaces for a collection of processes communicating via shared addresses

Machine physical address space

Shared portion of address space

Private portion of address space

Pn private

Common physical addresses

P2 private

P1 private

P0 private

Store

Load

Page 26: Parallel Computer Architectures 2 nd  week

-26- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Uses a central switching mechanism to reach a centralized shared memory

All processors have equal access time to global memory

Tightly coupled system Problem: cache consistency

UMA MULTIPROCESSOR

P1

Switching mechanism

I/O

P2 … Pn

Memory banks

C1 C2 CnPi

Ci

Processor i

Cache i

Page 27: Parallel Computer Architectures 2 nd  week

-27- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

UMA MULTIPROCESSOR (cont’d)

Crossbar switching mechanism

Mem

Mem

Mem

Mem

Cache

P

I/OCache

P

I/O

Page 28: Parallel Computer Architectures 2 nd  week

-28- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

UMA MULTIPROCESSOR (cont’d)

Shared- Bus switching mechanism

Mem Mem Mem Mem

Cache

P

I/OCache

P

I/O

Page 29: Parallel Computer Architectures 2 nd  week

-29- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

UMA MULTIPROCESSOR (cont’d)

Packet-switched network

Network

Mem Mem

Cache

P

Cache

P

Cache

P

Mem

Page 30: Parallel Computer Architectures 2 nd  week

-30- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

NUMA MULTIPROCESSOR

Distributed shared memory combined by local memory of all processors

Memory access time depends on whether it is local to the processor

Caching shared (particularly nonlocal) data?

Network

Cache

P

Mem Cache

P

Mem

Cache

P

Mem Cache

P

Mem

Distributed Memory

Page 31: Parallel Computer Architectures 2 nd  week

-31- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

CURRENT TYPES OF MULTIPROCESSORS

PVP (Parallel Vector Processor)– A small number of proprietary vector processors

connected by a high-bandwidth crossbar switch SMP (Symmetric Multiprocessor)

– A small number of COTS microprocessors connected by a high-speed bus or crossbar switch

DSM (Distributed Shared Memory)– Similar to SMP– The memory is physically distributed among

nodes.

Page 32: Parallel Computer Architectures 2 nd  week

-32- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

PVP (Parallel Vector Processor)

VP VP

Crossbar Switch

VP

SM SM SM

VP : Vector Processor SM : Shared Memory

Page 33: Parallel Computer Architectures 2 nd  week

-33- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

SMP (Symmetric Multi-Processor)

P/C P/C

Bus or Crossbar Switch

P/C

SM SM SM

P/C : Microprocessor and Cache

Page 34: Parallel Computer Architectures 2 nd  week

-34- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

DSM (Distributed Shared Memory)

Custom-Designed Network

MB MB

P/C

LM

NIC

DIR

P/C

LM

NIC

DIR

DIR : Cache Directory

Page 35: Parallel Computer Architectures 2 nd  week

-35- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MULTICOMPUTERS

Consists of many processors with their own memory No shared memory Processors interact via message passing loosely coupled

system

Message-Passing Interconnection

Network

MP

MP

MP

PM

PM

PM

PM

PM

P

P

M

M

Page 36: Parallel Computer Architectures 2 nd  week

-36- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

CURRENT TYPES OF MULTICOMPUTERS

MPP (Massively Parallel Processing)– Total number of processors > 1000

Cluster– Each node in system has less than 16

processors.Constellation

– Each node in system has more than 16 processors

Page 37: Parallel Computer Architectures 2 nd  week

-37- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MPP (Massively Parallel Processing)

P/C

LM

NIC

MB

P/C

LM

NIC

MB

Custom-Designed Network

MB : Memory Bus NIC : Network Interface Circuitry

Page 38: Parallel Computer Architectures 2 nd  week

-38- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Cluster

Commodity Network (Ethernet, ATM, Myrinet, VIA)

MB MB

P/C

M

NIC

P/C

M

Bridge Bridge

LD LD

NIC

IOB IOB

LD : Local Disk IOB : I/O Bus

Page 39: Parallel Computer Architectures 2 nd  week

-39- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

CONSTELLATION

P/CP/C

SM SMNICLD

Hub

Custom or Commodity Network

>= 16

IOC

P/CP/C

SM SMNICLD

Hub

>= 16

IOC

IOC : I/O Controller

Page 40: Parallel Computer Architectures 2 nd  week

-40- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Trend in Parallel Computer Architectures

0

50

100

150

200

250

300

350

400

1997 1998 1999 2000 2001 2002 Years

Num

ber

of H

PC

s

MPPs Constellations Clusters SMPs

Page 41: Parallel Computer Architectures 2 nd  week

-41- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

TOWARD ARCHITECTURAL CONVERGENCE

Evolution and role of software have blurred boundary– Send/recv supported on SAS machine via buffers– Can construct global address space on MP

Hardware organization converging too.– Tighter NI integration for MP (low latency, higher bandwidth)– At lower level, hardware SAS passes hardware messages.– Nodes connected by general network and communication

assists. Cluster of workstations/SMPs

– Emergence of fast SAN (System Area Networks)

Page 42: Parallel Computer Architectures 2 nd  week

-42- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Converged Architecture of Current

Supercomputers

Interconnection NetworkInterconnection Network

MemoryMemory

PP PP PP PP

MemoryMemory

PP PP PP PP

MemoryMemory

PP PP PP PP

MemoryMemory

PP PP PP PP

MultiprocessorsMultiprocessorsMultiprocessorsMultiprocessors