Computer Architecture Dataflow Machines. Data Flow Conventional programming models are control...

Computer Architecture

Dataflow Machines

Data Flow

• Conventional programming models are control driven
  • Instruction sequence is precisely specified
  • Sequence specifies control
    • which instruction the CPU will execute next
• Execution rule:
  • Execute an instruction when its predecessor has completed

s1: r = a*b;
s2: s = c*d;
s3: y = r + s;

s2 executes when s1 is complete
s3 executes when s2 is complete

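Data Flow

• Consider the calculation
  • y = a*b + c*d
• Represent it by a graph
  • Nodes represent computations
  • Data flows along arcs
• Execution rule:
  • Execute an instruction when its data is available
  • Data driven rule

[Figure: dataflow graph for y = a*b + c*d. a and b feed one multiply node, c and d feed another, and both products feed the add node that produces y.]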

Data Flow

• Dataflow firing rule
  • An instruction fires (executes) when its data is available
• Exposes all possible parallelism
  • Either multiplication can fire as soon as data arrives
  • Addition must wait
• Data dependence analysis!
• Instruction issue units:
  • Fire (issue) each instruction when its operands (registers) have been written

[Figure: dataflow graph for y = a*b + c*d]
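
The firing rule can be tried out in a few lines of Python. This is only a minimal sketch: the `GRAPH` encoding and the `run_dataflow` helper are hypothetical names chosen for illustration, and each loop pass stands in for one step in which every ready node fires at once.

```python
import operator

# Hypothetical graph encoding: node name -> (operation, left input, right input).
GRAPH = {
    "r": (operator.mul, "a", "b"),
    "s": (operator.mul, "c", "d"),
    "y": (operator.add, "r", "s"),
}


def run_dataflow(graph, inputs):
    """Fire every node whose operands are available, until none are left."""
    values = dict(inputs)   # data that has arrived so far
    pending = dict(graph)   # nodes that have not fired yet
    waves = []              # nodes that fired together in each step
    while pending:
        ready = sorted(
            name for name, (_, left, right) in pending.items()
            if left in values and right in values
        )
        if not ready:
            raise ValueError("some operands never arrive")
        # All ready nodes fire using the data available at the start of the step.
        results = {}
        for name in ready:
            op, left, right = pending.pop(name)
            results[name] = op(values[left], values[right])
        values.update(results)
        waves.append(ready)
    return values, waves


values, waves = run_dataflow(GRAPH, {"a": 2, "b": 3, "c": 4, "d": 5})
print(values["y"])  # 26
print(waves)        # [['r', 's'], ['y']]
```

Both multiplications fire in the first step because their data is present; the addition has to wait for both products.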

Data Flow - Realisations

• Several Experimental Machines built
  • Manchester: Gurd & Watson
  • Tagged Token: Arvind, MIT
  • Sigma: ETL, Tsukuba
  • EMC-4: ETL, Tsukuba
  • Monsoon: Arvind, MIT
  • EMX: ETL, Tsukuba
  • RAPID: Osaka/Sharp/Mitsubishi (Asynchronous!)
  • Naiad: Tasmania
  • and some others

Data Flow - Realisations

• Manchester

Data Flow - Program

• Program word
• Matching Store Entry
• When both Presence Flags are Y, this packet is despatched to a PE (any PE!)

[Figure: word layout with fields Operation (+, -, *, /, etc) | Left, Right Operands | Presence Flags | Destination Address | Destination Left or Right]

Data Flow - Matching Store

• Special purpose memory
• Limited processing capability
• Detects full slots
• Despatches operation packets to any idle PE

[Figure: word layout with fields Operation (+, -, *, /, etc) | Left, Right Operands | Presence Flags | Destination Address | Destination Left or Right]

Data Flow - Processing Elements

• Receive operation packets
• Generate result
• Form result packet
• Despatch to matching store
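
The loop between the matching store and a processing element can be sketched in Python. This is an illustrative model under stated assumptions, not the layout of any real machine: the `Entry` class, the packet tuples and the single software PE are all hypothetical, and a destination address of `None` is used here to mark a program output.

```python
from collections import deque
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class Entry:
    """One matching store entry: operation, operand slots, presence flags, destination."""

    def __init__(self, op, dest_addr=None, dest_side=None):
        self.op = op
        self.operands = {"L": None, "R": None}
        self.present = {"L": False, "R": False}  # presence flags
        self.dest_addr = dest_addr               # assumption: None marks a program output
        self.dest_side = dest_side               # "L" or "R" slot of the destination


def processing_element(packet):
    """Receive an operation packet, generate the result, form a result packet."""
    op, left, right, dest_addr, dest_side = packet
    return (dest_addr, dest_side, OPS[op](left, right))


def run(store, tokens):
    """Feed (address, side, value) packets through the matching store and one PE."""
    queue = deque(tokens)
    outputs = []
    while queue:
        addr, side, value = queue.popleft()
        if addr is None:  # result with no destination entry
            outputs.append(value)
            continue
        entry = store[addr]
        entry.operands[side] = value
        entry.present[side] = True
        if entry.present["L"] and entry.present["R"]:  # full slot detected
            packet = (entry.op, entry.operands["L"], entry.operands["R"],
                      entry.dest_addr, entry.dest_side)
            entry.present = {"L": False, "R": False}   # free the slot
            queue.append(processing_element(packet))   # result goes back to the store
    return outputs


# y = a*b + c*d: entries 0 and 1 multiply, entry 2 adds their results.
store = {0: Entry("*", 2, "L"), 1: Entry("*", 2, "R"), 2: Entry("+")}
tokens = [(0, "L", 2), (1, "L", 4), (0, "R", 3), (1, "R", 5)]
outputs = run(store, tokens)
print(outputs)  # [26]
```

Nothing in the sketch fixes which PE handles a packet; with several PEs, any idle one could take the packet from the queue.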

Data Flow - EM4

• Architects
  • Yamaguchi, Sakai, Kodama, Sato et al
  • ElectroTechnical Laboratory, Tsukuba, Japan
• PE (EM-Y)
  • CMOS Gate Array
  • 80k gates / 1.0
  • f = 20MHz
  • ~1992

Data Flow - Monsoon

• Architects
  • Papadopoulos, Culler et al
  • MIT, Cambridge
• PE
  • f = 10MHz
  • ~1990
• I-Structure Processor

Data Flow - I-Structures

• Memory with a presence bit
  • Tag each memory location with a bit indicating its validity
  • Valid bit set -> normal read (no wait)
  • Data not yet written (valid bit not set) -> wait; read requests are queued
• Data driven execution
  • Operations proceed when data is available

[Figure: memory words, each holding a valid bit and its data]
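
A small Python model makes the presence-bit behaviour concrete. It is a sketch under assumptions: the `IStructure` class and its callback-style `read` are hypothetical, and treating a second write to the same location as an error is an assumption not stated on the slide.

```python
class IStructure:
    """Memory in which every location carries a presence (valid) bit."""

    def __init__(self, size):
        self.data = [None] * size
        self.valid = [False] * size
        self.waiting = [[] for _ in range(size)]  # queued read requests

    def read(self, index, consumer):
        """Deliver the value now if valid, otherwise queue the request."""
        if self.valid[index]:
            consumer(self.data[index])            # normal read, no wait
        else:
            self.waiting[index].append(consumer)  # wait until the data is written

    def write(self, index, value):
        """Store the value, set the valid bit and release queued reads."""
        if self.valid[index]:
            # Assumption: locations are written once.
            raise ValueError("location already written")
        self.data[index] = value
        self.valid[index] = True
        queued, self.waiting[index] = self.waiting[index], []
        for consumer in queued:
            consumer(value)


mem = IStructure(4)
log = []
mem.read(1, lambda v: log.append(("early read", v)))  # data not yet written
mem.write(1, 42)                                      # releases the queued read
mem.read(1, lambda v: log.append(("late read", v)))   # valid bit already set
print(log)  # [('early read', 42), ('late read', 42)]
```

The early reader never polls: it proceeds only when the write supplies its data, which is the data driven rule applied to memory.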

Data Flow - Monsoon Pipeline

• 8 stage pipeline
• “Presence bits” checks operand availability
• Frame (coarse grain) basis

Data Flow - Summary

• Fine-Grain Dataflow
  • Suffered from comms network overload!
• Coarse-Grain Dataflow
  • Monsoon ...
  • Overtaken by commercial technology!!
• A sad “fact-of-life”
  • It’s almost impossible to generate the funds for non-“mainstream” computer architecture research
  • $n x 10^8 required
  • Non-mainstream = interesting!

Data Flow - Summary

• As a software model …
  • Functional languages
    • Dataflow in a different guise!
    • Theoretically important
    • Practically? Inefficient (= slow!!)
    • ….. Ask your CS colleagues!
  • Cilk - based on C
    • Used on CIIPS Myrmidons
    • Uses a dataflow model
      • Threads become ready for execution when their data is generated
    • Message passing efficiency
      • Without explicit data transfer & synchronisation!

Networks

• Network Topology (or shape)
  • Vital to efficient parallel algorithms
  • Communication is the limiting factor!
• Ideal
  • Cross-bar
    • Any-to-any
    • Non-blocking
      • Except two sources to same receiver
  • Realisable
    • But only for limited order (number of ports)

Networks

• Cross-bars
  • Achilles
    • 8 x 8
    • Full duplex
      • Simultaneous Input and Output at each port
    • 32 bit data-path
    • Target: 1 Gbyte/second total throughput
  • but we needed the 3-D arrangement to achieve
    • bandwidth
    • high order

Networks

• Cross-bars
  • Achilles
    • Hardware almost trivial!
    • Single FPGA on each level
    • Programmable
      • VHDL Models
      • Several topologies
      • Just by changing the software!

Networks - More than 8 PEs

• Simple
  • Use 2 8x8 routers!
• but …. This link gets a lot of traffic!

[Figure: two 8x8 routers joined by a single link, the one that gets a lot of traffic]

Networks - Fat tree

• Problem:
  • High-traffic links between PEs can become a bottleneck
• Solution: Fat-tree
  • Links higher up the tree are “fatter”
  • Sustainable bandwidth between all PEs is the same

Networks - Performance Metrics

• Metrics for comparing network topologies
  • Diameter
    • Maximum distance between any pair of nodes
    • Determines latency
  • Bisection Bandwidth
    • Aggregate bandwidth over any “cut” which divides the network in half
    • Determines throughput
• Crossbar
  • Diameter: 1
    • Every PE is directly connected to the router, so a single “hop” suffices
  • Bisection Bandwidth: b bytes/sec
    • b is the bandwidth of a single link

Networks - Performance Metrics

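• Metrics for comparing network topologies
  • To connect n PEs with m x m crossbars
  • Single link bandwidth b bytes/s
• Simple: n = 14 (2 switches)
  • Diameter 3
  • Bisection Bandwidth b

[Figure: two switches joined by one link, with the hops between PEs on different switches labelled 1, 2, 3]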

Networks - Performance Metrics

• Fat-tree
  • Diameter: 2 log_m n
    • Height is log_m n
    • Worst case distance - up and down
  • Bisection Bandwidth: b n/2 bytes/sec
    • Links are fatter higher up the tree

[Figure: fat tree of height log_m n]
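
As a worked instance of the fat-tree formulas, the short Python sketch below evaluates them for given n, m and b. The function name `fat_tree_metrics` is hypothetical, and the sketch assumes n is an exact power of m so that the height is a whole number.

```python
def fat_tree_metrics(n, m, b):
    """Height, diameter and bisection bandwidth of a fat tree.

    n: number of PEs, m: crossbar size (m x m), b: single link bandwidth in bytes/s.
    Assumption: n is an exact power of m.
    """
    if m < 2 or n < 1:
        raise ValueError("need m >= 2 and n >= 1")
    height, capacity = 0, 1
    while capacity < n:          # integer log_m n, avoiding floating point
        capacity *= m
        height += 1
    if capacity != n:
        raise ValueError("n must be an exact power of m")
    return {
        "height": height,                  # log_m n
        "diameter": 2 * height,            # worst case: up to the root and down again
        "bisection_bandwidth": b * n / 2,  # b n/2 bytes/sec
    }


metrics = fat_tree_metrics(n=64, m=8, b=1.0)
print(metrics)  # {'height': 2, 'diameter': 4, 'bisection_bandwidth': 32.0}
```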

Networks - Performance Metrics

• Mesh (taking n as the side length, i.e. an n x n grid of PEs)
  • Diameter: 2n-2
  • Bisection Bandwidth: b n bytes/sec
  • Order: 4
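
The mesh figures can be checked by brute force for a small case. The Python sketch below assumes an n x n grid without wrap-around links, with n even so that a straight cut down the middle splits the PEs in half; all function names are hypothetical.

```python
from collections import deque


def mesh_neighbours(x, y, n):
    """Yield the grid neighbours of PE (x, y) in an n x n mesh (no wrap-around)."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n:
            yield nx, ny


def eccentricity(start, n):
    """Largest hop count from `start` to any other PE, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbours(*node, n):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def mesh_diameter(n):
    """Maximum distance between any pair of PEs."""
    return max(eccentricity((x, y), n) for x in range(n) for y in range(n))


def mesh_bisection_links(n):
    """Links crossing the straight cut between columns n/2 - 1 and n/2 (n even)."""
    half = n // 2
    return sum(1 for y in range(n) for x in range(n)
               for nx, _ in mesh_neighbours(x, y, n) if x < half <= nx)


def mesh_order(n):
    """Largest number of links at any PE."""
    return max(len(list(mesh_neighbours(x, y, n)))
               for x in range(n) for y in range(n))


n = 4
print(mesh_diameter(n))         # 6, i.e. 2n-2
print(mesh_bisection_links(n))  # 4, so bisection bandwidth is b n
print(mesh_order(n))            # 4
```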

Networks - Performance Metrics

• Hypercube
  • Hypercube of order m
    • Link 2 order m-1 hypercubes with 2^(m-1) links
  • Number of PEs: n = 2^m
  • Order: log_2 n = m

[Figure: two order 2 hypercubes linked to form an order 3 hypercube]

Networks - Hypercubes

• Embedding property
  • In an n PE hypercube, we have hypercubes of size n/2, n/4, …
• Number PEs with binary numbers
  • 000, 001, 010, 011, 100, …
• Joining two hypercubes
  • add one binary digit to the numbering
• Each PE is connected to every PE whose index differs in only one bit
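
The binary numbering turns neighbour lookup into bit flipping, which the Python sketch below illustrates. The function names are hypothetical; the sketch also counts the links that join the two halves of the cube, i.e. the two sub-cubes distinguished by the highest-order digit.

```python
def hypercube_neighbours(pe, m):
    """PEs whose index differs from `pe` in exactly one of the m bits."""
    return [pe ^ (1 << bit) for bit in range(m)]


def links_between_halves(m):
    """Links joining the two order m-1 hypercubes selected by the top bit."""
    top = 1 << (m - 1)
    return sum(1 for pe in range(top)
               for nb in hypercube_neighbours(pe, m) if nb >= top)


m = 3
print([format(nb, "03b") for nb in hypercube_neighbours(0b101, m)])
# ['100', '111', '001']
print(links_between_halves(m))  # 4
```

Every PE has m neighbours, and exactly one of them lies in the other half of the cube.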

Networks - Hypercubes

• Embedding property
  • Partitioning tasks
    • Allocate to sub-cubes
    • Sub-tasks allocated to sub-cubes of that cube, etc
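
Recursive allocation to sub-cubes can be sketched in Python by fixing one more high-order bit at each split. This is an illustration under assumptions: tasks are modelled as nested pairs (a hypothetical encoding in which a tuple is a task with two sub-tasks and anything else is a leaf), and `subcube_members` and `allocate` are made-up names.

```python
def subcube_members(prefix, free_bits):
    """PEs whose high-order bits equal `prefix`, with `free_bits` low-order bits free."""
    base = prefix << free_bits
    return [base | low for low in range(1 << free_bits)]


def allocate(task, prefix, free_bits):
    """Give each sub-task one half of the current sub-cube, recursively."""
    if not isinstance(task, tuple):  # leaf task: it gets the whole sub-cube
        return {task: subcube_members(prefix, free_bits)}
    if free_bits == 0:
        raise ValueError("no sub-cube left to split")
    left, right = task               # assumption: every split is two-way
    allocation = allocate(left, prefix << 1, free_bits - 1)
    allocation.update(allocate(right, (prefix << 1) | 1, free_bits - 1))
    return allocation


# Order 3 hypercube (8 PEs): task A splits into A1 and A2, task B does not split.
tasks = (("A1", "A2"), "B")
allocation = allocate(tasks, prefix=0, free_bits=3)
print(allocation)
# {'A1': [0, 1], 'A2': [2, 3], 'B': [4, 5, 6, 7]}
```

Each allocated group shares its high-order bits, so it is itself a hypercube of lower order.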