Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Page 1: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Reconfigurable Computing and the von Neumann Syndrome

Reiner Hartenstein

Page 2: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

© 2007, [email protected] http://hartenstein.de

TU Kaiserslautern

Questions?

• Who is familiar with FPGAs? Is programming them easy?

• Who is familiar with systolic arrays?

• Duality: data streams vs. instruction streams?

• Programming a multicore microprocessor: will it be easy?

Page 3: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


pervas

Page 4: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The Pervasiveness of FPGAs
• The Reconfigurable Computing Paradox
• The Gordon Moore gap
• The von Neumann syndrome
• We need a dual paradigm approach
• Conclusions

Page 5: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


FPGAs found everywhere

Page 6: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Pervasiveness of RC

http://www.fpl.uni-kl.de/RCeducation08/pervasiveness.html

http://hartenstein.de/pervasiveness.html

Page 7: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


RCeducation 2008

http://www.fpl.uni-kl.de/RCeducation08/

The 3rd International Workshop on Reconfigurable Computing Education

April 10, 2008, Montpellier, France

Page 8: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline (repeated), annotated with:

• the hardware / software chasm
• the configware / software chasm
• the instruction stream tunnel
• the overhead-prone paradigm

Page 9: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline (repeated), annotated with:

• instruction stream vs. data stream
• bridging the chasm: an old hat
• stubborn curriculum task forces

Page 10: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Page 11: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


paradox

Outline

Page 12: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

RC education

http://www.fpl.uni-kl.de/RCeducation/

http://www.fpl.uni-kl.de/RCeducation08/pervasiveness.html

Page 13: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline (repeated), annotated with:

• platform FPGAs
• coarse-grained arrays
• saving energy

Page 14: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

FPGA with island architecture

[Figure: island-style FPGA. Labels: reconfigurable logic box, switch box, connect box, reconfigurable interconnect fabrics]

Page 15: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Deficiencies of reconfigurable fabrics (FPGA) (fine-grained)

[Figure: transistors / microchip (log scale, 10^0 to 10^9) versus year (1980 to 2010). Curves: microprocessor (Gordon Moore curve), FPGA physical density, FPGA routed density, FPGA logical density]

• general purpose "simple" FPGA: immense area inefficiency
• reconfigurability overhead, wiring overhead, routing congestion
• overhead: >> 10,000
• power guzzler
• slow clock
• deficiency factor: > 10,000

1st DeHon's Law [1996: Ph.D. thesis, MIT]

Page 16: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005]

[Figure: speed-up factor (log scale, 10^0 to 10^6) versus year (1980 to 2010); trend annotation: × 2 / yr]

DSP and wireless:
• Reed-Solomon decoding: 2400
• MAC: 1000
• Viterbi decoding: 400
• FFT: 100

Image processing, pattern matching, multimedia:
• real-time face detection: 6000
• video-rate stereo vision: 900
• pattern recognition: 730
• SPIHT wavelet-based image compression: 457

Bioinformatics:
• Smith-Waterman pattern matching: 288
• BLAST: 52
• protein identification: 40

Astrophysics:
• GRAPE: 20

Further applications:
• crypto: 1000
• molecular dynamics simulation: 88
• oil and gas: 17

Reiner Hartenstein
Success with RC has been achieved in a variety of areas such as signal and image processing, cryptology, communications processing, data and text mining, and global optimization, for a variety of platform types, from high-end systems on earth to mission-critical systems in space.
Page 17: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Page 18: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005]

[Figure: the speed-up chart of Page 16, repeated, with the additional chart labels "PISA" and "3000"]

The RC paradox

• deficiency factor: > 10,000
• speed-up factor: 6,000
• total discrepancy: > 60,000,000
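The total discrepancy quoted on this slide is the product of the two factors: the fabric is said to be more than 10,000 times less efficient, yet the best published migration is 6,000 times faster. A minimal Python sketch of that arithmetic, using only the figures from the slide (the function name is illustrative and not from the talk):

```python
def total_discrepancy(deficiency_factor: int, speedup_factor: int) -> int:
    """Gap between the slowdown one would expect from the fabric's
    deficiencies and the speed-up that was actually published."""
    return deficiency_factor * speedup_factor


# Slide figures: deficiency factor > 10,000, speed-up factor 6,000.
paradox = total_discrepancy(10_000, 6_000)
print(f"total discrepancy: > {paradox:,}")
```

Because the deficiency factor is a lower bound, the result is a lower bound as well.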
Page 19: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Page 20: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Page 21: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005]

These examples worked fine with on-chip memory.

There are other algorithms more difficult to accelerate … where d-caching might be useful (ASM)
Page 22: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


platform-FPGA

Outline

Page 23: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

How much on-chip embedded BRAM?

[Figure: platform FPGA example, on-chip LatticeCS series. Labels: fast on-chip block RAMs (BRAMs); DPU: coarse-grained. Ranges shown: 256 – 1704 (BGA), 56 – 424, 8 – 32]

Page 24: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


coarse

Page 25: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Coarse-grained Reconfigurable Array

SNN filter on (supersystolic) KressArray (mainly a pipe network)

• array size: 10 x 16 rDPUs
• rDPU: reconfigurable Data Path Unit, 32 bits wide
• no CPU
• note: software perspective without instruction streams: pipelining
• compiled by Nageldinger's KressArray Xplorer with Juergen Becker's CoDe-X inside

Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect

Question after the talk: "but you can't implement decisions!"

Page 26: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Simple KressArray Configuration Example

Page 27: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Far fewer deficiencies with coarse-grained arrays

[Figure: transistors / microchip (log scale, 10^0 to 10^9) versus year (1980 to 2010). Curves: Gordon Moore curve, rDPA physical, rDPA logical. Insets: a CPU (DPU with program counter) beside an array of rDPUs]

• area efficiency very close to Moore's law
• very compact configuration code: very fast reconfiguration

Hartenstein's Law [1996: ISIS, Austin, TX]

Page 28: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


energy

Page 29: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Software-to-Configware (FPGA) Migration: Oil and gas [2005]

[Figure: speed-up factor (log scale, 10^0 to 10^6) versus year (1980 to 2010); oil and gas: 17; trend annotation: × 2 / yr]

side effect: slashing the electricity bill by more than an order of magnitude
Page 30: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

An accidentally discovered side effect

• Software-to-FPGA migration of an oil and gas application:
• Speed-up factor of 17
• Electricity bill down to <10%
• Hardware cost down to <10%

• All other publications reporting speed-up did not report energy consumption.

Saves > $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack

Herb Riley, R. Associates

What about higher speed-up factors? More dramatic electricity savings?

$70 in 2010?

- This will change.
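To get a feel for the figure quoted on this slide, the dollar saving can be converted back into energy. The sketch below is only a back-of-the-envelope calculation: it takes the slide's $10,000 per year and 7¢ / kWh and assumes, as a simplification not stated in the talk, that the rack runs continuously all year. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h, ignoring leap years


def saved_energy_kwh(saved_dollars: float, price_per_kwh: float) -> float:
    """Energy that corresponds to a given saving on the electricity bill."""
    return saved_dollars / price_per_kwh


def average_power_kw(energy_kwh: float) -> float:
    """Constant load that would consume this energy over one year
    (assumes continuous operation, which the slide does not state)."""
    return energy_kwh / HOURS_PER_YEAR


kwh = saved_energy_kwh(10_000, 0.07)   # slide: > $10,000 per year at 7 cents / kWh
kw = average_power_kw(kwh)
print(f"{kwh:,.0f} kWh per year, roughly {kw:.1f} kW of continuous load per rack")
```

Under these assumptions the saving corresponds to roughly 143,000 kWh per year for one 64-processor rack.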

Page 31: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


What’s Really Going On With Oil Prices? [BusinessWeek, January 29, 2007]

$52 Price of delivery in February 2007 [New York Mercantile Exchange: Jan. 17]

$200 Minimum oil price in 2010, in a bet by investment banker Matthew Simmons

Page 32: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Energy as a strategic issue

• Google's annual electricity bill: $50,000,000

• Amsterdam's electricity: 25% goes into server farms

• NY city server farms: 1/4 km² building floor area

• [Mark P. Mills] Predicted for the USA in 2020: 30-50% of the entire national electricity consumption goes into cyber infrastructure

• petaFlop supercomputer (by 2012?): extreme power consumption

Page 33: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Energy: an important motivation

platform example                         energy: W / Gflops   energy factor
MDGRAPE-3* (domain-specific, 2004)       0.2                  1
Pentium 4                                14                   70
Earth Simulator (supercomputer, 2003)    128                  640

*) feasible also on reconfigurable platforms

Reiner Hartenstein
GRAPE (GRAvity PipE): special-purpose computer for astrophysical N-body simulations and molecular dynamics simulations. MDGRAPE-3 (aka Protein Explorer): Petaflops-GRAPE [Univ. of Tokyo & Genomic Sciences Center at RIKEN institute]. Petaflops by GRAPE (non-reconfigurable): massive pipelining and on-chip distributed memory; several Gordon Bell awards.
Page 34: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Moore gap

Page 35: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline (repeated), annotated with:

• ... & the multicore crisis

Page 36: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Moore's law not applicable to all aspects of VLSI

What is the reason for the paradox?

• The Gordon Moore curve does not indicate performance
• The peak clock frequency does not indicate performance
• the law of Gates

Reiner Hartenstein
Slow inter-processor communications and limited ability to parallelize algorithms. Early attempts could employ only tens of processors. Current state-of-the-art systems can apply thousands of microprocessors with high-bandwidth, low-latency interconnects to the most challenging HPC problems. Nevertheless, even these systems have limitations for certain types of HPC applications. Eventually, the extra overhead required for parallel processing overcomes the benefits that the additional processors provide. FPGAs are not well suited for running the operating system and connecting to networks and disk drives.
Page 37: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Rapid Decline of Computational Density [BWRC, UC Berkeley, 2004]

[Figure: SPECfp2000 / MHz / billion transistors (0 to 200) versus year (1990 to 2005) for CPUs from DEC (Alpha), SUN, HP, and IBM. Annotations: Alpha: down by 100 in 6 yrs; IBM: down by 20 in 6 yrs. Stolen from Bob Colwell]

• memory wall, caches, ...
• primary design goal: avoiding a paradigm shift
• dramatic demo of the von Neumann Syndrome

Page 38: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Monstrous Steam Engines of Computing

• 5120 processors, 5000 pins each
• crossbar weight: 220 t, 3000 km of thick cable
• larger than a battleship
• power measured in tens of megawatts
• floor space measured in tens of thousands of square feet
• ready 2003

Page 39: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Dead Supercomputer Society

Research 1985 – 1995 [Gordon Bell, keynote ISCA 2000]

• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent

• DAPP • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • ICL • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories

• MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics

Page 40: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

We are in a Computing Crisis

platform example                         hardware cost $ / Gflops   cost factor   energy W / Gflops   energy factor
MDGRAPE-3* (domain-specific, 2004)       15                         1             0.2                 1
Pentium 4                                400                        27            14                  70
Earth Simulator (supercomputer, 2003)    8000                       533           128                 640

*) feasible also with rDPA

• microprocessor crisis: going multicore
• supercomputing crisis: MPP parallelism does not scale
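The cost and energy factors in the table are each platform's figure divided by the MDGRAPE-3 figure, rounded to a whole number. A small Python sketch that recomputes them from the slide's raw numbers (the dictionary layout and function name are illustrative, not from the talk):

```python
# Raw figures from the slide: hardware cost in $ / Gflops, energy in W / Gflops.
platforms = {
    "MDGRAPE-3": {"cost": 15, "energy": 0.2},
    "Pentium 4": {"cost": 400, "energy": 14},
    "Earth Simulator": {"cost": 8000, "energy": 128},
}
baseline = platforms["MDGRAPE-3"]


def factors(entry: dict, base: dict) -> tuple:
    """Cost and energy relative to the baseline platform, rounded as on the slide."""
    return (round(entry["cost"] / base["cost"]),
            round(entry["energy"] / base["energy"]))


table = {name: factors(entry, baseline) for name, entry in platforms.items()}
for name, (cost_factor, energy_factor) in table.items():
    print(f"{name:16s} cost factor {cost_factor:4d}   energy factor {energy_factor:4d}")
```

The recomputed values agree with the factors printed on the slide.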
Page 41: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Syndrome

Page 42: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The von Neumann Paradigm Trap

[Burks, Goldstine, von Neumann; 1946]

• RAM (memory cells have addresses ….)
• Program counter (auto-increment, jump, goto, branch)
• Datapath Unit with ALU etc.
• I/O unit, ….

CS education got stuck in this paradigm trap, which stems from the technology of the 1940s

We need a dual paradigm approach

CS education's right eye is blind, and its left eye suffers from tunnel vision

Page 43: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

What is the reason for the paradox?

The result of decades of tunnel vision in CS R&D and education: a basic mind set that is completely wrong

the von Neumann Syndrome

"CPU: most flexible platform"?

>1000 CPUs running in parallel: the most inflexible platform

However, FPGA & rDPA are very flexible

The Law of More: drastically declining programmer productivity
Page 44: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


multicore

Page 45: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Executive Summary doesn‘t help

We must first understand the nature of the paradigm

Understanding the Paradox ?

von Neumann chickens ?

Page 46: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Page 47: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


models

Page 48: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Von Neumann CPU

[Figure: CPU consisting of a DPU and a program counter, attached to RAM memory]

term   program counter   execution triggered by   paradigm
CPU    yes               instruction fetch        instruction-stream-based

World of Software Engineering

Program source: software

(tunnel vision with the left eye)

Page 49: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

von Neumann is not the common model

[Figure: CPU (DPU with program counter) connected to RAM memory through the von Neumann bottleneck]

• mainframe age: von Neumann instruction-stream-based machine (software)
• microprocessor age: instruction-stream-based CPU (software) plus data-stream-based accelerator co-processors (hardware)

Page 50: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Here is the contemporary common model

[Figure: CPU (DPU with program counter) connected to RAM memory through the von Neumann bottleneck]

• mainframe age: von Neumann instruction-stream-based machine (software)
• microprocessor age: instruction-stream-based CPU (software) plus data-stream-based accelerator co-processors (hardware)
• Now we are in the configware age: CPU, hardwired accelerator, reconfigurable accelerator

Page 51: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

machine models

term    program counter   execution triggered by   paradigm
CPU     yes               instruction fetch        instruction-stream-based
DPU**   no                data arrival*            data-stream-based

*) "transport-triggered"
**) does not have a program counter - no instruction fetch at run time

[Figure: von Neumann machine: CPU (DPU with program counter) attached to RAM memory. Anti machine: a DPU, or an array of rDPUs, fed by several RAM blocks, each with its own data counter]
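The difference between the two rows of the machine-model table can be illustrated with a toy model. The sketch below is a hedged illustration only, not code from the talk: `run_cpu` walks a program counter through a list of operations for every data item and counts the instruction fetches, while `run_dpu_pipe` wires the same operators together once (standing in for configuration before run time) and then lets the data flow through. Python itself runs on a CPU, so this models the bookkeeping, not the hardware. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Op = Callable[[int], int]


def run_cpu(program: List[Op], data: Iterable[int]) -> Tuple[List[int], int]:
    """Instruction-stream-based: every operation on every data item
    is preceded by an instruction fetch selected by the program counter."""
    fetches = 0
    results = []
    for x in data:
        pc = 0                      # program counter
        while pc < len(program):
            op = program[pc]        # instruction fetch at run time
            fetches += 1
            x = op(x)
            pc += 1
        results.append(x)
    return results, fetches


def run_dpu_pipe(pipe: List[Op], data: Iterable[int]) -> List[int]:
    """Data-stream-based: the operators are connected once, then each
    one fires whenever a data item arrives from its predecessor."""
    stream = iter(data)
    for op in pipe:                 # "configuration": done before run time
        stream = map(op, stream)    # stage consumes its predecessor's output
    return list(stream)             # run time: data arrival triggers execution


ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
cpu_results, fetch_count = run_cpu(ops, [1, 2, 3, 4])
pipe_results = run_dpu_pipe(ops, [1, 2, 3, 4])
print(cpu_results, "after", fetch_count, "instruction fetches")
print(pipe_results, "with no program counter")
```

Both variants compute the same results; in the toy CPU the number of fetches grows with the number of operations times the number of data items.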

Page 52: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Page 53: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Nick Tredennick's Paradigm Shifts:

Early historic machines:
• resources: fixed
• algorithm: fixed

Von Neumann (CPU):
• resources: fixed
• algorithm: variable (software)
• 1 programming source needed

(slowly preparing to use both eyes for a dual paradigm point of view)

Page 54: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Compilation: Software (Software Engineering)

source program → software compiler → software code

software code: a sequential instruction schedule (von Neumann model)

Page 55: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Nick Tredennick's Paradigm Shifts

Early historic machines:
• resources: fixed
• algorithm: fixed

Von Neumann (CPU):
• resources: fixed
• algorithm: variable (software)
• 1 programming source needed

Reconfigurable Computing:
• resources: variable (configware)
• algorithm: variable (flowware)
• 2 programming sources needed

Page 56: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Configware Compilation (Configware Engineering)

• source "program" (C, FORTRAN, MATLAB) → configware compiler (mapper and scheduler)
• mapper → configware code: placement & routing for the rDPA (pipe network)
• scheduler → flowware code: programming the data counters that generate the data streams

configware compilation fundamentally different from software compilation

[Figure: data streams flowing through an rDPA pipe network, supplied by ASM blocks; each ASM consists of a data counter, a GAG, and RAM]

ASM: Auto-Sequencing Memories

Page 57: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The first archetype machine model

Software Industry's Secret of Success: simple basic Machine Paradigm

• mainframe, CPU
• compile or assemble: procedural personalization
• personalization: RAM-based
• instruction-stream-based mind set ("von Neumann")

But now we live in the Configware Age

Page 58: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


systolic

Page 59: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The super-systolic array: a generalization of the systolic array

Synthesis method?

• reductionist approach: of course algebraic (linear projection), only for applications with regular data dependencies
• Mathematicians caught by their own paradigm trap
• 1995: Rainer Kress discarded their algebraic synthesis methods and replaced them with simulated annealing: rDPA

Page 60: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Having introduced Data streams

[Figure: input data streams entering a DPA (pipe network) and output data streams leaving it; each stream is drawn over the axes time and port #]

• H. T. Kung, ~1980: DPA (pipe network)
• execution transport-triggered
• no memory wall
• systolic array research: throughout the 1980s a mathematicians' hobby
• The road map to HPC: ignored for decades
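A pipe network of this kind can be sketched as a clocked chain of stages in which a value advances one stage per tick and each stage fires only when a value arrives. The Python model below is a deliberately small illustration under stated assumptions: it covers only a one-dimensional pipe, not the general systolic arrays of H. T. Kung, and the function name and trace format are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Op = Callable[[int], int]


def run_pipe(stages: Sequence[Op], inputs: Sequence[int]) -> List[Tuple[int, int]]:
    """Simulate a linear pipe: on every clock tick each stage applies its
    operator to the value handed over by its left neighbour.
    Returns (tick, value) pairs for the output data stream."""
    n = len(stages)
    regs: List[Optional[int]] = [None] * n          # one output register per stage
    trace: List[Tuple[int, int]] = []
    feed: List[Optional[int]] = list(inputs) + [None] * (n - 1)  # extra ticks drain the pipe
    for tick, incoming in enumerate(feed, start=1):
        handed_over = [incoming] + regs[:-1]        # what each stage receives this tick
        regs = [None if v is None else op(v)        # a stage fires only on data arrival
                for op, v in zip(stages, handed_over)]
        if regs[-1] is not None:
            trace.append((tick, regs[-1]))
    return trace


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
output_stream = run_pipe(stages, [1, 2, 3, 4])
print(output_stream)
```

After the pipe has filled (three ticks for three stages), one result leaves per tick, and no stage fetches an instruction or addresses a memory while the stream is running.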

Page 61: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Who generates the Data Streams?

Mathematicians: it's not our job (it's not algebraic)

[Figure: a "systolic" array with data streams entering and leaving]

Page 62: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Without a sequencer … it's not a machine

Machine: resources + sequencer

reductionist approach: resources only; the sequencer: (it's not our job)

Mathematicians failed to invent the new machine paradigm ... the anti machine

Page 63: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The counterpart of the von Neumann machine

[Figure: coarse-grained (r)DPA with ASM blocks at its ports; ASM = data counter, GAG, RAM]

ASM: Auto-Sequencing Memory

data counters instead of a program counter

data counters: located at memory (not at data path)

Kress/Kung Anti Machine
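
As a rough illustration of "data counters instead of a program counter", the following Python sketch models an ASM as a memory bank that steps its own data counter. All names (`AutoSequencingMemory`, `datapath`, `run_example`) and the base/stride/count parameters are hypothetical and chosen for this example; the multiply-add in `datapath` is an arbitrary stand-in for the (r)DPA.

```python
class AutoSequencingMemory:
    """Memory bank with its own data counter (hypothetical model of an ASM)."""

    def __init__(self, ram, base, stride, count):
        self.ram = ram
        self.base = base      # configured before run time
        self.stride = stride
        self.count = count

    def addresses(self):
        counter = self.base   # the data counter sits at the memory, not at the data path
        for _ in range(self.count):
            yield counter
            counter += self.stride

    def read_stream(self):
        for address in self.addresses():
            yield self.ram[address]

    def write_stream(self, values):
        for address, value in zip(self.addresses(), values):
            self.ram[address] = value


def datapath(a_stream, b_stream):
    """Stand-in for the (r)DPA: computes whenever a pair of operands arrives."""
    for a, b in zip(a_stream, b_stream):
        yield a * b + 1


def run_example():
    ram_a = list(range(10))
    ram_b = [10] * 10
    ram_out = [0] * 5
    source_a = AutoSequencingMemory(ram_a, base=0, stride=2, count=5)
    source_b = AutoSequencingMemory(ram_b, base=0, stride=1, count=5)
    sink = AutoSequencingMemory(ram_out, base=0, stride=1, count=5)
    sink.write_stream(datapath(source_a.read_stream(), source_b.read_stream()))
    return ram_out
```

The sequencing is fully described by the counter configuration; no instruction stream is involved in generating the addresses.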

Page 64: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Acceleration Mechanisms by ASM-based MoMSW

• parallelism by multi-bank memory architecture

• reconfigurable address computation – before run time

• avoiding multiple accesses to the same data

• avoiding memory cycles for address computation

• improve parallelism by storage scheme transformations

• minimize data movement across chip boundaries

Page 65: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

FPGAs in Supercomputing

• Synergisms: coarse-grained parallelism through conventional parallel processing,

• and: fine-grained parallelism through direct configware execution on the FPGAs

[Figure: CPU (DPU with program counter); DataPath Units: 32 bit, 64 bit]

reconfigurable logic box (rLB): 1 bit

(millions of rLBs embedded in a reconfigurable interconnect fabric)

Page 66: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Anti machine

hardwired anti machine: resources; sequencer; memory; algorithms; flowware; data counters

reconfigurable anti machine: resources; sequencer; memory; algorithms; flowware; configware; data counters

Page 67: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

von Neumann machine

Machine: resources, sequencer

von Neumann machine: resources; sequencer; memory; algorithms; software; program counter

Page 68: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The clash of paradigms

a programmer does not understand function evaluation without machine mechanisms - without a program counter …

microprocessor age: µprocessor (procedural) and accelerators (structural)

programmer: the basic mind set is instruction-stream-based

hardware guy: kind of data-stream-based mind set

the software / hardware chasm

we need a datastream-based machine paradigm

Page 69: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Xputer Principles

Xputer

addr. generators reconfigurable

Data Path reconfigurable

[Figure: Xputer with CPU, ASM blocks, rALU and DPLA]

CPU: We used the VAX-11/750 of my group

contemporary ?

1984: first FPGAs: very tiny & very expensive

Page 70: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


super


Page 72: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The von Neumann Paradigm
• Accelerators and FPGAs
• The Reconfigurable Computing Paradox
• The new Paradigm
• Coarse-grained
• Bridging the Paradigm Chasm
• Conclusions

Page 73: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


dynamic

Page 74: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

FPGA Modes of Operation

simple, static reconfigurability

configware code loaded from external flash memory, e.g. after power-on (~milliseconds)

[Figure: timeline (axis: time): off, C ph, E ph. Legend: C ph = Configuration phase, E ph = Execution phase]

(requiring new OS principles)

Page 75: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

illustrating dynamically reconfigurable

established R&D area

partially reconfigurable

[Figure: timeline (axes: time, FPGA module no.) showing configuration phases (C ph) and execution phases (E ph) of configware macros X, Y, Z in modules X, Y, Z; X configures Y]

Swapping and scheduling of relocatable configware code macros is managed by a configware operating system

Configware OS

configware OS fundamentally different from software OS

Reconfigurable Computing at Microsoft

Microsoft ReconVista ?

Page 76: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The von Neumann Paradigm
• Accelerators and FPGAs
• The Reconfigurable Computing Paradox
• The new Paradigm
• Coarse-grained
• Bridging the Paradigm Chasm
• Conclusions

Page 77: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Reconfigurable HPC

• This area is almost 10 years old


Page 79: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Have to re-think basic assumptions

Instead of physical limits, fundamental misconceptions of algorithmic complexity theory limit the progress and will necessitate new breakthroughs.

It is not processing that is costly, but moving data and messages

We have to re-think basic assumptions behind computing

Page 80: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Illustrating the von Neumann paradigm trap

the watering pot model [Hartenstein]

The instruction-stream-based approach: von Neumann bottleneck

many watering pots

The data-stream-based approach: has no von Neumann bottleneck


Page 82: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The (non-v-N) anti-machine (Xputer)
• Speed-up by address generators
• Data-procedural Programming Language
• Generalization of the Systolic Array
• Partitioning Compilation Techniques
• Design Space Exploration
• Bridging the Paradigm Chasm

Page 83: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

More compute power by Configware than Software

75% of all (micro)processors are embedded: 4 : 1

25% embedded µProc. accelerated by FPGA(s): 1 : 4 (a very cautious estimation**)

-> 1 : 1 -> Every 2nd µProc accelerated by FPGA(s)

average acceleration factor > 2 -> rMIPS* : MIPS > 2

*) rMIPS: MIPS replaced by FPGA compute power

Conclusion: most compute power from Configware

(difference probably an order of magnitude)

Page 84: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Xputer Lab (around 1990)

Page 85: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


anti

Page 86: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Programming Language Paradigms

Principles of MoPL [1994]

language category | Computer Languages | Xputer Languages
both | deterministic procedural sequencing: traceable, checkpointable | deterministic procedural sequencing: traceable, checkpointable
operation sequence driven by: | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting; no parallel loops, escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
state register | program counter | data counter(s)
address computation | massive memory cycle overhead | overhead avoided
instruction fetch | memory cycle overhead | overhead avoided
parallel memory bank access | interleaving only | no restrictions

very easy to learn

multiple GAGs

Page 87: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Avoiding the paradigm shift?

„It is feared that domain scientists will have to learn how to design hardware. Can we avoid the need for hardware design skills and understanding?“
Tarek El-Ghazawi, panelist at SuperComputing 2006

„A leap too far for the existing HPC community“
panelist Allan J. Cantle

SuperComputing, Nov 11-17, 2006, Tampa, Florida, over 7000 registered attendees, and 274 exhibitors

We need a bridge strategy by developing advanced tools for training the software community to think in fine-grained parallelism and pipelining techniques. A shorter leap by coarse-grained platforms which allow a software-like pipelining perspective

Page 88: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The von Neumann Paradigm
• Accelerators and FPGAs
• The Reconfigurable Computing Paradox
• The new Paradigm
• Coarse-grained
• Bridging the Paradigm Chasm
• Conclusions

Page 89: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

We need a new machine paradigm

a programmer does not understand function evaluation without machine mechanisms - without a program counter …

data-stream-based mind set

we urgently need a datastream-based machine paradigm

it was prepared almost 30 years ago

[Figure: systolic array with its data streams]

Page 90: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Generic Address Generator GAG

Generalization of the DMA

[Figure: GAG with data counter]

ASM means: no instruction streams needed for address computation

Acceleration factors by:

• address computation without memory cycles

• storage scheme optimization methodology, etc.

avoid e.g. 94% address computation overhead*

*) Software to Xputer migration

GAG & enabling technology published 1989, survey: [M. Herz et al.: IEEE ICECS 2003, Dubrovnik]

patented by TI 1995
Page 91: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The 2nd “archetype” machine model

simple basic Machine Paradigm: “Kress-Kung”

accelerator reconfigurable

compile; structural personalization

personalization: RAM-based

data-stream-based mind set

Configware Industry’s Secret of Success

Page 92: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The von Neumann Paradigm
• Accelerators and FPGAs
• The Reconfigurable Computing Paradox
• The new Paradigm
• Coarse-grained
• Bridging the Paradigm Chasm
• Conclusions

Page 93: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

SNN filter on (supersystolic) KressArray (mainly a pipe network)

array size: 10 x 16 = 160 rDPUs

rDPU: reconfigurable Data Path Unit, e.g. 32 bits wide

no CPU

Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect

note: software perspective without instruction streams

question after the talk: „but you can’t implement decisions!“ (a high-level R&D manager of a large Japanese IT industry group)

if clause turns into multiplexer

Symptom of the von Neumann Syndrome, yielded by single-paradigm mind set

Executive summary? Forget it !

How about a microprocessor giant having >100 vice presidents ?
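
The remark "if clause turns into multiplexer" can be illustrated by a small Python sketch. It is a toy model only: the threshold example and the names `procedural`, `mux` and `pipe_network` are assumptions made for this illustration, not taken from the KressArray tools. In the pipe-network version both operators are active for every data item and a multiplexer selects one result, so no instruction stream has to branch.

```python
def procedural(xs, threshold):
    """Instruction-stream view: the decision is a branch in the control flow."""
    out = []
    for x in xs:
        if x > threshold:
            out.append(x - threshold)
        else:
            out.append(2 * x)
    return out


def mux(select, if_true, if_false):
    """Model of a 2-to-1 multiplexer operator."""
    return if_true if select else if_false


def pipe_network(xs, threshold):
    """Data-stream view: comparator, subtractor and multiplier all fire per item."""
    for x in xs:
        select = x > threshold   # comparator operator
        a = x - threshold        # subtractor operator, always active
        b = 2 * x                # multiplier operator, always active
        yield mux(select, a, b)  # the decision is just another operator
```

Both versions produce the same output stream; they differ only in whether the decision is taken by the sequencer or by the data path.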

Page 94: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Dual Paradigm Application Development

Juergen Becker’s CoDe-X, 1996

C language source

Partitioner

SW compiler

CW compiler

automatic parallelization by loop transformations

generating a pipe network

placement and routing

[Figure: CPU coupled to an array of rDPUs]

Page 95: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Hybrid Multi Core example

twin paradigm machine

64 cores: each core can run CPU mode or rDPU mode

[Figure: multicore array with cores configured as CPU or rDPU]

How about a microprocessor giant having >100 vice presidents ?

customer refuses the paradigm shift?

disabled for the paradigm shift ?

Page 96: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Compilation for Dual Paradigm Multicore

Juergen Becker’s CoDe-X, 1996

C language source

Partitioner

SW compiler

CW compiler

compile to hybrid multicore

automatic parallelization by loop transformations

generating a pipe network

placement and routing

[Figure: hybrid multicore array of rDPU and CPU cores]

Page 97: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The von Neumann Paradigm
• Accelerators and FPGAs
• The Reconfigurable Computing Paradox
• The new Paradigm
• Coarse-grained
• Bridging the Paradigm Chasm
• Conclusions

Page 98: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Here is the common model

von Neumann instruction-stream-based machine: CPU (DPU with program counter), RAM memory, von Neumann bottleneck

mainframe age: CPU (instruction-stream-based, software)

microprocessor age: CPU (software) with co-processors: accelerator, hardwired (data-stream-based, hardware)

configware age: CPU (software) with accelerator, reconfigurable (configware); software/configware co-compiler

Page 99: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The von Neumann Paradigm
• Accelerators and FPGAs
• The Reconfigurable Computing Paradox
• The new Paradigm
• Coarse-grained
• Bridging the Paradigm Chasm
• Conclusions

Page 100: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Multi Core: Just more CPUs ?

Growth in complexity and clock frequency of single-core microprocessors is coming to an end

Multi-core microprocessor chips emerging: soon 32 cores on an AMD chip, and 80 on an Intel chip

Without a paradigm shift, just more CPUs on a chip lead to the dead ends known from supercomputing

Multi-threading is not the silver bullet

We have to re-think basic assumptions behind computing

Page 101: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Solution not expected from CS officers

We need mutual efforts, like EE/CS cooperation known from the Mead & Conway revolution

Progress of the joint task force on CS curriculum recommendations is extremely disillusioning

For RC other motivations are similarly high-grade: growing cost and looming shortage of energy.

The personal supercomputer: a far-ranging massive push of innovation in all areas of science and economy: by Reconfigurable Computing

it’s more like a lobby: „my area is the most important“

Page 102: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Computing Sciences are in a severe Crisis

We urgently need to shape the Reconfigurable Computing Revolution to move toward incredibly promising new horizons of affordable highest-performance computing

This cannot be achieved with the classical software-based mind set

We need a new dual paradigm approach

Watch out not to get screwed !

Supercomputing titans may be your enemies

Page 103: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

The Configware Age

• Mainframe age and microprocessor(-only) age are history

• We are living in the configware age right now!

• Attempts to avoid the paradigm shift will again create a disaster

Page 104: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


thank you for your patience

Page 105: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


overhead


Page 107: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Von Neumann vs. anti machine

# | feature | von Neumann machine | hardwired anti machine | reconfigurable anti machine
1 | m’ code schedules: | instruction stream | data streams | data streams
2 | # prog’ sources | 1 | 1 | 2
3 | source 1 | none | none | configware
4 | source 2 | software | flowware | flowware
5 | sequenced by: | program counter | data counters | data counters
6 | counter co-located with: | PU (data path): CPU | memory block: ASM | memory block: ASM
9 | inter PU communication: | common memory | piped through | piped through
10 | data meeting PU: | move data at run time | move locality of execution at compile time | move locality of execution at compile time

Page 108: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Overhead avoided by anti machine

# | feature | von Neumann machine | hardwired anti machine | reconfigurable anti machine
11 | state address computation overhead at run time | instruction stream | none | none
12 | data address computation overhead at run time | instruction stream | none | none
13 | inter PU communication overhead at run time | instruction stream | none | none
14 | instruction fetch at run time | instruction stream | none | none
15 | data meet PU at run time | instruction stream | none | none

Page 109: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


GAG

Page 110: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

MoM Scan window (MoMSW) Illustration

MoM architectural primary features:

• 2-dimensional (data) memory address space

• Multiple* vari-size reconfigurable MoMSW scan windows

• MoMSW controlled by reconfigurable GAG (generic address generators)

*) typically 3

[Figure: ASM (Auto-Sequencing Memory) blocks]

Page 111: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

CGFFT: Parallel Scan Pattern Animation

MoM-3 with 3 varisize scan windows

[Figure: Datapath with ASM (Auto-Sequencing Memory) blocks]

Page 112: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Reconfigurable Generic Address Generator GAG

Generalization of the DMA

[Figure: GAG with data counter]

ASM means: no instruction streams needed for address computation

Acceleration factors by:

• address computation without memory cycles

• storage scheme optimization methodology, etc.

• supporting scratch optimization strategies (smart d-caching)

avoid e.g. 94% address computation overhead*

GAG & enabling technology published 1989, survey: [M. Herz et al.: IEEE ICECS 2003, Dubrovnik]

patented by TI 1995
Page 113: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

GAG: 2-D Generic Data Sequence Examples

[Figure: example scan patterns, panels a) to g)]

Page 114: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

GAG Slider Operation Demo

Example

[Figure: slider model with axes x, y; labels: floor, ceiling, address, C, A, F, L, B, L0, B0]

Page 115: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


XMDS Scan Pattern Editor GUI

Page 116: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

JPEG zigzag scan pattern

*> Declarations

EastScan is
  step by [1,0]
end EastScan;

SouthScan is
  step by [0,1]
end SouthScan;

NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;

SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;

HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

goto PixMap[1,1]

HalfZigZag;
SouthWestScan
uturn (HalfZigZag)

[Figure: PixMap (axes x, y) with the zigzag scan path of the data counter; parts 1 to 4 marked, including the two HalfZigZag halves]
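
To make the scan pattern concrete, here is a Python sketch that replays the MoPL declarations as address sequences of a data counter. It rests on assumptions: coordinates are taken as 1-based [x, y] with x pointing east and y pointing south, each `until` loop is modelled as "step, then test", and `uturn (HalfZigZag)` is interpreted as running the 180-degree-rotated HalfZigZag path backwards. The helper names are made up for this sketch.

```python
N = 8
EAST, SOUTH = (1, 0), (0, 1)
NORTH_EAST, SOUTH_WEST = (1, -1), (-1, 1)


def step(pos, delta):
    """'step by [dx,dy]': move the data counter by one step vector."""
    return (pos[0] + delta[0], pos[1] + delta[1])


def scan_until(pos, delta, done, max_steps=N):
    """'loop 8 times until ...': step repeatedly, stop once done(pos) holds."""
    path = []
    for _ in range(max_steps):
        pos = step(pos, delta)
        path.append(pos)
        if done(pos):
            break
    return path


def at_west_edge(pos):   # corresponds to 'until [1,*]'
    return pos[0] == 1


def at_north_edge(pos):  # corresponds to 'until [*,1]'
    return pos[1] == 1


def half_zigzag(start):
    path = [start]
    path.append(step(path[-1], EAST))
    for _ in range(3):
        path.extend(scan_until(path[-1], SOUTH_WEST, at_west_edge))
        path.append(step(path[-1], SOUTH))
        path.extend(scan_until(path[-1], NORTH_EAST, at_north_edge))
        path.append(step(path[-1], EAST))
    return path


def jpeg_zigzag():
    first = half_zigzag((1, 1))                                  # goto PixMap[1,1]; HalfZigZag
    diagonal = scan_until(first[-1], SOUTH_WEST, at_west_edge)   # SouthWestScan
    # Assumed meaning of uturn: the rotated half path, traversed in reverse.
    second = [(N + 1 - x, N + 1 - y) for (x, y) in reversed(first)]
    return first + diagonal + second[1:]                         # second[0] repeats the diagonal's end
```

Under these assumptions the generated sequence visits each of the 64 pixel addresses exactly once, without any address arithmetic in an instruction stream.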

Page 117: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Significance of MoMSW Reconfigurable Scan Windows

• MoMSW Scan windows have the potential to drastically reduce traffic to/from slow off-chip memory.

• No instruction streams needed to implement scratch pad optimization strategies using fast on-chip memory

• MoMSW Scan windows may contribute to speed-up by a factor of 10 and sometimes even much more

• MoMSW Scan windows are the deterministic alternative („d-caching“) to (nondeterministic and speculative) classical cache usage: performance can be well predicted

• For data-stream-based computing scan windows are highly effective, whereas classical caches are entirely useless
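
The effect of a scan window on memory traffic can be illustrated with a small counting model in Python. This is a sketch under assumptions: a 3-by-3 sum filter is used as the workload, the window is modelled as three buffered pixel columns that slide along a scan line, and every pixel fetched from the image counts as one off-chip read. The function names are hypothetical, and the counts say nothing about the speed-up factors quoted on the slides.

```python
from collections import deque


def naive_filter(img):
    """Re-read all 9 pixels from (off-chip) memory for every window position."""
    h, w = len(img), len(img[0])
    reads = 0
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += img[y + dy][x + dx]
                    reads += 1
            row.append(acc)
        out.append(row)
    return out, reads


def scan_window_filter(img):
    """Keep the window on chip; each step east fetches only one new column."""
    h, w = len(img), len(img[0])
    reads = 0
    out = []
    for y in range(h - 2):
        window = deque(maxlen=3)          # three buffered columns of the window
        row = []
        for x in range(w):
            column = [img[y + dy][x] for dy in range(3)]
            reads += 3                    # only the new column is fetched
            window.append(column)
            if len(window) == 3:
                row.append(sum(sum(col) for col in window))
        out.append(row)
    return out, reads


def demo_image(width=22, height=11):
    """Deterministic test image; the pixel values are arbitrary."""
    return [[(3 * x + 5 * y) % 17 for x in range(width)] for y in range(height)]
```

For a 22 by 11 image this model needs 1620 reads in the naive version and 594 with the window, and the number of reads is known before run time, which is the point of the "d-caching" remark above.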

Page 118: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Linear Filter Application

Parallelized Merged Buffer Linear Filter Application with example image of x=22 by y=11 pixel

[Figure: scan patterns for: initial design; after scan line unrolling; after inner scan line loop unrolling; hardw. level access optim.; final design]

Speed-up factor >11 due to MoMSW-based d-caching & storage scheme optimization

Page 119: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


PISA-MoM

Page 120: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Processing 4-by-4 Reference Patterns

Mead-&-Conway nMOS Design Rules: 256 4-by-4 reference patterns

Mead-&-Conway CMOS Design Rules: >800 4-by-4 reference patterns

Reference patterns automatically generated from Design Rules

MoM: all reference patterns matched in a single clock cycle

vN Software: some reference patterns can be skipped, depending on earlier patterns

PISA DRC accelerator [ICCAD 1984]

DPLA: fabricated by the E.I.S. Multi University Project

1984: 1 DPLA replaces 256 FPGAs

PISA: a forerunner of the MoM

accelerator reconfigurable

Page 121: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Speed-up by MoM-1 compared to 68020 (PISA project)

Page 122: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Speed-up by MoM-3 compared to SPARC 10/51

Page 123: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


1985 – 1990: Multimedia & DSP: MoM-3 speedup

Page 124: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The (non-v-N) anti-machine (Xputer)
• Speed-up by address generators
• Data-procedural Programming Language
• Generalization of the Systolic Array
• Partitioning Compilation Techniques
• Design Space Exploration
• Bridging the Paradigm Chasm

Page 125: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


Significance of Address Generators

• Address generators have the potential to reduce computation time significantly.

• In a grid-based design rule check a speed-up of more than 2000 has been achieved*

• reconfigured address generators contributed a factor of 10 by avoiding memory cycles for address computation

*) 15,000 if the same algorithm is used

Page 126: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

hardware vs. software perspective

mark columns: hardware perspective; data-stream-driven; software perspective; instruction-stream-driven

# | paradigm | platform | marks | flexibility | performance pot.
1 | single paradigm | simple FPGA** | X X | +++ | ++
2 | single paradigm | µprocessor & multi core | X X | +++ | -
3 | single paradigm | coarse-grained | X X | ++ | +++
4 | single paradigm | Platform FPGA: 1 & (2)* & 3 | X X X (X)* | +++ | ++++
5 | dual paradigm | 1 & 2 | X X X X | ++ | ++
6 | dual paradigm | 2 & 4 | X X X X | +++ | ++++
7 | dual paradigm | 2 & 3 | X X X | + | +++
8 | dual paradigm | reconfigurable instr. set | X X X | +++ | +

*) with soft cores and/or on-chip microprocessor
**) without soft cores

for software people

Page 127: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Ingredients

[Figure: building blocks of each platform: simple FPGA; platform FPGA; coarse-grained array; anti machine (Xputer) (Kress/Kung machine); CPU with reconfigurable instruction set extension. Building blocks shown: rLB, Soft CPU, rDPU, BRAM, RAM, hardwired special functions, CPU (program counter), ASM (data counter); CPU also for running legacy software]

all multi core!

on-chip

Page 128: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

perspective ? what expertise needed ? hardware ?

• microprocessor (also multi core): von Neumann: software perspective

• simple FPGA (fine-grained): hardware perspective

• platform FPGA (domain-specific core assortment, embedded in FPGA fabrics): mishmash model – a nightmare for undergraduate studies, but by far best optimization potential

• coarse-grained reconfigurable array: software perspective

• reconfigurable instruction set processor: mishmash model (s. a.)

Page 129: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.


flexibility (for accelerators)

Objectives

avoiding specific silicon

rapid prototyping, field-patching, emulation

cheap, compact vHPC

for every area which needs:

Page 130: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Conclusion (1)

Reconfigurable Computing opens many spectacular new horizons:

Cheap vHPC without needing specific silicon, no mask ....

Cheap embedded vHPC

Cheap desktop supercomputer (a new market)

Replacing expensive hardwired accelerators

Fast and cheap prototyping

Flexibility for systems with unstable multiple standards by dynamic reconfigurability

Supporting fault tolerance, self-repair and self-organization

Emulation logistics for very long term spare part provision and part type count reduction (automotive, aerospace …)

Massive reduction of the electricity bill: locally and nationally

Page 131: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Conclusion (2)

Needed:

Universal vHPC co-architecture demonstrator

For widely spreading its use successfully:

select killer applications for demo

The compilation tool problem to be solved

Language selection problem to be solved

Education backlog problems to be solved

Use this to develop a very good high school and undergraduate lab course

A motivator: preparing for the top 500 contest


Page 133: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Conclusion (3)

• Self-Repair and Self-Organization methodology
• Embedded r-emulation logistics methodology
• Universal vHPC co-architecture demonstrator

For widely spreading its use successfully: select a killer application for demo

Page 134: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

some Goals

Universal HPC co-architecture for:

• embedded vHPC (nomadic, automotive, ...)
• desktop vHPC (scientific computing ...)

Application co-development environment for:

• Hardware non-experts, ....
• Acceptability by software-type users, ...

Meet product lifetime >> embedded syst. life: FPGA emulation logistics from development down to maintenance and repair stations

examples: automotive, aerospace, industrial, ..

Page 135: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

SuperComputing 06

SuperComputing, Nov 11-17, 2006, Tampa, Florida, over 7000 registered attendees, and 274 exhibitors

Panel:

• Is High-Performance Reconfigurable Computing the Next Supercomputing Paradigm? (Tarek El-Ghazawi, The George Washington University)
• Is High-Performance Reconfigurable Computing the Next Supercomputing Paradigm? (Dave Bennett, Xilinx, Inc.)
• Reconfigurable Computing: The Future of HPC (Daniel S. Poznanovic, SRC Computers, Inc.)
• Is High-Performance Reconfigurable Computing the Next Supercomputing Paradigm? (Allan J. Cantle, Nallatech Ltd.)
• Challenges for Reconfigurable Computing in HPC (Keith D. Underwood, Sandia National Laboratories)
• Reconfigurable Computing - Are We There Yet? (Rob Pennington, National Center for Supercomputing Applications)
• Reconfigurable Computing: The Road Ahead (Duncan Buell, University of South Carolina)
• Opportunities and Challenges with Reconfigurable HPC (Alan D. George, University of Florida)

Page 136: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Outline

• The (non-v-N) anti-machine (Xputer)
• Speed-up by address generators
• Data-procedural Programming Language
• Generalization of the Systolic Array
• Partitioning Compilation Techniques
• Design Space Exploration
• Bridging the Paradigm Chasm

Page 138: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Acceleration Mechanisms by ASM-based MoMSW

• parallelism by multi bank memory architecture
• reconfigurable address computation (before run time)
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improve parallelism by storage scheme transformations
• minimize data movement across chip boundaries
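The idea of address computation that is configured before run time can be illustrated with a small software model. This is only a sketch under assumptions, not the MoM or Xputer address generator: the function names, the 2-D scan pattern and the low-order bank interleaving are all hypothetical choices made for illustration.

```python
def make_address_generator(base, x_count, y_count, x_stride, y_stride):
    """Return a scan function that yields addresses of a 2-D scan pattern.

    All parameters are fixed when the generator is created, which mimics
    address computation that is configured before run time (hypothetical model).
    """
    def scan():
        for y in range(y_count):
            for x in range(x_count):
                yield base + y * y_stride + x * x_stride
    return scan


def bank_of(address, num_banks):
    """Map an address to a memory bank (assumed low-order interleaving)."""
    return address % num_banks


# Scan a 4-by-3 window of a row-major array that is 16 words wide.
scan = make_address_generator(base=100, x_count=4, y_count=3,
                              x_stride=1, y_stride=16)
addresses = list(scan())
banks = [bank_of(a, num_banks=4) for a in addresses]
```

In this model the four addresses of each window row fall into four different banks, so a multi bank memory could serve them in parallel, and no instruction is spent at run time on computing the addresses themselves.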

Page 142: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

C or FORTRAN ?

Gordon Bell:

Computer scientists haven’t been interested in programming clusters. If putting the cluster on a chip is what excites them, fine.

It will still have to run Fortran!

or C (X-C)

Reiner Hartenstein (conclusion of this talk):

Classical programming languages, but with a slightly different semantics (data-procedural) are good candidates for parallel programming.

it’s a shorter leap

Support tools have been demonstrated by academia*

*) like CoDe-X

Page 143: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Newton’s 1st Law

Newton’s 1st Law à la Gordon Bell:

Scientists do not change their direction

Page 144: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Edu defic

Page 145: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Dual paradigm: an old hat

Software mind set: instruction-stream-based: flow chart -> control instructions (FSM: state transition)

Mapped into a Hardware mind set: action box = Flipflop, decision box = (de)multiplexer

-> Register Transfer Modules (DEC: mid 1970s); similar concept: Case Western Reserve Univ.

[Figure labels: FF, token bit, evoke]
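The mapping "action box = flip-flop, decision box = (de)multiplexer" can be sketched as a small behavioural simulation. This is an illustrative model only, not the DEC Register Transfer Modules: the flowchart (summing n down to 1), the box names and the clocking order are assumptions chosen for the example. One token bit circulates; the flip-flop that holds it evokes its action, and the decision box routes the token onward.

```python
def demux(token, select):
    """Decision box as a demultiplexer: returns (output_0, output_1)."""
    return (token and not select, token and select)


def run_flowchart(n, max_clocks=1000):
    """Compute n + (n-1) + ... + 1 with one flip-flop per action box."""
    # One token bit per action box; exactly one is set at a time (one-hot).
    ff = {"init": True, "add": False, "done": False}
    regs = {"i": 0, "acc": 0}
    for _ in range(max_clocks):
        if ff["done"]:
            return regs["acc"]
        # Evoke the action whose flip-flop currently holds the token.
        if ff["init"]:
            regs["i"], regs["acc"] = n, 0
        elif ff["add"]:
            regs["acc"] += regs["i"]
            regs["i"] -= 1
        # Both action boxes feed the same decision box "i > 0 ?".
        token = ff["init"] or ff["add"]
        to_done, to_add = demux(token, regs["i"] > 0)
        ff = {"init": False, "add": to_add, "done": to_done}
    raise RuntimeError("token did not reach 'done'")


total = run_flowchart(4)
```

The same flowchart read with a software mind set would be a loop with a conditional branch; here the control flow exists only as the position of the token bit.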

Page 146: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Dual paradigm: an old hat (2)

Hardware Description Language scene ~1970:

“It is so simple! Why did it take 25 years to find out?”

• Because of the reductionists’ tunnel view
• Because of a lack of transdisciplinary thinking

[Figure labels: FF, token bit, evoke]

Page 147: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Dual paradigm: an old hat (3)

Software: time domain

“procedure call” or function call

call Module-name (parameters);

Hardware description: space domain

Hardware Description Languages;

Page 148: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

ASM

Page 150: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Co-comp

Page 152: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Apropos HiPEAC: Software / Configware Co-Compilation

von Neumann instruction-stream-based machine: CPU (program counter, DPU) and RAM memory, with the von Neumann bottleneck between them

• mainframe age: CPU (instruction-stream-based, software)
• microprocessor age: CPU plus accelerator co-processors (data-stream-based, hardware)
• configware age: CPU plus reconfigurable accelerator (software and configware, via software/configware co-compiler)

CoDe-X, 1996

[Figure labels: C language source; Partitioner; SW compiler; CW compiler; CPU; array of rDPUs]

automatic parallelization by loop transformations
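What a partitioner in such a co-compilation flow has to decide can be sketched with a toy example. This is not the CoDe-X algorithm, which the slides do not describe; the greedy heuristic, the `Kernel` fields and all numbers below are hypothetical and only illustrate the software/configware split under a limited number of rDPUs.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    sw_time: float  # assumed profiled run time on the CPU
    cw_time: float  # assumed estimated run time on the rDPU array
    rdpus: int      # assumed number of rDPUs needed as configware


def partition(kernels, rdpu_budget):
    """Greedy split: best time saving per rDPU goes to configware first."""
    candidates = [k for k in kernels if k.cw_time < k.sw_time and k.rdpus > 0]
    candidates.sort(key=lambda k: (k.sw_time - k.cw_time) / k.rdpus,
                    reverse=True)
    configware, used = [], 0
    for k in candidates:
        if used + k.rdpus <= rdpu_budget:
            configware.append(k.name)
            used += k.rdpus
    software = [k.name for k in kernels if k.name not in configware]
    return software, configware


kernels = [
    Kernel("fir_filter", sw_time=80.0, cw_time=5.0, rdpus=10),
    Kernel("parse_input", sw_time=10.0, cw_time=12.0, rdpus=4),
    Kernel("matrix_mul", sw_time=120.0, cw_time=20.0, rdpus=20),
    Kernel("histogram", sw_time=30.0, cw_time=10.0, rdpus=8),
]
software, configware = partition(kernels, rdpu_budget=32)
```

With a budget of 32 rDPUs the two kernels with the largest saving per rDPU are mapped to configware; the rest stays as software for the CPU. A real partitioner would also have to account for communication cost between the two sides.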

Page 153: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Jürgen Becker’s CoDe-X -1 Co-Compiler

X-C is C language extended by MoPL

[Figure labels: X-C; Analyzer / Profiler; Partitioner; GNU C compiler (Computer machine paradigm); X-C compiler (Anti machine paradigm); CPU & Xputer; running legacy software; rALU: => array size: 1-by-1]

Page 154: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Jürgen Becker’s CoDe-X -2 Co-Compiler

X-C is C language extended by MoPL

[Figure labels: X-C; Analyzer / Profiler; Partitioner; GNU C compiler (Computer machine paradigm); X-C compiler (Anti machine paradigm); DPSS; Resource Parameters; supporting KressArray family; CPU; array of rDPUs]

Pipelining: A Shorter Leap

Page 155: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Jürgen Becker’s CoDe-X -2 Co-Compiler

X-C is C language extended by MoPL

[Figure labels: X-C; Analyzer / Profiler; Partitioner; GNU C compiler (Computer machine paradigm); X-C compiler (Anti machine paradigm); DPSS; array of cores, some in rDPU mode and some in CPU mode]

heterogeneous multi-core by dual mode cores: CPU mode vs. rDPU mode

Page 156: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Why better

Page 158: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

hardware vs. software perspective

platform | hardware perspective | data-stream-driven | software perspective | instruction-stream-driven | flexibility | performance pot.

single paradigm

1 simple FPGA** X X +++ ++
2 µprocessor & multi core X X +++ -
3 coarse-grained X X ++ +++
4 Platform FPGA 1 & (2)* & 3 X X X (X)* +++ ++++

dual paradigm

5 1 & 2 X X X X ++ ++
6 2 & 4 X X X X +++ ++++
7 2 & 3 X X X + +++
8 reconfigurable instr. set X X X +++ +

*) with soft cores and/or on-chip microprocessor
**) without soft cores

for software people

Page 159: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Data meeting the Processing Unit (PU)

We have 2 choices:

• by Software: routing the data by memory-cycle-hungry instruction streams thru shared memory
• by Configware: placement of the execution locality ... pipe network generated by configware compilation

... partly explaining the RC paradox
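The difference between the two choices can be made concrete with a small counting experiment. This is a software model only, under stated assumptions: `CountingMemory`, the three-stage computation y = (3x + 1)^2 and the use of Python generators as a stand-in for a pipe network are all hypothetical, and instruction fetches are not counted at all.

```python
class CountingMemory:
    """Shared memory model that counts every read and write."""

    def __init__(self, data):
        self.cells = dict(data)
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


def by_software(mem, n):
    """Instruction-stream style: every intermediate goes through memory."""
    for i in range(n):
        mem.write(("t1", i), mem.read(("x", i)) * 3)
        mem.write(("t2", i), mem.read(("t1", i)) + 1)
        mem.write(("y", i), mem.read(("t2", i)) ** 2)


def by_pipe_network(mem, n):
    """Data-stream style: stages are chained; only input and output touch memory."""
    source = (mem.read(("x", i)) for i in range(n))
    scaled = (v * 3 for v in source)
    shifted = (v + 1 for v in scaled)
    squared = (v ** 2 for v in shifted)
    for i, v in enumerate(squared):
        mem.write(("y", i), v)


n = 4
inputs = {("x", i): i for i in range(n)}
sw_mem = CountingMemory(inputs)
by_software(sw_mem, n)
cw_mem = CountingMemory(inputs)
by_pipe_network(cw_mem, n)
```

Both variants produce the same results, but in this model the software variant needs six memory accesses per data item and the pipe variant two, because intermediate values travel directly from stage to stage.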

Page 160: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Data meeting the Processing Unit

by Configware: placement of the execution locality ... pipe network generated by configware compilation

Page 161: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

conclus

Page 162: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

thank you for your patience

Page 163: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

END

Page 165: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Avoiding the paradigm shift?

“It is feared that domain scientists will have to learn how to design hardware. Can we avoid the need for hardware design skills and understanding?”
Tarek El-Ghazawi, panelist at SuperComputing 2006

“A leap too far for the existing HPC community”
panelist Allan J. Cantle

SuperComputing, Nov 11-17, 2006, Tampa, Florida, over 7000 registered attendees, and 274 exhibitors

We need a bridge strategy by developing advanced tools for training the software community to think in fine grained parallelism and pipelining techniques.

A shorter leap by coarse-grained platforms which allow a software-like pipelining perspective

Page 166: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Avoiding the paradigm shift?

• … the promise of almost unimagined computing power
• have the hardware developers raced too far ahead of many programmers' ability to create software ?
• parallel computing has been an esoteric skill limited to people involved with high-performance supercomputing. That is changing now that desktop computers and even laptops are going multicore.
• "High-performance computing experts have learned to deal with this, but they are a fraction of the programmers," Saied says.
• "In the future you won't be able to get a computer that's not multicore ... multicore chips become ubiquitous, all programmers will have to learn new tricks."
• Even in high-performance computing there are areas that aren't yet ready for the new multicore machines.
• "In industry, much of their high-performance code is not parallel," Saied says. "These corporations have a lot of time and money invested in their software, and they are rightly worried about having to re-engineer that code base."

“A leap too far for the existing HPC community”

We need a bridge strategy by developing advanced tools for training the software community to think in fine grained parallelism and pipelining techniques.

A shorter leap by coarse-grained platforms which allow a software-like pipelining perspective

Page 167: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Avoiding the paradigm shift?

• "Moore's Gap."
• Steve Kirsch, an engineering fellow for Raytheon Systems Co., says that multicore computing presents both the dream of infinite computing power and the nightmare of programming.
• "The real lesson here is that the hardware and software industries have to pay attention to each other," Kirsch says. "Their futures are tied together in a way that they haven't been in recent memory, and that will change the way both businesses will operate."
• February, Intel released research details about a chip with 80 cores, a fingernail sized chip that has the same processing power that in 1996 required a supercomputer with a 2,000-square-foot footprint and using 1,000 times the electrical power.
• a problem for those who depend on previously written software that has been steadily improving and evolving over decades. "Our legacy software is a real concern to us."
• parallel programming for multicore computers may require new computer languages. "Today we program in sequential languages ..."
• "Do we need to express our algorithms at a higher level of abstraction? Research into these areas is critical to our success."

Page 168: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Avoiding the paradigm shift?

• "Our programming languages researchers are exploring new programming paradigms and models," Hambrusch says. "Our course on multicore architectures is also preparing students for future software development positions. Purdue is clearly playing a defining role in this critical technology."
• "In five or six years, laptop computers will have the same capabilities, and face the same obstacles, as today's supercomputers," Saied says. "This challenge will face people who program for desktop computers, too. People who think they have nothing to do with supercomputers and parallel processing will find out that they need these skills, too."

Remote Direct Memory Access (RDMA) is a technology that allows computers in a network to exchange data in main memory without involving the processor, cache, or operating system of either computer. Like locally-based Direct Memory Access (DMA), RDMA improves throughput and performance because it frees up resources. RDMA also facilitates a faster data transfer rate. RDMA implements a transport protocol in the network interface card (NIC) hardw

Page 169: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

Avoiding the paradigm shift?

Three Ways to Make Multicore Work

Number 1: Mathematics: Do more computational work with less data motion

• E.g., Higher-order methods
  – Trades memory motion for more operations per word, producing an accurate answer in less elapsed time than lower order methods
• Different problem decompositions (no stratified solvers)
  – The mathematical equivalent of loop fusion
  – E.g., nonlinear Schwarz methods
• Ensemble calculations
  – Compute ensemble values directly
• It is time (really past time) to rethink algorithms for memory locality and latency tolerance

I didn’t say threads

• See, e.g., Edward A. Lee, "The Problem with Threads," Computer, vol. 39, no. 5, pp. 33-42, May, 2006.
• Robert O'Callahan: “Night of the Living Threads”, http://weblogs.mozillazine.org/roc/archives/2005/12/night_of_the_living_threads.html , 2005
• John Ousterhout: “Why Threads Are A Bad Idea (for most purposes)” (~2004)
• Allen Holub: “If I were king: A proposal for fixing the Java programming language's threading problems” http://www128.ibm.com/developerworks/library/j-king.html, 2000. Allen Holub has been working in the computer industry since 1979. He is widely published in magazines (Dr. Dobb's Journal, Programmers Journal, Byte, MSJ, among others), and he writes the "Java Toolbox" column for the online magazine JavaWorld.

Breaking the Assumptions

• Don’t have any off-chip memory
  – Consequence: Need algorithms, programming models, and software tools to work in more limited memory (a few GB)
• Have off-chip memory, but manage it more effectively
  – Consequence: Need to find a true, general-purpose hardware/software model
• Overlap latency with split operations
  – Consequence: Need to find massive amounts of concurrency; need to manage the programming challenges of split operations (these are hard for programmers to use correctly - may be an opportunity for formal methods)

Multicore doesn’t just stress bandwidth, it increases the need for perfectly parallel algorithms

• All systems will look like attached processors - high latency, low (relative) bandwidth to main memory

128 cores?

“When [a] request for data from Core 1 results in a L1 cache miss, the request is sent to the L2 cache. If this request hits a modified line in the L1 data cache of Core 2, certain internal conditions may cause incorrect data to be returned to the Core 1.”

Everything does not double: traveling from New York to Chicago: before 1830: 3 weeks - 1857: 1+1/2 days; now: 6 hours - only a factor of 6

MPI on Multi-Core: 340 ns MPI ping/pong latency

improvement will require better SWE tools

Benchmarks

• Ping-pong latency
  – Ring-based ping-pong exchange between all nodes
• Nearest-neighbor ghost-area exchange
  – Test code from Argonne used to evaluate one-sided and point-to-point operations
• CPU availability
  – Calculates percentage of CPU available at receiver by doing a fixed amount of work during message arrival
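The loop fusion mentioned under Number 1 can be shown in a few lines. This is a minimal sketch, assuming a made-up computation (sum of squares of doubled values); the function names are hypothetical, and in Python the example only shows the structure of the transformation, not a measured speed-up.

```python
def unfused(x):
    """Two passes: the intermediate list is written once and read once."""
    scaled = [2.0 * v for v in x]      # pass 1: read x, write scaled
    return sum(s * s for s in scaled)  # pass 2: read scaled


def fused(x):
    """One pass: each element is read once, no intermediate array."""
    total = 0.0
    for v in x:
        s = 2.0 * v
        total += s * s
    return total


data = [1.0, 2.0, 3.0]
result_unfused = unfused(data)
result_fused = fused(data)
```

Both versions do the same arithmetic per element; the fused one simply moves less data, which is the point of "more computational work with less data motion".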

Page 170: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

in Memoriam Stamatis

Vassiliadis

1951 - 2007

in Memoriam Richard Newton

1951 - 2007

in Memoriam …

Page 171: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

KressArray Xplorer (Platform Design Space Explorer)

KressArray DPSS, published at ASP-DAC 1995

[Figure: tool flow. Labels: User; User Interface; Application Set; ALE-X Code; ALE-X Compiler; expr. tree; interm. form 2; interm. form 3; Mapper; Scheduler; data stream Schedule; Architecture Editor; Mapping Editor; Architecture Estimator; Analyzer; Delay Estim.; statist. Data; Power Estimator; Power Data; Design Rules; Datapath Generator; Kress rDPU Layout; HDL Generator; Simulator; VHDL / Verilog; Improvement Proposal Generator; Suggestion Selection; Inference Engine (FOX); KressArray family parameters]

Page 172: Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein.

KressArray Family generic Fabrics: a few examples

• Select Function Repertory
• select Nearest Neighbour (NN) Interconnect: an example
• Select mode, number, width of NNports
• more NNports: rich Routing Resources
• route-through and function
• route-through only

Examples of 2nd Level Interconnect: laid out over rDPU cell - no separate routing areas !

[Figure: rDPU with NN ports; numeric labels: 16, 32, 8, 24, 4, 2]

http://kressarray.de