Saman Amarasinghe. Lets stick with current sequential languages Parallel Programming is hard!...

Saman Amarasinghe
  • Date posted: 21-Dec-2015

Transcript

Page 1: Saman Amarasinghe. Lets stick with current sequential languages Parallel Programming is hard! Billons of LOC written in sequential languages Let the compiler.

Saman Amarasinghe

Page 2

Let's stick with current sequential languages
  Parallel programming is hard!
  Billions of LOC written in sequential languages

Let the compiler do all the work
  Maintain the current strong machine abstraction

SUIF Parallelizing Compiler
  Monica Lam and the Stanford SUIF team, 1993 – 1997

Automatically extract parallelism from sequential programs

Heroic analysis
  Interprocedural analysis
  Array and scalar data-flow analysis
  Reduction and recurrence recognition
  C to FORTRAN

Achieved best SPEC results of the day
  Vector processor (Cray C90): 540
  Uniprocessor (Digital 21164): 508
  SUIF on 8 processors (Digital 8400): 1,016

But… Techniques were not robust for general use

[Figure: bar chart of SUIF results per benchmark (spice2g6, doduc, fpppp, ora, mdljdp2, wave5, mdljsp2, alvinn, nasa7, ear, hydro2d, su2cor, tomcatv, swm256). Legend: Number of Processors, 1 to 8. Axis: MFLOPS, 0 to 1200.]

Page 3

Composition is key to building large systems

Implemented naturally via time-multiplexing

The framework for parallelizing sequential programs

Sequential parts at outermost
Global barriers

Page 4

Speedup = 1/(1 – p + p/N)
Utilization = 1/(p + N*(1 – p))

[Figure: utilization plot. Axis labels: Utilization, Number of cores, Expected Year.]
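
The two formulas on this slide are Amdahl's law and the per-core utilization that follows from it (speedup divided by N). The short Python sketch below evaluates both. It assumes that p is the parallel fraction of the program (the "% parallel" axis on the following slides) and N is the number of cores; the function names and the sample values are illustrative and do not come from the slides.

```python
def speedup(p: float, n: int) -> float:
    """Speedup = 1/(1 - p + p/N) for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)


def utilization(p: float, n: int) -> float:
    """Utilization = 1/(p + N*(1 - p)), which equals speedup(p, n) / n."""
    return 1.0 / (p + n * (1.0 - p))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the slides.
    for p in (0.50, 0.90, 0.99):
        for n in (2, 16, 128):
            print(f"p={p:.2f} N={n:4d} "
                  f"speedup={speedup(p, n):7.2f} "
                  f"utilization={utilization(p, n):6.1%}")
```

Because the sequential fraction (1 – p) never shrinks, speedup is bounded by 1/(1 – p) however many cores are added, so utilization falls as N grows.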

Page 5

Speedup = 1/(1 – p + p/N)
Utilization = 1/(p + N*(1 – p))

[Figure: utilization plot. Axis labels: Utilization, Number of cores, Expected Year, % parallel.]

Page 8

Currently…
  Theory, algorithms, languages, tools all centered around the sequential paradigm
  A well enforced machine abstraction

Move to multicore is a fundamental shift
  Akin to the shift from analog to digital design

Page 9

Need a new abstraction where parallelism is the primary form of expression

Parallelism is simple
Parallelism is natural
Communication is intuitive

Parallel composition of sequential segments
  With possible space-multiplexed execution

Page 10

Parallel programming still in the dark ages
  Elite community of practitioners
  Active open research, little stable consensus
  Assumption: we don’t know how to teach parallel programming!

Aim for a “Mead and Conway” type revolution
  Develop simple, cookbook approaches
  If we can’t teach them, they’re too complex!
  Make them accessible
  Carefully thought-out courseware, tools, texts, courses
  Focus on the educational community
  Exporting, proselytizing, workshops, conferences, journals, …

Page 11

1. Move to a truly parallel world (long term)
  Natural world is extremely parallel; learn to emulate it
  Can we make sequential programs a special case of parallel programming?

2. Rejoice when parallelism is natural (medium term)
  Switch to parallel languages if using them is easier than sequential languages

3. Help migrate legacy applications (short term)
  Existing large body of code – cannot ignore!
  Written in sequential languages – need to work with them

Page 12

Some domains are inherently parallel
  Coding them using a sequential language is…
    Harder than using the right parallel abstraction
    All information on inherent parallelism is lost

There are win-win situations
  Increasing the programmer productivity while extracting parallel performance

Streaming domain and the StreamIt experience

Page 13

[Figure: MPEG-2 Decoder block diagram. Blocks: VLD, splitter, ZigZag, IQuantization, IDCT, Saturation, Motion Vector Decode, Repeat, joiner, splitter, Motion Compensation (three, for Y, Cb, Cr), Channel Upsample (two), joiner, Picture Reorder, Color Space Conversion. Data labels: MPEG bit stream; macroblocks, motion vectors; frequency encoded macroblocks; differentially coded motion vectors; spatially encoded macroblocks; motion vectors; reference picture; recovered picture. Message labels: quantization coefficients <QC>; picture type <PT1>, <PT2>.]

Structured block level diagram describes computation and flow of data

Conceptually easy to understand

Clean abstraction of functionality

Mapping to C (sequentialization) destroys this simple view

MPEG-2 Decoder

Page 14

add VLD(QC, PT1, PT2);

add splitjoin {
    split roundrobin(N*B, V);

    add pipeline {
        add ZigZag(B);
        add IQuantization(B) to QC;
        add IDCT(B);
        add Saturation(B);
    }
    add pipeline {
        add MotionVectorDecode();
        add Repeat(V, N);
    }

    join roundrobin(B, V);
}

add splitjoin {
    split roundrobin(4*(B+V), B+V, B+V);

    add MotionCompensation(4*(B+V)) to PT1;
    for (int i = 0; i < 2; i++) {
        add pipeline {
            add MotionCompensation(B+V) to PT1;
            add ChannelUpsample(B);
        }
    }

    join roundrobin(1, 1, 1);
}

add PictureReorder(3*W*H) to PT2;

add ColorSpaceConversion(3*W*H);
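
To make the splitjoin construct in the listing above more concrete, here is a minimal Python sketch of weighted round-robin splitting and joining over an in-memory list. It is an illustration under stated assumptions, not the StreamIt compiler or runtime: the helper names `split_roundrobin`, `join_roundrobin`, and `splitjoin` are hypothetical, each branch is modelled as a plain function over a list, and the teleport messages (`to QC`, `to PT1`, `to PT2`) are not modelled at all.

```python
from typing import Callable, List, Sequence


def split_roundrobin(items: Sequence, weights: Sequence[int]) -> List[list]:
    """Deal items to branches: weights[0] items to branch 0, weights[1] to branch 1, and so on, repeating."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    branches: List[list] = [[] for _ in weights]
    i = 0
    while i < len(items):
        for branch, w in zip(branches, weights):
            branch.extend(items[i:i + w])
            i += w
    return branches


def join_roundrobin(branches: Sequence[Sequence], weights: Sequence[int]) -> list:
    """Collect weights[b] items from each branch b in turn until all branches are drained."""
    if len(weights) != len(branches) or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per branch")
    out: list = []
    pos = [0] * len(branches)
    while any(pos[b] < len(branches[b]) for b in range(len(branches))):
        for b, w in enumerate(weights):
            out.extend(branches[b][pos[b]:pos[b] + w])
            pos[b] += w
    return out


def splitjoin(items: Sequence,
              split_weights: Sequence[int],
              branch_fns: Sequence[Callable[[list], list]],
              join_weights: Sequence[int]) -> list:
    """Split the stream, run one function per branch, then join the branch outputs."""
    branches = split_roundrobin(items, split_weights)
    outputs = [fn(branch) for fn, branch in zip(branch_fns, branches)]
    return join_roundrobin(outputs, join_weights)


if __name__ == "__main__":
    # Two toy branches standing in for the block pipeline and the motion-vector pipeline.
    scale = lambda xs: [10 * x for x in xs]
    negate = lambda xs: [-x for x in xs]
    print(splitjoin(list(range(6)), (2, 1), [scale, negate], (2, 1)))
```

The branches never share data inside a splitjoin, which is what makes the structure easy for a compiler to map onto separate cores.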

[Figure: MPEG-2 Decoder block diagram, identical to the one on page 13, shown next to the code.]

MPEG-2 Decoder

Page 15

Task Parallelism
  Thread (fork/join) parallelism
  Parallelism explicit in algorithm
  Between filters without producer/consumer relationship

Data Parallelism
  Data parallel loop (forall)
  Between iterations of a stateless filter
  Can’t parallelize filters with state

Pipeline Parallelism
  Usually exploited in hardware
  Between producers and consumers
  Stateful filters can be parallelized
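
The contrast between data and pipeline parallelism can be sketched in a few lines of Python. This is an illustration only, assuming a filter is modelled as a callable applied to one item at a time; the names `scale`, `RunningSum`, `data_parallel`, and `pipeline_parallel` are hypothetical. The stateless filter is applied to independent items by a thread pool, while the stateful filter keeps its own thread and sees items strictly in order, so it can still overlap with its producer.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    """Stateless filter: the output depends only on the current input."""
    return 2 * x


class RunningSum:
    """Stateful filter: the output depends on every input seen so far."""

    def __init__(self):
        self.total = 0

    def __call__(self, x):
        self.total += x
        return self.total


def data_parallel(filter_fn, items, workers=4):
    """Data parallelism: apply a stateless filter to independent items concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(filter_fn, items))  # map returns results in input order


_DONE = object()  # sentinel marking the end of the stream


def _stage(filter_fn, inbox, outbox):
    """Run one filter on its own thread, consuming and producing items in order."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(filter_fn(item))


def pipeline_parallel(filters, items):
    """Pipeline parallelism: one thread per filter, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(filters) + 1)]
    threads = [threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(filters)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(data_parallel(scale, [1, 2, 3, 4]))
    print(pipeline_parallel([scale, RunningSum()], [1, 2, 3, 4]))
```

Passing `RunningSum()` to `data_parallel` would be a mistake, because concurrent calls would race on `total`; this is the sense in which filters with state cannot be data-parallelized but can still be pipelined.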

MPEG-2 Decoder

[Figure: MPEG-2 Decoder block diagram, identical to the one on page 13.]

Page 16

[Figure: bar chart of StreamIt throughput normalized to single core StreamIt (axis 0 to 19). Benchmarks: Vocoder, FFT, FMRadio, TDE, BitonicSort, MPEG2Decoder, ChannelVocoder, DES, DCT, Filterbank, Serpent, Radar, Geometric Mean.]

On a 16 core MIT Raw Processor (http://cag.csail.mit.edu/raw)

Page 17

Don’t modify a code segment if…
  The performance impact is insignificant and is isolated from the rest
  Automatic parallelizer works perfectly

Modify and annotate a segment if…
  Automatic parallelizer needs a little help

Otherwise rewrite the segment

Program Reincarnation
  A new body with the same old soul

Still in Existing Sequential Languages

Use a Parallel Language

Page 18

Dynamic analysis
  Managed program execution
  Program invariant inference
  Application knowledge database

Assisted parallelization
  GUI tool

Correctness in reincarnated
  Test Generation
  Divergence Analysis

Static analysis
  Automatic parallelization info for program understanding

Learn about the domain
  Flag domain specific issues
  Generate domain-specific hints

Bring programs to modern age
  Block diagram
  Refactoring identification

[Figure: tool-flow diagram of the Assisted Application Reincarnation Tool. Components: Legacy Program Source File (.c); Original Compiler; Original Binary (.exe); Automatic Parallelization; Static Analysis; Instrumenter and Binary Interpreter; Managed Program Execution (.log); Program Invariant Inference Engine; Application Knowledge (program representation & invariants); Known Idiom Identification & Domain Hint Generation; Domain Knowledge Database; Domain Knowledge Extraction; Compiler & Instrumenter; Reincarnated .c and .exe; Test Generation; Divergence Analysis; Refactoring Identification; Block Diagram Representation.]

Page 19

Multicore menace will impact all of us in a big way

Parallelism needs to keep up with Moore’s curve

Will definitely need new parallel languages where parallelism is the primary form of composition

Low hanging fruit when parallelism is the natural form of expression

However, cannot ignore the past investments

Page 20

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.