A Parameterized Dataflow Language Extension for Embedded Streaming Systems


Page 1: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Yuan Lin (1), Yoonseo Choi (1), Scott Mahlke (1), Trevor Mudge (1), Chaitali Chakrabarti (2)

(1) Advanced Computer Architecture Lab, University of Michigan at Ann Arbor

(2) Department of Electrical Engineering, Arizona State University

Page 2: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Embedded Streaming Systems

Mobile computing: multimedia anywhere, at any time

Many of its key workloads are embedded streaming systems: video/audio coding (e.g. H.264), wireless communications (e.g. W-CDMA), 3D graphics and others…

Cell phones are getting more complex; PCs are getting more mobile

Page 3: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Characteristics of Streaming Systems

[Figure: W-CDMA physical layer processing, between the analog frontend and the upper layer. Transmitter blocks: channel encoder, interleaver, spreader, scrambler, LPF-Tx. Receiver blocks: LPF-Rx, searcher, descrambler/despreader pairs, combiner, interleaver, channel decoder (Viterbi/Turbo).]

Data are processed in a pipeline of DSP algorithm kernels

Mostly vector/matrix-based data computation

Periodic system reconfigurations, e.g. changing from voice communication to data communication

Page 4: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Embedded DSP Processors

[Figure: multi-core DSP with an ARM processor, four data engines (DEs), each with a SIMD unit and local memory, and a global memory.]

Current trend: multi-core DSPs for streaming applications (IBM Cell processor, TI OMAP, many other SoCs)

Common hardware characteristics: multiple (potentially heterogeneous) data engines (DEs), software-managed scratchpad memories, explicit DMA transfer operations

Our DSP case study: SODA, a multi-core DSP processor

Page 5: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Programming Challenge

How to automatically compile streaming systems onto multi-core DSP hardware?

[Figure: application source code on one side and the multi-core DSP (ARM, four DEs with SIMD units and local memories, global memory) on the other, with a question mark between them.]

How to divide the system into multiple threads?

How to SIMDize DSP kernels?

When and where to issue DMA transfers?

VLIW execution scheduling?

How to manage the local and global memory?

Who does the execution scheduling?

And many other problems…

Page 6: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Compile for Multi-core DSPs

Two-tier compilation approach

[Figure: the W-CDMA transmitter/receiver pipeline is mapped onto the SODA system architecture (ARM, four PEs with execution units and local memories, global memory), while individual kernel functions such as void Turbo() { ... } are mapped onto a PE pipeline (32-lane SIMD ALU, SIMD RF, 32-lane SSN, SIMD-to-scalar unit, scalar RF, 16-bit ALU, SIMD and scalar data memories).]

This presentation focuses on system-level language and compilation

Compiling functions, not instructions

Page 7: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

System Compilation Overview

[Figure: compilation flow. SPEX source, frontend, SPIR, backend, generated code for DE0 and the ARM.]

Coarse-grained compilation: function-level, not instruction-level; a C/C++-to-C compiler

SPEX: Signal Processing EXtension. Our high-level language extension

Frontend compilation: translate from SPEX into SPIR

SPIR: Signal Processing IR. The system compiler’s IR; models function-level interactions

Backend compilation: function-level compilation; generates multi-threaded C code

Page 9: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

SPIR: Function-level IR

[Figure: a dataflow graph of nodes connected by FIFO buffers, shown next to the compilation flow.]

Must capture stream applications’ system-level behaviors

Based on the dataflow computation model: good for modeling streaming computations; easy to generate parallel code

But which dataflow model?

Page 10: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Synchronous Dataflow

Synchronous dataflow (SDF): the simplest dataflow model. Static dataflow; no conditional dataflow allowed

Pros

Efficiency: can generate the execution schedule at compile time

Optimality: we know how to compile SDFs for multi-processor DSPs (Berkeley Ptolemy project, MIT StreamIt compiler)

Cons

Lack of flexibility: cannot describe run-time reconfigurations in stream computations

[Figure: an SDF node with input_rate = 2 and output_rate = 3]
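Compile-time scheduling of an SDF graph rests on balance equations: over one schedule period, the tokens produced on each edge must equal the tokens consumed. The sketch below is only an illustration of that idea and is not taken from the SPEX compiler. `Edge` and `repetition_vector` are hypothetical names, and the code assumes a simple linear chain of nodes rather than a general graph.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::gcd (C++17)
#include <vector>

// One FIFO edge in a chain: the upstream node pushes `produce` tokens per
// firing and the downstream node pops `consume` tokens per firing.
// (Hypothetical type, for illustration only.)
struct Edge {
    std::int64_t produce;
    std::int64_t consume;
};

// Returns the smallest positive firing counts r[0..n] for a chain of n edges
// such that r[i] * produce == r[i + 1] * consume holds on every edge.
std::vector<std::int64_t> repetition_vector(const std::vector<Edge>& edges) {
    std::vector<std::int64_t> r{1};
    for (const Edge& e : edges) {
        assert(e.produce > 0 && e.consume > 0);
        // Scale all counts so far until the tokens on this edge divide evenly.
        const std::int64_t tokens = r.back() * e.produce;
        const std::int64_t scale = e.consume / std::gcd(tokens, e.consume);
        for (std::int64_t& x : r) x *= scale;
        const std::int64_t downstream = r.back() * e.produce / e.consume;
        r.push_back(downstream);
    }
    // Divide out any common factor so the result is minimal.
    std::int64_t g = 0;
    for (std::int64_t x : r) g = std::gcd(g, x);
    for (std::int64_t& x : r) x /= g;
    return r;
}
```

For a chain source, node, sink in which the node has the rates shown above (input_rate = 2, output_rate = 3) and the source and sink each move one token per firing, `repetition_vector({{1, 2}, {3, 1}})` gives firing counts 2, 1 and 3.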

Page 11: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Parameterized Dataflow

Parameterized dataflow (PDF): use parameters to model run-time system reconfiguration. Each parameter is a variable with a finite set of discrete values

Parameterized attributes in SPIR: dataflow rates

[Figure: a PDF node with input_rate = {1, 4, 8} and output_rate = {2, 8}]

First proposed by: B. Bhattacharya and S. S. Bhattacharyya, “Parameterized Dataflow Modeling for DSP Systems,” IEEE Transactions on Signal Processing, Oct. 2001
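As a rough illustration of a parameter with a finite value set, the sketch below models the node with input_rate = {1, 4, 8} and output_rate = {2, 8}. `Param`, `PdfNode`, `SdfNode` and `bind_rates` are hypothetical names chosen for this example; the slides do not describe SPIR's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// A parameter is a variable restricted to a finite set of discrete values.
struct Param {
    std::vector<int> allowed;
    bool admits(int v) const {
        return std::find(allowed.begin(), allowed.end(), v) != allowed.end();
    }
};

// A node whose rates are parameters (hypothetical representation).
struct PdfNode {
    Param input_rate;
    Param output_rate;
};

// The same node once every parameter has been fixed to a constant.
struct SdfNode {
    int input_rate;
    int output_rate;
};

// Binds both rates to constants; fails if a value is outside its allowed set.
std::optional<SdfNode> bind_rates(const PdfNode& node, int in, int out) {
    if (!node.input_rate.admits(in) || !node.output_rate.admits(out)) {
        return std::nullopt;
    }
    return SdfNode{in, out};
}
```

With `PdfNode node{Param{{1, 4, 8}}, Param{{2, 8}}}`, the call `bind_rates(node, 4, 8)` succeeds, while `bind_rates(node, 3, 2)` returns an empty optional because 3 is not an allowed input rate.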

Page 12: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Parameterized Dataflow

Parameterized dataflow (PDF): use parameters to model run-time system reconfiguration. Each parameter is a variable with a finite set of discrete values

Parameterized attributes in SPIR: dataflow rates, conditional dataflow

[Figure: IF nodes with if_cond = {true, false} select between an if-node and an else-node; the edges carry parameterized rates {1,4,8}, {2,8}, {6,8} and {2,4}]

Page 13: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Parameterized Dataflow

Parameterized dataflow (PDF): use parameters to model run-time system reconfiguration. Each parameter is a variable with a finite set of discrete values

Parameterized attributes in SPIR: dataflow rates, conditional dataflow, number of dataflow actors

[Figure: a split node feeds A[0], A[1], …, A[n], which feed a merge node]

Number of A nodes = {1, 4, 12}
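One way to picture a parameterized actor count is a split that deals stream items to n copies of A and a merge that collects the results. The sketch below assumes round-robin splitting and merging, which the slide does not specify, and `split_apply_merge` is a hypothetical helper rather than part of SPEX or SPIR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Deals `in` round-robin to `n` lanes, applies `a` in every lane, then merges
// the lanes back in the same round-robin order.
std::vector<int> split_apply_merge(const std::vector<int>& in, std::size_t n,
                                   const std::function<int(int)>& a) {
    assert(n > 0);
    // Split: item i goes to lane i % n.
    std::vector<std::vector<int>> lanes(n);
    for (std::size_t i = 0; i < in.size(); ++i) {
        lanes[i % n].push_back(in[i]);
    }
    // Each copy A[k] processes only its own lane.
    for (std::vector<int>& lane : lanes) {
        for (int& x : lane) x = a(x);
    }
    // Merge: item i is the (i / n)-th element of lane i % n.
    std::vector<int> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(lanes[i % n][i / n]);
    }
    return out;
}
```

Changing `n` between 1, 4 and 12 changes how many copies of A could run in parallel, while the merged output stays in the original stream order.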

Page 14: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Parameterized Dataflow

Parameterized dataflow (PDF): use parameters to model run-time system reconfiguration. Each parameter is a variable with a finite set of discrete values

Parameterized attributes in SPIR: dataflow rates, conditional dataflow, number of dataflow actors, streaming size between reconfigurations (e.g. stream_size = {10k, 20k})

There are also other modifications to the dataflow model. Please refer to the paper for further details

Page 15: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

PDF Run-time Execution Model

Three-stage run-time execution model

Goal: provide the efficiency of the synchronous dataflow execution on parameterized dataflow

Page 16: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

PDF Run-time Execution Model

Stage 1: dataflow initialization

Convert the PDF graph into an SDF graph by setting parameter variables to constant values

Perform other initialization computation

Page 17: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

PDF Run-time Execution Model

Stage 2: dataflow computation

Dataflow computation following static SDF execution schedules

[Figure: stream input, dataflow graph, stream output]

Page 18: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

PDF Run-time Execution Model

Stage 3: dataflow finalization

Update the dataflow states with calculated results
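The three stages can be sketched for a single node as follows. This is a minimal illustration under stated assumptions, not the SPEX run-time: `GraphState` and `run_period` are hypothetical names, the graph is reduced to one node that sums its input, and the rule for picking the next rate is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical state that survives from one reconfiguration period to the next.
struct GraphState {
    int next_rate = 1;         // parameter value chosen for the upcoming period
    long tokens_consumed = 0;  // running total across periods
};

// Runs one period: `firings` firings of a single node over `input`.
// Returns the sum computed during the period.
long run_period(GraphState& state, const std::vector<int>& input, int firings) {
    // Stage 1, initialization: freeze the parameter so the graph is plain SDF.
    const int rate = state.next_rate;
    assert(rate > 0 && firings >= 0);
    assert(input.size() >=
           static_cast<std::size_t>(rate) * static_cast<std::size_t>(firings));

    // Stage 2, computation: a static schedule; the rate cannot change here.
    long sum = 0;
    std::size_t pos = 0;
    for (int f = 0; f < firings; ++f) {
        for (int k = 0; k < rate; ++k) sum += input[pos++];
    }

    // Stage 3, finalization: write results back and pick the next parameter.
    state.tokens_consumed += static_cast<long>(pos);
    state.next_rate = (sum > 0) ? 4 : 1;  // illustrative rule only
    return sum;
}
```

Because the parameter is read only in stage 1 and written only in stage 3, stage 2 can run a schedule computed for a fixed SDF graph.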

Page 19: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

System Compilation Frontend

Start from a stream system described in C or C++ with SPEX

Translate the description into a dataflow representation

Page 20: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

SPEX

Q: Why can’t we compile pure C/C++?

A: Some of C/C++’s language features cannot be translated into dataflow, e.g. passing pointers as function arguments

C/C++: a pointer’s memory locations can be both read and written

Dataflow: edges are read-only or write-only

Page 21: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

SPEX

    #include <spex_stream.h>                 // SPEX definition headers

    class WCDMA: spex_kernel {
      pdf_node(interleaver)(...) { ... }     // functions for declaring dataflow nodes
      pdf_node(turbo_dec)(...) { ... }

      pdf_graph(wcdma_rec)()                 // functions for declaring a dataflow graph
      {
        ...
        interleaver(intlv_to_turbo, intlv_in);
        turbo_dec(turbo_out, intlv_to_turbo);
        ...
      }
    };

SPEX is a set of keywords and language restrictions

A guideline for programmers to write stylized C/C++ code that can be translated into dataflow: dataflow-safe C/C++ programming

SPEX code can be compiled directly with g++

Page 22: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

SPEX pdf_node Code Snippets

    pdf_node(fir)(channel<int> in,       // read-only input dataflow edge
                  channel<int> & out)    // write-only output dataflow edge
    {
      ...
      z[0] = in.pop();                   // FIR’s dataflow input
      for (i = 0; i < TAPS; i++) {
        sum += z[i] * coeff[i];
      }
      out.push(sum);                     // FIR’s dataflow output
      ...
    }
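To make the read-only and write-only edge idea concrete in plain C++, the sketch below wraps a FIFO in two port types, one exposing only `pop` and one exposing only `push`. None of these types come from `spex_stream.h`: `Fifo`, `InPort`, `OutPort` and `Fir` are hypothetical stand-ins, and the delay-line shift is an assumption about what the elided parts of the slide's FIR body do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// A plain FIFO buffer between two nodes (hypothetical stand-in for channel<T>).
template <typename T>
class Fifo {
public:
    void push(const T& v) { q_.push_back(v); }
    T pop() {
        assert(!q_.empty());
        T v = q_.front();
        q_.pop_front();
        return v;
    }
private:
    std::deque<T> q_;
};

// Read-only view of an edge: a consumer can pop but never push.
template <typename T>
class InPort {
public:
    explicit InPort(Fifo<T>& f) : f_(f) {}
    T pop() { return f_.pop(); }
private:
    Fifo<T>& f_;
};

// Write-only view of an edge: a producer can push but never pop.
template <typename T>
class OutPort {
public:
    explicit OutPort(Fifo<T>& f) : f_(f) {}
    void push(const T& v) { f_.push(v); }
private:
    Fifo<T>& f_;
};

constexpr std::size_t TAPS = 3;

struct Fir {
    std::array<int, TAPS> coeff{};
    std::array<int, TAPS> z{};  // delay line, z[0] is the newest sample

    // One firing: consume one sample, produce one filtered sample.
    void run(InPort<int> in, OutPort<int> out) {
        for (std::size_t i = TAPS - 1; i > 0; --i) z[i] = z[i - 1];
        z[0] = in.pop();
        int sum = 0;
        for (std::size_t i = 0; i < TAPS; ++i) sum += z[i] * coeff[i];
        out.push(sum);
    }
};
```

Because each port type exposes a single direction, code that tries to write to an input edge or read from an output edge fails to compile, which is the kind of property a dataflow translation relies on.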

Page 23: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

SPEX Code Snippets

    pdf_graph(WCDMA_rec)() {
      FIR fir;                           // static PDF node and edge declarations
      ...
      channel<int> fir_to_rake;
      ...
      pdf {                              // PDF scope: a PDF graph description
        for (i = 0; i < slot_size; i++) {
          fir.run(fir_to_rake, AtoD);
          rake.run(rake_out, fir_to_rake);
          if (mode == voice)
            viterbi.run(mac_in, rake_out);
          else
            turbo.run(mac_in, rake_out);
          mac(mac_in);
        }
      }
    }
    pdf_graph_init(WCDMA_rec)() { ... }  // dataflow initialization stage
    pdf_graph_final(WCDMA_rec)() { ... } // dataflow finalization stage

Language restrictions within PDF scope, e.g.: must only use for-loop constructions with constant loop bounds; must only include function calls to pdf_node functions

A guideline for writing dataflow-safe C++ code

[Figure: the resulting dataflow graph: fir, rake, if, then vit or tur, if, mac]

Page 24: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

System Compilation Frontend

Translate SPEX into parameterized dataflow representation, using traditional control-flow and dataflow analysis

Semantic error-checking to ensure dataflow-safe C/C++ code

Possible to support other high-level languages

Page 25: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

System Compilation Backend

Function-level compilation: node-to-DE assignments, memory buffer allocations, DMA assignments

Function-level optimizations: software pipelining

Code generation: parallel thread generation, physical buffer allocation, if-conversion and predicate propagation

Page 26: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Conclusion

System-level compilation framework

We have a working compiler for SPEX Target: SODA-like multi-core DSPs

Parameterized dataflow is used as compiler IR

SPEX is a set of language extensions for efficient translation from C/C++ into dataflow


Page 27: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Questions?

www.eecs.umich.edu/~sdrg

Page 28: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Shared Variables In Dataflow

Shared variables are not allowed in traditional dataflow models

SPIR allows shared variables between dataflow nodes: multi-dimensional streaming patterns, non-sequential streaming patterns, decoupled streaming, shared memory buffers

Page 29: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Backend Compilation

Problem with function-level compilation: it requires function-level parallelism, and wireless protocols do not have many concurrent functions

[Figure: a FIR, Rake, Turbo chain operating on in[0..N], with the three functions assigned to PE0, PE1 and PE2]

Page 30: A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Backend Compilation

Utilize existing compiler optimization: function-level software pipelining

Processing each stream data item is the same as a loop iteration

Modulo scheduling applied to function-level compilation

[Figure: software-pipelined schedule with FIR on PE0, Rake on PE1 and Turbo on PE2; the FIR, Rake, Turbo chains for in[i], in[i+1] and in[i+2] overlap in time]
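The overlap that function-level software pipelining creates can be written down as a small schedule table. The sketch below is an idealized illustration, not the compiler's modulo scheduler: `pipeline_schedule` is a hypothetical helper, and it assumes every stage takes one time step, so a new stream item can start each step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds schedule[t][pe]: the index of the stream item that PE `pe` works on
// at time step t, or -1 if that PE is idle. Stage s of the chain (for example
// FIR, Rake, Turbo) is pinned to PE s, and item k enters stage s at step k + s.
std::vector<std::vector<int>> pipeline_schedule(int num_items, int num_stages) {
    assert(num_items > 0 && num_stages > 0);
    const int steps = num_items + num_stages - 1;
    std::vector<std::vector<int>> schedule(
        static_cast<std::size_t>(steps),
        std::vector<int>(static_cast<std::size_t>(num_stages), -1));
    for (int t = 0; t < steps; ++t) {
        for (int pe = 0; pe < num_stages; ++pe) {
            const int item = t - pe;
            if (item >= 0 && item < num_items) {
                schedule[static_cast<std::size_t>(t)]
                        [static_cast<std::size_t>(pe)] = item;
            }
        }
    }
    return schedule;
}
```

With three items and three stages, step 2 is the steady state: PE0 runs FIR on item 2 while PE1 runs Rake on item 1 and PE2 runs Turbo on item 0. The steps before and after it are the pipeline's fill and drain phases.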