

Schedule-Extended Synchronous Dataflow Graphs

Morteza Damavandpeyma, Sander Stuijk, Twan Basten, Marc Geilen and Henk Corporaal

This report is an extension of the following journal paper to be published. It adds the proofs omitted

from the publication. If you want to cite this report, please refer to the paper instead.

M. Damavandpeyma, S. Stuijk, T. Basten, M.C.W. Geilen and H. Corporaal, “Schedule-Extended

Synchronous Dataflow Graphs”. IEEE Transactions on Computer-Aided Design of Integrated

Circuits and Systems, 2013.

ES Reports ISSN 1574-9517

ESR-2013-01 17 May 2013

Eindhoven University of Technology Department of Electrical Engineering Electronic Systems


Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this

material for any other purposes must be obtained from the IEEE by sending an email to

[email protected].

http://www.es.ele.tue.nl/esreports

[email protected]

Eindhoven University of Technology

Department of Electrical Engineering

Electronic Systems

PO Box 513

NL-5600 MB Eindhoven

The Netherlands


TECHNICAL REPORT, ESR-2013-01, EINDHOVEN UNIVERSITY OF TECHNOLOGY, MAY 2013 1

Schedule-Extended Synchronous Dataflow Graphs

Morteza Damavandpeyma, Student Member, IEEE, Sander Stuijk, Twan Basten, Senior Member, IEEE, Marc Geilen, Member, IEEE, and Henk Corporaal, Member, IEEE

Abstract—Synchronous dataflow graphs (SDFGs) are used extensively to model streaming applications. An SDFG can be extended with scheduling decisions, allowing SDFG analysis to obtain properties like throughput or buffer sizes for the scheduled graphs. Analysis times depend strongly on the size of the SDFG. SDFGs can be statically scheduled using static-order schedules. The only generally applicable technique to model a static-order schedule in an SDFG is to convert it to a homogeneous SDFG (HSDFG). This may lead to an exponential increase in the size of the graph and to sub-optimal analysis results (e.g., for buffer sizes in multi-processors). We present techniques to model two types of static-order schedules, i.e., periodic schedules and periodic single appearance schedules, directly in an SDFG. Experiments show that both techniques produce more compact graphs compared to the technique that relies on a conversion to an HSDFG. This results in reduced analysis times for performance properties and tighter resource requirements.

Index Terms—Synchronous dataflow graphs, periodic schedules, single appearance schedules, schedule modeling.

I. INTRODUCTION

Synchronous dataflow graphs (SDFGs) are widely used to model digital signal processing (DSP) and multimedia applications [1]–[4]. Model-based design-flows (e.g., [1], [5]–[8]) model binding and scheduling decisions into the SDFG. This enables the analysis of performance properties (e.g., throughput [9]) or resource requirements (e.g., buffer sizes [10]) under resource constraints. Figure 1 shows an example of an SDFG with four actors and three channels. An essential property of SDFGs is that every time an actor fires (executes) it consumes a fixed number of tokens from its input edges and produces a fixed number of tokens on its output edges. These numbers are called the rates (indicated next to the channel ends when the rates are larger than 1). The fixed port rates make it possible to statically schedule SDFGs.

Many SDFG analysis algorithms, e.g., throughput calculation or buffer sizing, are straightforward when a single-processor platform is used. For instance, the buffer sizes can be determined by executing the SDFG according to a given schedule. However, in a multi-processor environment, SDFG analysis algorithms are not trivial because of the inter-processor communication, amongst other reasons. An SDFG can be bound to a multi-processor platform. Each processor in the platform executes a set of actors from the SDFG; the firings of actors bound to a processor are required to be sequentialized. For this purpose, a finite periodic schedule can

The authors are with the Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands (e-mail: m.damavandpeyma, s.stuijk, a.a.basten, m.c.w.geilen, [email protected]).

T. Basten is also with TNO, Embedded Systems Innovation, Eindhoven,The Netherlands.

[Figure: an SDFG with four actors a0–a3 and three channels c0–c2; rates of 6 and 3 are annotated at channel ends.]

Fig. 1. An example SDFG.

be constructed. Such a schedule is called a periodic static-order schedule (PSOS). PSOSs only specify the firing order of actors. This separates them from fully static schedules, which determine absolute start times of actors (e.g., schedules generated using the technique of [11]). Traditionally, for DSP software synthesis, a sub-set of all periodic static-order schedules is considered. This sub-set contains so-called single appearance schedules (SAS) [1]. In a SAS, the functional code of the actors is included in a nested loop structure such that each piece of code occurs only once. This minimizes the code size, potentially at the cost of additional buffer memory needed to implement the channels. A model-based design-flow usually uses PSOSs (or a sub-set of PSOSs such as SASs) for an application modeled with an SDFG. In this way, timing (throughput) and memory usage (buffers) can be analyzed.

There is only one technique [12] known to model PSOSs in an SDFG. This technique requires a conversion of an SDFG to a so-called homogeneous SDFG (HSDFG) in which all rates are one [2]. Figure 2 (without the colored edges) shows the equivalent HSDFG of the SDFG in Figure 1. The technique of [12] sequentializes the actor firings by inserting a channel between each pair of consecutive actors in a schedule. At the end of a schedule, it adds a channel with one initial token from the last to the first actor in the schedule. This ensures an indefinite execution of the graph according to the schedule. To model PSOSs s0 = 〈a0(a2)2〉∗ and s1 = 〈(a1)5(a3)3a1(a3)3〉∗, the technique of [12] adds in total 15 channels to the HSDFG of the example graph (the green edges for s0 and the blue edges for s1 in Figure 2). For example, s0 indicates an indefinite sequence of one firing of a0 followed by two firings of a2. This order is enforced in the HSDFG of Figure 2 by the green edges between the actors a0_1, a2_1, and a2_2.
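For illustration (ours, not from the report), the channel-insertion rule of [12] on a flat HSDFG firing sequence can be sketched in a few lines of Python; which a1_k instance fires at which schedule position is an assumption of the sketch. It reproduces the count of 15 added channels for s0 and s1:

```python
# [12]-style sequencing: a channel between each pair of consecutive
# actor instances, plus a wrap-around channel with one initial token.
def sequencing_channels(schedule):
    # schedule: one iteration as a flat list of HSDFG actor instances.
    edges = [(a, b, 0) for a, b in zip(schedule, schedule[1:])]
    edges.append((schedule[-1], schedule[0], 1))   # last -> first, 1 token
    return edges

s0 = ["a0_1", "a2_1", "a2_2"]
s1 = ["a1_1", "a1_2", "a1_3", "a1_4", "a1_5",
      "a3_1", "a3_2", "a3_3", "a1_6", "a3_4", "a3_5", "a3_6"]
added = sequencing_channels(s0) + sequencing_channels(s1)
print(len(added))   # 15 channels, matching the count in the text
```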

The SDFG to HSDFG conversion can lead to an exponential increase in the size of the graph. For example, such a conversion for an H.263 decoder [10] (with QCIF resolution) increases the graph size from 4 actors to 1190 actors. Note that the number of actors in the resulting HSDFG highly depends on how the application is modeled in the original SDFG. The run-time of SDFG analysis algorithms depends amongst others on the size of the graph. As a result, the run-time of many SDFG analysis algorithms may increase drastically when modeling PSOSs in the graph using the technique from [12]. For example, the buffer sizing algorithm from [10] takes less


[Figure: the HSDFG contains actor instances a0_1, a1_1–a1_6, a2_1, a2_2 and a3_1–a3_6, connected by the HSDFG channels and the added scheduling edges.]

Fig. 2. PSOSs s0 = 〈a0(a2)2〉∗ and s1 = 〈(a1)5(a3)3a1(a3)3〉∗ modeled in the SDFG of Figure 1 using the technique from [12]; each ai_j actor in the HSDFG is an instance of SDFG actor ai.

than 1 ms on the SDFG of an H.263 decoder. Modeling a schedule into this SDFG using the technique from [12], the same analysis lasts 1330 ms. SDFG analysis algorithms are usually repeated more than once in an iterative design-flow. As an example, for the SDFG of an H.263 decoder, the design-flow from [6] performs 8 throughput calculations on the SDFG to obtain the desired binding. Hence, it is vital to maintain a compact schedule-extended graph, i.e., a graph in which schedules are modeled explicitly, to provide a fast and practical design flow. There is a second drawback to the technique from [12]. The original graph structure is lost due to the conversion to an HSDFG. A single channel in an SDFG corresponds to a set of channels in the HSDFG. In Figure 2, for example, the six edges between actor a0_1 and the a1_j actors correspond to the single edge between a0 and a1 in the SDFG of Figure 1. As a result, common buffer sizing techniques cannot find the minimal buffer size for the original SDFG. The H.263 decoder buffer sizes are for example overestimated by 49% when applying the technique of [10] to the HSDFG.

A novel technique is needed to model PSOSs in an SDFG. This technique should limit the increase in the number of actors such that analysis times do not increase too much when analyzing the SDFG with its schedules. The technique should also preserve the original graph structure as this enables accurate analysis of graph properties such as buffer sizes. In [13], we presented a schedule modeling technique, called DSM, to model any PSOS directly in an SDFG. In this paper, DSM is discussed in more detail. In addition, a second schedule modeling technique, called SASM, is introduced that is limited to SASs, but that results in more compact models compared to the first technique when modeling SASs. Correctness of the two approaches is formalized. Extensive experiments are carried out for evaluation purposes.

DSM and SASM can be used in any model-based design-flow that models PSOSs into the SDFG (e.g., [1], [5]–[8]). Conversion to an HSDFG may be inevitable at some steps of a design trajectory. For example, multi-processor scheduling may require such conversions, although some techniques exist that can find schedules for SDFGs without any conversion to HSDFGs. For example, the technique presented in [14] solves the buffer sizing and scheduling problems simultaneously at the SDFG level. It is not the conversion from SDFG to HSDFG itself that is problematic though. The problem is that analyses or optimizations on large HSDFGs may be time consuming (e.g., throughput analysis) or inaccurate (e.g., buffer sizing). With our techniques, obtained schedules can be annotated back to the original SDFG; hence, the later analysis and optimization can be performed on the schedule-extended SDFG. Besides the already mentioned analyses, dynamic voltage scaling, for example, can also be directly applied to a schedule-extended SDFG model of an application mapped to a multi-processor platform [15]. Note that code generation is another step which requires an SDFG to HSDFG conversion; this conversion can be delayed until all (or most of the) prior analyses are carried out on the SDFG. Ultimately, the proposed techniques may save significant amounts of analysis time in a multi-processor design flow and they may lead to more accurate results.

The remainder of the paper is structured as follows. The next section discusses related work. Section III introduces SDFGs. Section IV formalizes SDFG schedules. Sections V and VI present our techniques to model PSOSs and SASs in an SDFG. Section VII contains the theorems related to the correctness of the presented techniques. We evaluate our techniques by applying them to several realistic applications in Section VIII. Section IX concludes.

II. RELATED WORK

The technique from [12] is the only available technique to model PSOSs in an SDFG, through a conversion to an HSDFG. As already explained, this technique may result in a long run-time for analysis algorithms and/or inaccurate results from these algorithms. Our techniques alleviate both shortcomings of the technique from [12].

The work in [16] models the effect of a budget scheduler or preemptive TDMA scheduler on the temporal behavior of the SDFG, either by computing an accurate worst-case response time or, more precisely, by introducing additional actors to model the timing impact as a latency-rate model. In contrast, for non-preemptive schedules, such as PSOSs, we focus on the ordering of actor firings; their execution time remains the same. We force an SDFG to follow the PSOSs selected for each processor. This allows SDFG analysis to obtain properties like throughput or buffer sizes for the scheduled SDFG. This is also true for the models of [16]. However, for non-preemptive schedules, our results are tighter and our techniques require less analysis time. The authors of [17] have shown that an SDFG can be used to consider an application with resource sharing possibilities; they perform buffer sizing under a throughput constraint considering a given schedule for the actors using a shared resource. For shared resource analysis, they use event models [18], which are based on real-time calculus [19]. Our approach differs from [17], since modeling schedules directly into SDFGs enables us to use dedicated analysis for dataflow graphs. Moreover, the


technique of [17] can only handle an SDFG with limited types of cycles, such as cycles formed by self-edges or the back edges modeling the buffer capacity between two actors. However, staying in the dataflow domain, as is done in our technique, does not impose such a limitation on the graph structure.

Reference [20] uses some new (custom) components, e.g., if-then-else, to model schedules in an SDFG. These components are not supported by the common model-based design-flows using SDFGs (e.g., [1], [5]–[8]) and cannot be modeled in an SDFG using the basic elements of an SDFG (i.e., actors and channels). Our techniques eliminate the need for any new (custom) component. As a result, any analysis technique for SDFGs is directly applicable to the schedule-extended SDFG.

III. SYNCHRONOUS DATAFLOW GRAPHS

Let N denote the positive natural numbers and N0 = N ∪ {0}. Consider Ports as a set that contains all ports; each port p ∈ Ports has a finite rate Rate(p) ∈ N. An actor ai is a tuple (In, Out) consisting of a set In ⊆ Ports of input ports and a set Out ⊆ Ports of output ports with In ∩ Out = ∅.

Definition 1. (SDFG) An SDFG is a tuple (A,C) consisting of a finite set A of actors and a finite set C ⊆ Ports2 of channels. The channel source is an output port of an actor and the channel destination is an input port of an actor. Each port of an actor is connected to only one channel and each channel end is connected to a single port. For every actor ai = (In, Out) ∈ A, InC(ai) represents all channels connected to the ports in In and OutC(ai) represents all channels connected to the ports in Out.

The SDFG of Figure 1 has four actors (A = {a0, a1, a2, a3}) and three channels (C = {c0, c1, c2}). Actors communicate with tokens sent from one actor to another over the channels. Tokens are depicted with a solid dot (and an attached number in case of multiple tokens). An essential property of SDFGs is that every time an actor fires (executes) it consumes a fixed number of tokens from its input edges and produces a fixed number of tokens on its output edges. These numbers are called the rates (indicated next to the channel ends when the rates are larger than 1). The rates determine how often actors have to fire with respect to each other such that the distribution of tokens over all channels is not changed. This property is captured in the repetition vector [1] of an SDFG. The repetition vector determines the number of times each actor should be fired in order to bring the SDFG back to its initial token distribution. Notation γ(a) refers to the repetition vector value of actor a. The repetition vector of the SDFG shown in Figure 1 is γ = [1 6 2 6]T. It corresponds to 1 firing of a0, 6 firings of a1, 2 firings of a2 and 6 firings of a3. Channels can contain different numbers of tokens. A state of an SDFG (represented by ω) is described by the number of tokens in all channels of the SDFG. We assume that the initial state of an SDFG is given by an initial token distribution ω0. An actor ai ∈ A is enabled in an SDFG state ωj iff ωj(c) ≥ Rate(q) for each channel c = (p, q) ∈ InC(ai). When an actor ai starts its firing, it removes Rate(q) tokens from all (p, q) ∈ InC(ai) and when it ends, it produces Rate(p) tokens on every (p, q) ∈ OutC(ai). Consistency (i.e., the existence of a repetition vector) and absence of deadlock are necessary in practice for SDFGs and can be verified efficiently [21], [22]. Any SDFG which is not consistent requires unbounded memory to execute or deadlocks. Therefore, we limit ourselves to consistent and deadlock-free SDFGs.
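To make the repetition vector concrete, it can be computed from the per-channel balance equations Rate(p) · γ(src) = Rate(q) · γ(dst). A minimal Python sketch (ours, not part of the report; the channel table encodes Figure 1 and the helper name is illustrative):

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm

# Channels of Figure 1 as (source, production rate, destination,
# consumption rate); rates not drawn in the figure are 1.
channels = [("a0", 6, "a1", 1),   # c0
            ("a1", 1, "a2", 3),   # c1
            ("a2", 3, "a3", 1)]   # c2

def repetition_vector(channels):
    # Assign a relative firing rate to one actor and propagate it over
    # the (connected, consistent) graph via the balance equations.
    rates = {channels[0][0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, p, dst, q in channels:
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * p / q
                changed = True
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * q / p
                changed = True
    # Scale the fractions to the smallest positive integer vector.
    m = lcm(*(r.denominator for r in rates.values()))
    ints = {a: int(r * m) for a, r in rates.items()}
    g = reduce(gcd, ints.values())
    return {a: v // g for a, v in ints.items()}

print(repetition_vector(channels))
# {'a0': 1, 'a1': 6, 'a2': 2, 'a3': 6}, i.e. gamma = [1 6 2 6]T
```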

IV. SDFG STATIC-ORDER SCHEDULING

Assume that the initial state ω0 for the SDFG of Figure 1 is equal to (c0, c1, c2) → (0, 0, 0). Actor a0 is enabled in ω0. Firing a0 results in a transition from state ω0 to a state (6, 0, 0). We use this concept of states and transitions to formalize the execution of an SDFG.

Definition 2. (EXECUTION) An execution σ of an SDFG is an infinite alternating sequence of states and transitions ω0 −ai→ ω1 −aj→ · · · starting from a designated initial state ω0.
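The enabling rule and the state transitions above can be sketched in a few lines (illustrative Python, not from the report; the channel table again encodes Figure 1):

```python
# Channels of Figure 1: name -> (src, production rate, dst, consumption rate).
channels = {"c0": ("a0", 6, "a1", 1),
            "c1": ("a1", 1, "a2", 3),
            "c2": ("a2", 3, "a3", 1)}

def enabled(state, actor):
    # Enabled iff every input channel holds at least its consumption rate.
    return all(state[c] >= q for c, (_, _, dst, q) in channels.items()
               if dst == actor)

def fire(state, actor):
    # One firing: consume from all input channels, produce on all outputs.
    new = dict(state)
    for c, (src, p, dst, q) in channels.items():
        if dst == actor:
            new[c] -= q
        if src == actor:
            new[c] += p
    return new

omega0 = {"c0": 0, "c1": 0, "c2": 0}     # initial state (0, 0, 0)
assert enabled(omega0, "a0")             # a0 has no inputs, always enabled
omega1 = fire(omega0, "a0")
print(omega1)                            # {'c0': 6, 'c1': 0, 'c2': 0}
```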

In a multi-processor system, multiple actors may be bound to the same processor. These actors may be enabled at the same time. In such a situation, a schedule is needed to order the firings of the enabled actors on the processor. The fixed port rates make it possible to statically schedule SDFGs with a finite schedule per processor. Such a schedule orders the actor firings on the underlying processor. This type of schedule, called a periodic static-order schedule (PSOS), can be repeated indefinitely. A separate PSOS should be constructed for each processor. Each PSOS only includes actors bound to this specific processor. The following definition is used to formally specify a PSOS.

Definition 3. (PERIODIC STATIC-ORDER SCHEDULE (PSOS)) A PSOS is a finite ordered list of (a sub-set of) actors in an SDFG (A,C). A PSOS is denoted by si = 〈α1α2 . . . αn〉∗ where each αj (1 ≤ j ≤ n) is a sub-schedule that represents an actor from A and n ∈ N is the length of the schedule si, represented by n = |si|. The set Ai contains all actors that appear at least once in si (Ai ⊆ A).

A PSOS can be represented in a compact format, called a looped schedule (LS).

Definition 4. (LOOPED SCHEDULE (LS)) A looped schedule, si = 〈(α1)β1(α2)β2 · · · (αm)βm〉∗, is defined as a successive execution of α1 repeated β1 times followed by α2 repeated β2 times and so on, where each αj is either an actor firing or a (nested) looped schedule and βj ∈ N.

Definition 5. (SINGLE APPEARANCE SCHEDULE (SAS)) An LS in which each actor appears only once is called a single appearance schedule (SAS).

Assume that the SDFG of Figure 1 is mapped to a platform with two processors (P0 and P1). Actors a0 and a2 are mapped to P0 with the PSOS s0 = 〈a0(a2)2〉∗ and actors a1 and a3 are mapped to P1 with the PSOS s1 = 〈(a1)5(a3)3a1(a3)3〉∗. PSOS s0 is a SAS while PSOS s1 is not. PSOS s′1 = 〈(a1)3(a3)3〉∗ can be used as a SAS for actors a1 and a3.
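The relation between a looped schedule and its flat firing sequence can be made explicit by expanding one iteration. The sketch below is illustrative Python (the nested-list encoding of an LS is our assumption, not the paper's notation) and also checks the single-appearance property:

```python
# A looped schedule encoded as a list of (body, repetitions) pairs;
# a body is a list of actor names or nested looped schedules.
s0 = [(["a0"], 1), (["a2"], 2)]                            # <a0 (a2)2>*
s1 = [(["a1"], 5), (["a3"], 3), (["a1"], 1), (["a3"], 3)]  # <(a1)5 (a3)3 a1 (a3)3>*

def expand(ls):
    # Flatten one iteration into the sequence of actor firings.
    seq = []
    for body, reps in ls:
        for _ in range(reps):
            for item in body:
                if isinstance(item, str):
                    seq.append(item)
                else:
                    seq.extend(expand(item))   # nested looped schedule
    return seq

def appearances(ls):
    # Textual appearances of actors in the LS (loops not unrolled).
    flat = []
    for body, _ in ls:
        for item in body:
            if isinstance(item, str):
                flat.append(item)
            else:
                flat.extend(appearances(item))
    return flat

def is_sas(ls):
    # Single appearance: every actor occurs once in the schedule text.
    flat = appearances(ls)
    return len(flat) == len(set(flat))

print(expand(s1))               # 6 firings of a1 and 6 of a3, in order
print(is_sas(s0), is_sas(s1))   # True False
```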

Definition 6. (SDFG ITERATION) Assume SDFG (A,C) has repetition vector γ. An SDFG iteration is a set of actor firings such that for each a ∈ A, the set contains γ(a) firings of a.

Definition 7. (PSOS ITERATION) Let si = 〈α1α2 . . . αn〉∗ be a PSOS that schedules actors in Ai ⊆ A. A PSOS iteration is a sequence of actor firings following the actor order specified in si, starting from actor α1 and ending with actor αn, with a length equal to |si| and including only actors from Ai.

The actor firing order in an execution σ = ω0 −ax→ ω1 −ay→ · · · can be captured using a list 〈ax, ay, · · ·〉 where the jth element in this list is the actor which is fired in the transition from ωj−1 to ωj. The notation orderList(σ, Ai) represents the mentioned list where actors which do not belong to Ai are omitted. For example, in the SDFG of Figure 1, consider an execution σ that results in list 〈a0, a1, a1, a1, a2, a1, a1, a3, a3, a3, a1, a2, a3, a3, a3〉 with A1 = {a1, a3}; then orderList(σ, A1) = 〈a1, a1, a1, a1, a1, a3, a3, a3, a1, a3, a3, a3〉. We say that the corresponding execution of an SDFG satisfies a PSOS when the SDFG is executed according to the PSOS. We use the following to formalize this term.

Definition 8. (SATISFACTION) Let σ be an execution of an SDFG (A,C) and si a PSOS for actors Ai ⊆ A. If it exists, let σ′ be the prefix of σ such that it contains exactly γ(ai) occurrences of each actor ai ∈ Ai; σ′ covers σ precisely up to the point that one PSOS iteration is executed. Execution σ satisfies PSOS si iff σ′ exists and the ordered list orderList(σ′, Ai) corresponds to the order specified in si.
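Definition 8 can be checked mechanically: take the prefix of the execution containing exactly γ(ai) firings of each scheduled actor, project it onto Ai, and compare with the PSOS. An illustrative Python sketch (function names are ours, not the paper's), applied to the example execution above:

```python
gamma = {"a0": 1, "a1": 6, "a2": 2, "a3": 6}   # repetition vector of Fig. 1

def order_list(firings, actors):
    # orderList(sigma, Ai): keep only firings of actors in Ai, in order.
    return [a for a in firings if a in actors]

def satisfies(sigma, psos, actors, gamma):
    # Build the prefix sigma' with exactly gamma(a) firings per scheduled
    # actor; fail if an actor fires beyond one PSOS iteration first.
    need = {a: gamma[a] for a in actors}
    prefix = []
    for a in sigma:
        if all(v == 0 for v in need.values()):
            break                      # one PSOS iteration is complete
        if a in need and need[a] == 0:
            return False               # inter-iteration firing
        prefix.append(a)
        if a in need:
            need[a] -= 1
    if any(v > 0 for v in need.values()):
        return False                   # prefix sigma' does not exist
    return order_list(prefix, actors) == psos

sigma = ["a0", "a1", "a1", "a1", "a2", "a1", "a1", "a3", "a3", "a3",
         "a1", "a2", "a3", "a3", "a3"]
s1 = ["a1"] * 5 + ["a3"] * 3 + ["a1"] + ["a3"] * 3
print(satisfies(sigma, s1, {"a1", "a3"}, gamma))   # True
```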

When an execution of a consistent and deadlock-free SDFG satisfies the specified PSOSs, the channels of the SDFG need bounded memories (according to Theorem 1 from [23]). The number of actor appearances in the PSOS is a fraction or multiple of its repetition vector entry. Formally, each actor ai in the PSOS should appear r · γ(ai) times in the PSOS (with r = u/v where u, v ∈ N) and the value r is identical for all actors in the PSOS [9]. This follows from the SDFG property that firing each actor as often as indicated in the repetition vector results in a token distribution that is equal to the initial token distribution. In the paper, the term normalized PSOS is used to refer to a PSOS with r equal to 1.

Definition 9. (NORMALIZED PSOS) A PSOS si is called normalized iff each actor aj ∈ Ai appears γ(aj) times in one iteration of the PSOS si.

We limit ourselves in the remainder to PSOSs in which r is a unit fraction (i.e., r = u/v with u = 1 and v ∈ N), although our technique can also be directly applied to model other PSOSs (i.e., in which u ∈ N).

V. MODELING PERIODIC STATIC-ORDER SCHEDULES

In this section, we introduce a technique to model PSOSs in an SDFG. We first illustrate all ingredients of our approach through a running example, and then discuss the algorithm and the main steps in the algorithm in more detail. Note that a schedule is correctly modeled if and only if any execution of the schedule-extended graph satisfies the schedule and if any execution of the original graph that satisfies the schedule is still feasible in the schedule-extended graph.

A. Running example

Here we briefly introduce our approach to modeling a PSOS in an SDFG using a running example. For this purpose, consider the example SDFG shown in Figure 1 and the PSOS s1 = 〈(a1)5(a3)3a1(a3)3〉∗, which is a schedule for a1 and a3. Our approach captures this schedule in the SDFG in three steps, that (i) remove auto-concurrency, (ii) avoid inter-iteration execution, and (iii) enforce the correct scheduling decisions.

In the example SDFG, a single firing of a0 produces 6 tokens in channel c0; then 6 firings of a1 can be performed simultaneously. This simultaneous firing of an actor is called auto-concurrency; in practice, this corresponds to executing multiple instances of a function (task) in parallel. Auto-concurrency for an actor cannot be handled in a real hardware platform, unless more than one processor is allocated for that actor. In this work, we focus on the case that an actor is mapped to one processor. Hence, auto-concurrency must be removed from the model. To sequentialize firings of an actor, a self-edge with one initial token must be added to that actor; this way auto-concurrency can be removed for that actor. Figure 3(a) shows the example SDFG of Figure 1 in which any auto-concurrency related to a1 and a3 is removed by adding two self-edges (shown in red).

In one PSOS iteration, each actor must fire a specific number of times. Actors must not be able to fire more than the number of times indicated by the PSOS. Consider the following situation in the example SDFG of Figure 1. Two firings of a0 provide 12 tokens in channel c0; this number of tokens is enough for 12 firings of the actor a1. The first 6 firings of a1 belong to the first iteration and the second 6 firings belong to the second iteration. The second 6 firings of a1 can occur before the completion of the first PSOS iteration of s1; in this case, the resulting execution does not satisfy the given PSOS s1. This situation is called inter-iteration execution. To prevent inter-iteration execution related to s1, we create a dependency from the last actor appearing in s1 to the first actor appearing in s1; see the blue elements in Figure 3(b). This dependency limits the firing of the first actor to the number of firings required in one PSOS iteration.

In the SDFG of Figure 3(b), consider the case that actors a0, a1 and a2 have fired 1, 3 and 1 times, respectively. The initial token distribution is changed to the distribution shown in Figure 3(c). In this graph, both a1 and a3 from PSOS s1 are enabled; but only the firing of a1 must be granted at this point to form an execution that satisfies the given PSOS s1. We call such a state in which several actors are enabled a decision state. Later on, a precise definition is given for this concept. According to schedule s1, actor a3 must get enabled after 5 firings of a1. For this purpose, a dependency is created from a1 to a3 (shown with the green actor and channels in Figure 3(d)); this new dependency prevents a3 from getting enabled unless a1 has completed 5 firings. Another decision state can be found after a1 has completed 5 firings; once again, both a1 and a3

[Figure: five sub-figures showing the SDFG of Figure 1 extended step by step with the self-edges (cse.1, cse.3), the inter-iteration elements (a1.end, c1.pre, c1.pro) and the decision-state elements.]

(a) Auto-concurrency has been removed.

(b) Auto-concurrency and inter-iteration execution have been removed.

(c) SDFG of Figure 3(b) after 1, 3 and 1 firings of a0, a1 and a2 resp., leading to a decision state in which both actors a1 and a3 are enabled.

(d) Creating a dependency for the first decision state.

(e) Creating a dependency for the second decision state.

Fig. 3. Step-by-step modeling of the PSOS s1 in the SDFG of Figure 1.

from PSOS s1 are enabled at this point; but the firing of a3 must take place. For this purpose, a dependency is created from a3 to a1 (see Figure 3(e)). This new dependency prevents a1 from getting enabled from the current state onwards unless a3 has completed 3 firings. The 5 initial tokens on the added input channel to a1 ensure that the first 5 firings of a1 can take place as planned. The SDFG of Figure 3(e) shows the final solution that models the PSOS s1 in the SDFG of Figure 1.

B. The algorithm

Algorithm 1 captures our technique, called decision state modeling (DSM). DSM accepts an SDFG and one or several PSOSs as its input. In the algorithm, n + 1 (n ∈ N0) is the number of processors (or input PSOSs). DSM ensures that actor firings of each PSOS follow the specified order in that PSOS; the output of DSM is a new SDFG that models the provided PSOSs in the input SDFG. Figure 4 depicts the corresponding SDFG of Figure 1 which models the PSOSs s0 and s1 using DSM. The remainder of this section discusses the three main steps of the algorithm - removing auto-concurrency, avoiding inter-iteration execution, enforcing correct decisions in decision states - in detail.

The description of some basic functions used in Algorithm 1 is as follows. The function AA(G, anew) is responsible for including the actor anew in the SDFG G. The function AC(G, cnew, asrc, adst, srcRate, dstRate, initTok) adds the channel cnew from source actor asrc to destination actor adst; the production rate of asrc on this channel is equal to srcRate and the consumption rate of adst on this channel is equal to dstRate; this channel is initialized with initTok initial tokens. The function CNT(aj, si) returns the number of times that the actor aj is fired in one iteration of PSOS si. The function BEF(ak, j, si) returns the number of times that ak appears from the first position in the PSOS si up to and including the jth position in the PSOS si; the function AFT(ak, j, si) returns the number of times that ak appears from the (j+1)th position in the PSOS si to the last position in the PSOS si.
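These counting helpers can be read as simple list operations. An illustrative Python version (ours; 1-based positions as in the text, flat schedules only), using PSOS s1 of the running example:

```python
# One iteration of PSOS s1 = <(a1)5 (a3)3 a1 (a3)3>* as a flat list.
s1 = ["a1"] * 5 + ["a3"] * 3 + ["a1"] + ["a3"] * 3

def cnt(actor, psos):
    # CNT(a, s): firings of `actor` in one iteration of the PSOS.
    return psos.count(actor)

def bef(actor, j, psos):
    # BEF(a, j, s): appearances in positions 1..j (inclusive, 1-based).
    return psos[:j].count(actor)

def aft(actor, j, psos):
    # AFT(a, j, s): appearances in positions j+1..|s|.
    return psos[j:].count(actor)

print(cnt("a1", s1), bef("a1", 5, s1), aft("a3", 5, s1))   # 6 5 6
```

Note that BEF and AFT split one iteration, so bef(a, j, s) + aft(a, j, s) always equals cnt(a, s).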

C. Auto-concurrency

An actor ai ∈ A can be enabled multiple times simultaneously in an SDFG state ωj if ωj(c) ≥ k · Rate(q) for each channel c = (p, q) ∈ InC(ai) where k ∈ N, k ≥ 2. This is called auto-concurrency. The firings of actor ai should occur sequentially according to the PSOS to which ai belongs. This sequential execution can be enforced by adding a self-edge with one initial token to actor ai (Line 1 in Algorithm 1). In Figure 4, channels cse.0 - cse.3 (shown in red) are used to prevent any auto-concurrency in the SDFG of Figure 1.
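The effect of a self-edge on auto-concurrency can be illustrated as follows (Python sketch with an assumed channel encoding, not the paper's code). With 6 tokens on c0, actor a1 is enabled 6 times simultaneously; the self-edge with one initial token reduces this to 1:

```python
# Channels as name -> (src, production rate, dst, consumption rate,
# initial tokens). A self-edge with one token serializes an actor.
def add_self_edge(channels, actor):
    extended = dict(channels)
    extended["cse." + actor] = (actor, 1, actor, 1, 1)
    return extended

def enabled_count(channels, state, actor):
    # Largest k with state(c) >= k * rate on every input channel
    # (actors without input channels are not handled in this sketch).
    counts = [state.get(c, tok0) // q
              for c, (_, _, dst, q, tok0) in channels.items()
              if dst == actor]
    return min(counts)

channels = {"c0": ("a0", 6, "a1", 1, 0)}
state = {"c0": 6}                                  # after one firing of a0
print(enabled_count(channels, state, "a1"))        # 6: auto-concurrency
print(enabled_count(add_self_edge(channels, "a1"), state, "a1"))  # 1
```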

D. Inter-iteration execution

Consider actor a0 in the SDFG of Figure 1; the 1st firing of a0 belongs to the 1st PSOS iteration of s0, the 2nd firing of a0 belongs to the 2nd PSOS iteration of s0, and so on. Since a0 has no input channel, it can always be fired regardless of other actors in the graph. This behavior, which is called inter-iteration execution, can prevent an SDFG execution from satisfying the given PSOSs. Inter-iteration execution happens when one PSOS iteration has not been completed and an actor

[Figure: the SDFG of Figure 1 extended with self-edges cse.0–cse.3 (red), the inter-iteration elements a0.end, c0.pre, c0.pro, a1.end, c1.pre, c1.pro (blue), and the decision-state elements (green).]

Fig. 4. PSOSs s0 and s1 modeled in the SDFG of Figure 1 using DSM.


Algorithm 1: Decision State Modeling (DSM)
input : SDFG G(A,C), PSOSs s0, · · · , sn
output: G extended with schedules s0, · · · , sn
1  add a self-edge with 1 initial token for each a ∈ A
2  s′0, µ0, · · · , s′n, µn ← normalize(G, s0, · · · , sn)
3  for i ← 0 to n do
       /* To control inter-iteration execution */
4      aL := last actor in si
5      aF := first actor in si
6      AA(G, ai.end)
7      AC(G, ci.pre, aL, ai.end, 1, CNT(aL, si), 0)
8      AC(G, ci.pro, ai.end, aF, CNT(aF, si), 1, CNT(aF, si))
       /* To control decision states */
9      Ω, pos ← getDecisionStates(G, s′i, {s′0, · · · , s′n} \ {s′i})
10     Ω ← reduceDecisionStates(Ω)
11     Ω ← foldDecisionStates(Ω, µi)
12     foreach ωj ∈ Ω do
13         AA(G, ai.ωj)
14         foreach ak ∈ ∆j do
15             if ak is the actor of choice then
16                 AC(G, ci.akωj, ak, ai.ωj, 1, CNT(ak, si), AFT(ak, pos[ωj], si))
17             else
18                 AC(G, ci.akωj, ai.ωj, ak, CNT(ak, si), 1, BEF(ak, pos[ωj], si))

Fig. 5. An example SDFG.

from that PSOS can proceed with its firings beyond the current PSOS iteration. Lines 4-8 in Algorithm 1 are used to control this undesirable actor enabling. This part of the algorithm adds (per PSOS) one actor and two channels to create a dependency between the last and first actor appearing in the PSOS. The added components limit, within one PSOS iteration, the firings of the first actor in the PSOS (i.e., aF) to the count of actor aF (i.e., CNT(aF, si)) in one iteration of the PSOS si. The next iteration of the PSOS si can only commence if the last actor in PSOS si (i.e., aL) fires CNT(aL, si) times in one iteration of the PSOS si. In other words, the next iteration of a PSOS can only commence after the completion of the current iteration of this PSOS. In Figure 4, actor a0.end and channels c0.pre and c0.pro are added to prevent any inter-iteration execution in PSOS s0. Actor a1.end and channels c1.pre and c1.pro are added to prevent any inter-iteration execution in schedule s1. These elements are shown in blue in Figure 4.
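The construct of lines 4-8 can be sketched as follows; it returns the added actor and the two channels as plain tuples (a hedged sketch, with channels encoded as (name, src, dst, prod, cons, tokens)):

```python
def inter_iteration_construct(i, schedule):
    """Build the end actor and the pre/pro channels for PSOS s_i (lines 4-8 of
    Algorithm 1): the next iteration of s_i can only start after the current
    iteration completes."""
    a_last, a_first = schedule[-1], schedule[0]
    cnt_last = schedule.count(a_last)     # CNT(aL, si)
    cnt_first = schedule.count(a_first)   # CNT(aF, si)
    end = f"a{i}.end"
    pre = (f"c{i}.pre", a_last, end, 1, cnt_last, 0)
    pro = (f"c{i}.pro", end, a_first, cnt_first, 1, cnt_first)
    return end, pre, pro

# For s1 = <(a1)^5 (a3)^3 a1 (a3)^3>*, both counts become 6, matching Figure 4.
s1 = ["a1"] * 5 + ["a3"] * 3 + ["a1"] + ["a3"] * 3
print(inter_iteration_construct(1, s1))
```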

E. Decision states

This sub-section presents the third step of DSM. It first defines the concept of a decision state and then proceeds with the algorithm for identifying decision states; after explaining two optimization steps, it ends with the technique to enforce the appropriate schedule decisions.

1) Concept: Multiple different actors mapped to a singleprocessor may be enabled in a specific state. Here, we describe

Fig. 6. The state space of the SDFG of Figure 5 when PSOSs s′′0 = 〈a0〉∗ and s′′1 = 〈(a1)2a2a1a2〉∗ are used.

such situations in an SDFG execution. Consider the SDFG in Figure 5. Assume that a0 is mapped to processor P0 with PSOS s′′0 = 〈a0〉∗ and a1 and a2 are mapped to processor P1 with PSOS s′′1 = 〈(a1)2a2a1a2〉∗. For brevity, we only discuss the actors mapped to processor P1. The corresponding state space - for one SDFG iteration - when executing our example SDFG using the PSOSs s′′0 and s′′1 is visualized in Figure 6. In this figure, the actors mapped to processor P0 (P1) are surrounded by a square (circle). Auto-concurrency and inter-iteration execution are excluded using the constructs introduced in Section V-C and Section V-D resp. The periodic behavior of the PSOSs is obvious from the state space, where one SDFG iteration moves the graph back to its initial state, i.e., ω0 (see Figure 6). There are some states in Figure 6 in which more than one actor is enabled on processor P1 (e.g., ω1 − ω5). In such a situation, the execution related to those actors can deviate from the specified PSOS. We use the following definition to formalize such a situation.

Definition 10. (DECISION STATE) Consider the PSOS si as a schedule for actors Ai ⊆ A and an execution σ of an SDFG (A,C) which satisfies PSOS si. A state ωj ∈ σ is a decision state iff multiple different actors from Ai are enabled in ωj.

We use Ω to refer to the finite set containing all decision states occurring in the execution of an SDFG up to one iteration of a PSOS. The following terminology describes the enabled actors in a decision state.

Definition 11. (OPPONENT ACTOR SET) Let ωj ∈ Ω be a decision state for PSOS si. The opponent actor set ∆j of the decision state ωj is a finite set which contains all actors that are enabled in decision state ωj and that belong to Ai.

The finite set ∆j represents the opponent actors in the decision state ωj ∈ Ω. One of the enabled actors in a decision state ωj, in line with the given PSOS si, should be selected to fire. The following definition is used to describe such an actor.

Definition 12. (ACTOR OF CHOICE) Consider the PSOS si which schedules actors Ai ⊆ A and the opponent actor set ∆j of the decision state ωj in an execution σ of the SDFG G(A,C) which satisfies si. An actor ac ∈ ∆j is called the actor of choice of the decision state ωj iff the firing of actor ac in state ωj is a necessity for the execution σ in order to satisfy the PSOS si. Since a PSOS specifies a fixed firing order, there can only be a single actor of choice in any decision state.

Lines 9-18 in DSM show how we deal with non-deterministic execution due to decision states. DSM models the given PSOSs one by one, iteratively. The ordering of


Algorithm 2: Get Decision States
  input : SDFG G, PSOS sc, PSOSs so1, ..., son
  output: Decision state set Ω
  output: relative positions pos
1   ωj ← the initial state of G
2   for i ← 1 to |sc| do
3     ωj ← maxExec(G, ωj, so1, ..., son)
4     if sizeof(enabledActors(G, ωj, sc)) > 1 then
5       pos[ωj] ← i
6       Ω ← Ω ∪ ωj
7       ∆j ← enabledActors(G, ωj, sc)
8     ωj ← fire(G, ωj, sc[i])

Fig. 7. The state space of the SDFG of Figure 1 when the PSOS s1 = 〈(a1)5(a3)3a1(a3)3〉∗ is the schedule of interest (i.e., sc) in Algorithm 2.

PSOSs in DSM does not have any impact on the final behavior. In each iteration of the for-loop in line 3, we enforce the execution of the actors in the current schedule of interest (i.e., schedule si) to follow schedule si. The next sub-section explains how decision states of the schedule of interest are extracted. At the same time, a value is preserved for each decision state that captures the relative position of that decision state with respect to the beginning of the schedule of interest; the notation pos[ωj] refers to this position for ωj. For example, in the SDFG of Figure 1, pos[ω6] = 5 since the relative position of ω6 with respect to the beginning of the schedule of interest (i.e., s1) is 5 (in Figure 7, 4 firings related to s1 have occurred before ω6 and the 5th actor firing related to s1 is going to happen in ω6). For each ωj ∈ Ω extracted for si, DSM adds an actor (ai.ωj in line 13) and one channel between the new actor ai.ωj and each opponent actor in the set ∆j (lines 14-18 in Algorithm 1). The elements added for each decision state ωj postpone the execution of the actors in ∆j \ {ac} to the state after decision state ωj. Hence, ac (i.e., the actor of choice) is the only actor which can be fired in the state ωj.

2) Decision state identification: Algorithm 2 shows our technique to detect all decision states within PSOS sc. Assume sc is a PSOS for the actors mapped to processor Pc. Schedules so1 · · · son are PSOSs for the other actors of the SDFG mapped to the other processors (denoted by Po1 · · · Pon). The output of Algorithm 2 is a set that contains all decision states for PSOS sc. This algorithm also returns the relative positions of the decision states with respect to the beginning of sc. In Algorithm 2, the input schedules are normalized PSOSs. The function normalize (in line 2 of Algorithm 1) normalizes the input PSOSs. The function returns the normalized PSOSs along with their normalization factors. The normalized PSOS s′x is obtained by repeating the input PSOS sx µx times (i.e., s′x = (sx)µx). µx is the normalization factor of sx and can be calculated by dividing the repetition vector entry of an arbitrary actor in sx by the count of that actor in the PSOS sx (in our running example, µ0 and µ1 are 1).
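This division can be sketched directly (a hedged illustration; the PSOS is given as a flat firing list and the repetition vector as a dictionary):

```python
def normalization_factor(psos, repetition_vector):
    """mu_x = repetition-vector entry of an arbitrary actor of the PSOS divided
    by that actor's count in the PSOS."""
    actor = psos[0]                      # any actor of the PSOS works
    return repetition_vector[actor] // psos.count(actor)

# For s3 = <a2 a1>* with repetition-vector entries of 4, mu_3 = 4, so the
# normalized PSOS s3' repeats s3 four times.
print(normalization_factor(["a2", "a1"], {"a2": 4, "a1": 4}))  # 4
```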

After normalizing the input schedules, all situations that can lead to multiple actors (mapped to the same processor) being ready to fire must be discovered. An actor in the schedule of interest sc could be affected by the execution of an actor in the other schedules as well as by another actor in sc. Processors can run at different clock rates; these differences and inter-processor dependencies cause variation in the number of tokens on the inter-processor channels from the actors mapped to the other processors to the actors mapped to the processor of interest (i.e., Pc). The number of tokens on the input channels of an actor determines whether the actor is enabled or not. To determine any possible actor enabling within sc, a necessary and sufficient number of tokens on all inter-processor channels entering the actors mapped to processor Pc must be considered. We will now explain what a necessary and sufficient number of tokens means in our algorithm. Each iteration of the schedule of interest sc requires that the actors mapped to the other processors are fired up to at most their repetition vector entry values. Hence, executing only one iteration of the other schedules so1 · · · son is enough to provide a necessary number of tokens on the inter-processor channels entering the actors mapped to processor Pc. More than one iteration of the other schedules so1 · · · son may be feasible; this may cause an actor in sc to be enabled more often than its count in one iteration of sc. The inter-iteration prevention constructs introduced in Section V-D control this undesired actor enabling. So, we only extract decision states within one iteration of the normalized schedule to provide a sufficient number of tokens.

Also, DSM does not impose any limitation between PSOSs, since no dependency is created between actors mapped to different processors. PSOSs can independently be iterated if the dependencies in the SDFG allow that. We allow the actors on the other processors to be executed (according to their schedules) as much as they can; the corresponding execution is named maximal execution. The maximal execution will stop at one point, either due to a dependency on the actors on the processor Pc or because one PSOS iteration is completed. The SDFG state (denoted by ωj in Algorithm 2) should be kept during the operation of the algorithm. The maximal execution is represented by the function maxExec in Algorithm 2. After one maximal execution, the number of tokens on the inter-processor channels entering the actors on the processor Pc determines any possible enabled actor. The preserved state (i.e., ωj) will be added to the decision state set (Ω) if more than one actor on the processor Pc is enabled at this state (line 6 in Algorithm 2). The current position (i.e., i) in the schedule of interest sc is also preserved for the discovered decision state (see line 5 in Algorithm 2). All enabled actors will be recorded as opponent actors of the state ωj (line 7 in Algorithm 2). The execution of the actors on the processor Pc is continued by executing the enabled actor in line with sc to determine all possible decision states (line 8 in Algorithm 2). The function fire(G, ωj, sc[i]) fires the actor at the ith position in the PSOS sc. The process is repeated until a full iteration of the PSOS has been examined. In the end, the set


Fig. 8. A third example SDFG.

Ω contains all possible decision states when executing sc. Figure 7 depicts the resulting state space from applying Algorithm 2 on the SDFG of Figure 1 when s1 is the schedule of interest (i.e., sc). In the SDFG of Figure 1, five consecutive decision states (Ω = {ω5, . . . , ω9}) have been found for s1 and no decision state has been found for s0 (see Figure 7).
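Algorithm 2 amounts to a state-space walk that can be sketched in a few lines. The toy SDFG below (two unit-rate channels from an input-less actor a0 on P0 to actors a1 and a2 on P1) is our own illustrative example, not a graph from the paper, and maxExec is simplified to fire at most one iteration of each other schedule:

```python
# Hypothetical toy SDFG: channels as (src, dst, prod_rate, cons_rate, tokens).
CHANNELS = [("a0", "a1", 1, 1, 0), ("a0", "a2", 1, 1, 0)]

def enabled(state, actor):
    """An actor is enabled iff every input channel holds at least its consumption rate."""
    return all(state[i] >= c
               for i, (_, dst, _, c, _) in enumerate(CHANNELS) if dst == actor)

def fire(state, actor):
    """Consume from input channels and produce on output channels."""
    new = list(state)
    for i, (src, dst, p, c, _) in enumerate(CHANNELS):
        if dst == actor:
            new[i] -= c
        if src == actor:
            new[i] += p
    return tuple(new)

def max_exec(state, other_schedule):
    """Simplified maximal execution: fire the other processors' actors in
    schedule order while enabled, up to one schedule iteration."""
    for a in other_schedule:
        if not enabled(state, a):
            break
        state = fire(state, a)
    return state

def decision_states(sc, others, actors_c):
    """The loop of Algorithm 2: collect (position, state, opponent actors)."""
    state = tuple(tok for (*_, tok) in CHANNELS)
    found = []
    for i, a in enumerate(sc, start=1):
        for so in others:
            state = max_exec(state, so)
        opponents = [x for x in actors_c if enabled(state, x)]
        if len(opponents) > 1:
            found.append((i, state, opponents))
        state = fire(state, a)
    return found
```

With sc = [a1, a2] on P1 and 〈(a0)2〉 on P0, both positions become decision states, since a1 and a2 are simultaneously enabled after each maximal execution.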

3) Redundant decision states: Here, we explain an optimization proposed in DSM to remove unnecessary decision states. DSM adds some components (per decision state) to create a dependency from the actor of choice of a decision state to the other opponent actors of that decision state. Such a dependency delays the firing of those opponent actors to the state after the decision state. The added components are explained in detail in Section V-E5.

It is possible to have several consecutive decision states which delay the firing of an actor to several states later. For example, three consecutive decision states (ω7 − ω9) exist in Figure 7 that all delay the firing of a1; the added components in ω7 delay the sixth firing of a1 to ω8; the added components in ω8 delay the sixth firing of a1 to ω9; and so on. The latest decision state in the sequence of decision states ω7 − ω9 is enough to delay the firing of a1 to ω10. Hence, the decision states ω7 − ω8 are redundant and can be removed from the decision state set Ω. The function reduceDecisionStates is responsible for removing redundant decision states. Note that it would be possible to perform this reduction during the decision state identification step. This optimization can significantly reduce the number of extra components in the final SDFG. Decision state ω5 is also redundant according to our optimization. So, only two decision states, ω6 and ω9, are necessary to model s1 in the SDFG of Figure 1.
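A sketch of this reduction, under the assumption that a run of consecutive decision states delaying the same set of actors can be represented by its last member (decision states are (position, state, opponents) tuples; the exact rule implemented in SDF3 may differ):

```python
def reduce_decision_states(found, sc):
    """Keep only the last decision state of each run of consecutive decision
    states that delay the same actors; the delayed actors are the opponents
    minus the actor of choice, which at position p is sc[p - 1]."""
    kept = []
    for pos, state, opponents in found:
        delayed = frozenset(opponents) - {sc[pos - 1]}
        if kept and pos == kept[-1][0] + 1 and delayed == kept[-1][3]:
            kept[-1] = (pos, state, opponents, delayed)   # supersede previous
        else:
            kept.append((pos, state, opponents, delayed))
    return [(p, s, o) for p, s, o, _ in kept]

# For s1 = <(a1)^5 (a3)^3 a1 (a3)^3>* with decision states at positions 4-8
# (omega_5 - omega_9 in Figure 7), only positions 5 and 8 remain,
# corresponding to omega_6 and omega_9.
s1 = ["a1"] * 5 + ["a3"] * 3 + ["a1"] + ["a3"] * 3
found = [(p, None, ["a1", "a3"]) for p in range(4, 9)]
print([p for p, _, _ in reduce_decision_states(found, s1)])  # [5, 8]
```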

4) Decision state folding: In Algorithm 1, the input PSOSs are normalized to find all decision states. The normalization of PSOSs is required to explore sufficient states of an SDFG. Consider PSOSs s2 = 〈a0〉∗ and s3 = 〈a2 a1〉∗ for the example SDFG in Figure 8. To obtain normalized PSOSs, µ2 and µ3 must be equal to 3 and 4 respectively. This leads to the following normalized PSOSs: s′2 = 〈(a0)3〉∗ and s′3 = 〈(a2 a1)4〉∗. Decision state identification for s′3 results in 5 decision states.

The corresponding execution of s′3 is

(a2,−)(a1,a2) (a2,a1)(a1,a2) (a2,a1)(a1,a2) (a2,−)(a1,−)
 1st   2nd     3rd   4th      5th   6th      7th   8th

where, in a construct (ax, ay), ax is the enabled actor in line with the PSOS and ay is the other enabled actor, if any at all. In this execution, the 1st, 3rd, 5th and 7th states are similar in behavior since the same actor (i.e., a2) is expected to fire in all of those states.

Modeling a repetitive behavior for a PSOS si also models this behavior for its normalized PSOS (i.e., s′i = (si)µi). Using this property, we can merge decision states appearing in all µi repetitions of the PSOS si. We call this optimization decision state folding (line 11 in Algorithm 1). Folding groups the similar states. In our example, the 1st, 3rd, 5th and 7th states are grouped and represented with one state. Similarly, the 2nd, 4th, 6th and 8th states are grouped. So, the above execution shrinks to (a2,a1)(a1,a2). After grouping all similar states in the original execution into a representative state, the representative is considered a decision state if any of the group members is a decision state. In practice, a decision state in a state of the new folded execution will be considered as a decision state for each of the equivalent states in the original execution. This cannot violate the execution according to the input PSOS because DSM ensures that only the actor of choice executes in a decision state. This optimization can reduce the number of decision states up to µi times in a normalized PSOS s′i. The firings related to those actors enabled in the last state, except the actor of choice of that state, are supposed to happen in subsequent PSOS iterations; this is already ensured by preventing inter-iteration execution (see Section V-D). Hence, after folding, the last state can be ignored as a decision state. In our third example, decision state folding reduces the number of decision states from 5 to 1 for the PSOS s3.
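A sketch of the folding step: positions of decision states found in the normalized PSOS are mapped back onto one period of the original PSOS, and the last position of the period is dropped because the inter-iteration construct already covers it:

```python
def fold_decision_states(positions, period):
    """Fold decision-state positions found in the normalized PSOS (the period
    repeated mu_i times) onto one period; a folded position is a decision state
    if any of its copies was one, and the period's last position is dropped."""
    folded = sorted({(p - 1) % period + 1 for p in positions})
    return [p for p in folded if p != period]

# For s3 = <a2 a1>* (period 2, mu_3 = 4), the 5 decision states found in s3'
# at positions 2-6 fold to the single decision state at position 1.
print(fold_decision_states([2, 3, 4, 5, 6], 2))  # [1]
```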

5) Enforcing a schedule in decision states: In our first example SDFG, only two actors are enabled in decision state ω6 (i.e., ∆6 = {a1, a3}) (see Figure 7). Actor a1 is the actor of choice in ω6 and a3 is the only opponent actor whose execution should be delayed to the state after ω6. To enforce the firing of a1 and to prevent the firing of a3 in ω6, DSM creates a dependency from a1 to a3 by adding actor a1.ω6 and channels c1.a1ω6 and c1.a3ω6. The rates and initial tokens related to the new elements are set in such a way that the execution of the graph in other states is not affected. The actor a1.ω6 is only responsible for decision state ω6. So, a1.ω6 needs to fire only once in an iteration. For this purpose, the port rates of a1.ω6 on its channels (i.e., c1.a1ω6 and c1.a3ω6) are set to 6. The added dependency channels from the newly added actor in decision state ωj (e.g., a1.ω6 in ω6) to the opponent actors which are not the actor of choice (e.g., a3 in ω6) only provide enough tokens for their execution in states ω0 − ωj−1 (e.g., 0 tokens for a3 in ω0 − ω5); these actors cannot be enabled in the corresponding decision state due to the lack of initial tokens in the newly added channels (e.g., there will be no token in c1.a3ω6 in ω6). Hence, only the actor of choice amongst the opponent actors of a decision state will be enabled in that state (e.g., only a1 can fire in ω6). The delayed actors in a decision state ωj will have sufficient tokens on the incoming channel from the newly added actor for that decision state (i.e., ai.ωj) after the firing of the actor of choice in ωj. For example, there will be 6 tokens in channel c1.a1ω6 after the firing of a1 (i.e., the actor of choice) in decision state ω6; hence, the actor a1.ω6 immediately fires and then provides sufficient tokens for later firings of a3. So, the delayed actor in decision state ω6 (i.e., a3) will no longer be blocked by an absence of tokens in channel c1.a3ω6 after the decision state ω6. The firing of actor a1 after decision state ω6 produces 1 token in channel c1.a1ω6 and the firings of actor a3 after decision state ω6 consume 6 tokens from channel c1.a3ω6; as a result, the number of tokens in the new channels is reset to the initial values at the end of one iteration of the schedule s1. Hence, the periodic behavior is also achievable for the added components.

DSM also adds actor a1.ω9 and channels c1.a1ω9 and c1.a3ω9


Fig. 9. An example SDFG.

Algorithm 3: SAS Modeling (SASM)
  input : SDFG G(A,C), PSOS si = (α1)^β1 (α2)^β2 ... (αn)^βn
  output: G extended with schedule si
1   add a self-edge with 1 initial token for each a ∈ Ai
2   for i ← 1 to n do
3     if αi is not an actor then
4       SASM(G, αi)
5     AA(G, acnti)
      /* adding a counter channel */
6     AC(G, ccnti, rightMost(αi), acnti, 1, RN(rightMost(αi), αi^βi), 0)
      /* adding a limiter channel */
7     if i = n then
8       AC(G, climi, acnti, leftMost(α1), RN(leftMost(α1), α1^β1), 1, RN(leftMost(α1), α1^β1))
9     else
10      AC(G, climi, acnti, leftMost(αi+1), RN(leftMost(αi+1), αi+1^βi+1), 1, 0)

to the graph for the other decision state ω9. The components added in decision state ω9 show similar behavior as the components added in ω6.

VI. MODELING SINGLE APPEARANCE SCHEDULES

A well-known class of scheduling techniques are single appearance schedules (SASs), in which each actor appears exactly once in the LS form. This aspect makes SASs suitable to minimize code memory size. s2 = 〈(a0a1)5a2〉∗ is a PSOS and a SAS for part of the SDFG (i.e., actors a0−a2) in Figure 9.

DSM is able to model any PSOS. However, more compact graphs are possible for SASs. Algorithm 3 presents our single appearance schedule modeling (SASM) technique. Similar to DSM, SASM adds some extra actors and channels to the original SDFG to model the given schedules. The original channels and actors in the schedule-extended SDFG are preserved and distinguishable from the newly added elements by any of our techniques. Hence, both our techniques preserve the original structure of an SDFG. This property can be beneficial when a resource allocation algorithm needs to be applied on the schedule-extended graph; a resource allocation algorithm can easily distinguish an original actor (or channel) from an actor (or channel) which is added to model the schedules.

We know that each actor appears only once in a SAS; this property can help us to model a SAS in an SDFG in a smarter way than DSM does. An actor (or a nested inner LS) should be executed a specific number of times before another actor (or another nested inner LS) starts executing. In s2 = 〈(a0a1)5a2〉∗, the nested inner LS (a0a1) must be executed 5 times before a2 starts firing. This type of execution control can be handled using a counter construct. The key idea of SASM is to implement a counter concept in the graph.

Later, we explain how we implement such counters to model SASs in an SDFG. Similar to DSM, auto-concurrency can be eliminated by adding a self-edge with 1 initial token to each actor in the SDFG (see line 1 in Algorithm 3). The rest of Algorithm 3 deals with implementing the counter concept.

Figure 10 shows the graph of the SDFG in Figure 9 which models the schedule s2 using SASM. Schedule s2 has a nested a0a1; this can be replaced with α01 to form a looped schedule representation (i.e., s2 = 〈(α01)5a2〉∗ where α01 = a0a1). A counter in SASM is implemented by one actor acnti (e.g., acnt3 in Figure 10), a counter channel ccnti (e.g., ccnt3) and a limiter channel climi (e.g., clim3). A counter in SASM has two properties: first, it counts the number of times that element αi (e.g., α01 above) is executed; second, it limits the next element αi+1 (e.g., a2) to be executed βi+1 times (e.g., 1, as the number of repetitions of a2 is one). The counter channel ccnti (e.g., ccnt3) is placed from the rightmost actor in αi (e.g., actor a1 in α01) to the actor acnti (e.g., acnt3); the production rate of the rightmost actor in αi on ccnti is set to 1 and the consumption rate of acnti on ccnti is set to the number of times that the rightmost actor in αi can be executed in αi^βi (e.g., 5, as the number of times that a1 can be executed in α01^5 is five). In Algorithm 3, the function rightMost(αi) (leftMost(αi)) returns the rightmost (leftmost) actor in element αi (e.g., rightMost((a0a1)5) returns actor a1). The function RN(a, αi^βi) retrieves the count of a in element αi^βi (e.g., RN(a0, (a0a1)5) returns 5). The limiter channel climi (e.g., clim3) is placed from acnti (e.g., acnt3) to the leftmost actor in the next element, i.e., αi+1 (e.g., a2). SASM performs the placement of the counter constructs in a circular way. In other words, the next element of αj is considered to be α(j+1) mod n, where j ∈ N ∧ j ≤ n, for a LS (α1)β1(α2)β2...(αn)βn. The production rate of acnti on climi is set to the number of times that the leftmost actor in αi+1 can be executed in αi+1^βi+1 (e.g., 1, as the number of times that a2 can be executed in element a2) and the consumption rate of the leftmost actor in the next element, i.e., αi+1, from climi is set to 1. So, element αi+1 depends on actor acnti (because of climi) and actor acnti depends on αi (because of ccnti); the βi executions of αi produce enough tokens on the counter channel ccnti and then actor acnti can be fired. The firing of actor acnti provides enough tokens on limiter channel climi to only allow βi+1 executions of the next element αi+1. Hence, by adding these components we enforce that αi+1 can be executed βi+1 times after αi is executed βi times.

The limiter channel of the counter construct added for the last element (i.e., (αn)βn) in a LS (α1)β1(α2)β2...(αn)βn is initialized with some initial tokens to prevent a deadlock in the graph. The number of initial tokens on this limiter channel is set to the count of the leftmost actor in the first element of that LS (i.e., (α1)β1). Line 8 in SASM performs the token initialization. Inter-iteration execution cannot happen because SASM always creates a dependency from the last actor in the schedule to the first actor in the schedule.
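For a flat looped schedule (every αi a single actor), the counter and limiter channels of Algorithm 3 can be sketched as below; nested elements would recurse as in line 4, which this illustration omits:

```python
def sasm_counters(elements):
    """Counter/limiter channels for a flat LS [(alpha_1, beta_1), ..., (alpha_n, beta_n)]
    with single-actor elements. Channels are (src, dst, prod, cons, tokens); the
    last element's limiter carries beta_1 initial tokens to avoid deadlock."""
    channels = []
    n = len(elements)
    for i, (alpha, beta) in enumerate(elements):
        cnt = f"acnt{i + 1}"
        channels.append((alpha, cnt, 1, beta, 0))          # counter channel
        nxt, nxt_beta = elements[(i + 1) % n]              # circular placement
        init = nxt_beta if i == n - 1 else 0
        channels.append((cnt, nxt, nxt_beta, 1, init))     # limiter channel
    return channels

# For the LS (a0)^2 (a1)^3: a0 must fire twice before the 3 firings of a1,
# and the last limiter starts with 2 tokens so a0 can fire initially.
for ch in sasm_counters([("a0", 2), ("a1", 3)]):
    print(ch)
```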

In Figure 10, the actor acnt3 is added to count the number of times that the sequence a0a1 is executed. The consumption rate of actor acnt3 on its input channel is 5; this means that


Fig. 10. SDFG of Figure 9 extended with s2 = 〈(a0a1)5a2〉∗ using SASM.

Fig. 11. Optimization of SASM: (a) original form; (b) p is equal to 1; (c) q is equal to 1.

after 5 executions of the sequence a0a1 the next actor (i.e., a2) can be enabled. Also, the actor acnt3 limits the number of times that actor a2 should get enabled; this can be done by choosing the value 1 as the production rate of actor acnt3 on its output channel. In other words, the actor a2 can only fire once because of the limitation imposed by actor acnt3. The actor acnt1 (acnt2) is added to ensure the single execution of actor a1 (a0) after the single execution of actor a0 (a1). The actor acnt4 is added to ensure that the sequence a0a1 can be executed 5 times after the actor a2 is executed once.

SASM is applied recursively (line 4 in SASM) to model the nested LS αi. For example, SASM(G, a0a1) will be called inside SASM(G, (a0a1)5a2); the result of the recursive call is shown in Figure 10 with a rectangle marking the inner loop.

Some of the elements added by SASM can be removed without affecting the outcome. Consider Figure 11(a), which contains a counter actor and two channels that can be discarded in the following cases:
• The counter actor acnti can be removed if rate p is equal to 1. The counter actor and two channels in the original form are replaced with channel cxy (see Figure 11(b)).
• The counter actor acnti can be removed if rate q is equal to 1. The counter actor and two channels in the original form are replaced with channel cxy (see Figure 11(c)).

The newly replaced channel cxy is only necessary if there is no other equivalent channel in the original SDFG. Two channels c = (p, q) and c′ = (p′, q′) which have the same source actor ax ∈ A and sink actor ay ∈ A are equivalent if Rate(p)/Rate(p′) = Rate(q)/Rate(q′) = ω0(c)/ω0(c′). Applying these optimizations on Figure 10 replaces all components added by SASM by channel c01 (see Figure 12). Modeling schedule s2 in the SDFG of Figure 9 with DSM and the HSDFG-based technique results in graphs with 10 (26) and 13 (21) actors (channels), resp. The SDFG which models the same schedule with SASM only has 5 (9) actors (channels).
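This equivalence test can be sketched as follows, using cross-multiplication to avoid division (channels are assumed to be (src, dst, prod_rate, cons_rate, initial_tokens) tuples):

```python
def equivalent(c1, c2):
    """Channels with the same source and sink are equivalent iff
    Rate(p)/Rate(p') = Rate(q)/Rate(q') = tokens(c)/tokens(c')."""
    (s1, d1, p1, q1, t1), (s2, d2, p2, q2, t2) = c1, c2
    return ((s1, d1) == (s2, d2)
            and p1 * q2 == p2 * q1
            and p1 * t2 == p2 * t1)

# A rate-1/1 channel with 0 tokens is equivalent to a rate-2/2 channel with 0 tokens.
print(equivalent(("a0", "a1", 1, 1, 0), ("a0", "a1", 2, 2, 0)))  # True
print(equivalent(("a0", "a1", 1, 2, 0), ("a0", "a1", 2, 2, 0)))  # False
```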

VII. CORRECTNESS OF THE PROPOSED TECHNIQUES

This section presents the theorems related to the correctness of our proposed techniques. A schedule is encoded correctly

Fig. 12. SDFG of Figure 9 extended with s2 = 〈(a0a1)5a2〉∗ using optimized SASM.

if any execution of a schedule-extended graph satisfies the encoded schedule and if any execution of the original graph that satisfies the given schedule is still possible in the schedule-extended graph. The proofs of the theorems are given in the appendices of this report.

Assume that the actors and channels added by DSM (or SASM) to model schedule si in SDFG G(A,C) are denoted by Asi and Csi, resp. G′(A′, C′) is the SDFG that models si in G using DSM (or SASM), where A′ = A ∪ Asi and C′ = C ∪ Csi.

Theorem 1 and Theorem 2 state the correctness of DSM in modeling a single PSOS for a subset of the actors of the SDFG. Applying the theorems once for each schedule to be encoded shows the correctness of Algorithm 1.

Note that actors and channels added to model a given schedule do not affect actors that are not part of the schedule. Theorem 1 shows that any execution of the schedule-extended graph satisfies the modeled schedule. Firings of the added actors need to be ignored, which is achieved by the stated condition on the ordered lists resulting from the executions in the schedule-extended and original graphs.

Theorem 1. Consider PSOS si as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using DSM. For any execution σ′ of G′(A′, C′) it holds that σ satisfies si, where it is assumed that σ is the execution of G(A,C) with orderList(σ,A) = orderList(σ′, A).

Theorem 2 shows that no execution of the original graph is unnecessarily excluded in the schedule-extended graph. In other words, any execution of the original graph that satisfies a given schedule is still feasible in the schedule-extended graph.

Theorem 2. Consider PSOS si as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using DSM. For any execution σ of G that satisfies si it holds that there is exactly one σ′ that is an execution of G′(A′, C′) such that orderList(σ,A) = orderList(σ′, A).

The following two theorems state the correctness of SASM.


They are very similar to the two theorems for DSM.

Theorem 3. Consider SAS si = (α1)β1(α2)β2...(αn)βn as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using SASM. For any execution σ′ of G′(A′, C′) it holds that σ satisfies si, where it is assumed that σ is the execution of G(A,C) with orderList(σ,A) = orderList(σ′, A).

Theorem 4. Consider SAS si as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using SASM. For any execution σ of G that satisfies si it holds that there is exactly one σ′ that is an execution of G′(A′, C′) such that orderList(σ,A) = orderList(σ′, A).

VIII. EXPERIMENTAL RESULTS

In this section, we evaluate our techniques experimentally. We first explain the experimental setup. We then evaluate our techniques in terms of the sizes of the schedule-extended graphs, comparing our techniques to that of [12]. We further consider the throughput analysis time when analyzing the schedule-extended graphs obtained by the different techniques. Note that the throughput that is achievable for a given schedule is independent of the way it is encoded. It is the analysis time itself that is of interest. Finally, we look at the accuracy of buffer sizing analysis. The accuracy of the obtained buffer requirements does depend on the way schedules are encoded.

A. Experimental setup

The DSM and SASM techniques have been integrated in the SDF3 [28] dataflow tool set, available at http://www.es.ele.tue.nl/sdf3. We use a set of DSP and multimedia applications (see the first column of Table I) to assess our DSM and SASM techniques.

In our experiments, applications are bound to a multi-processor platform using the technique of [6]. A PSOS determines the actor firing order and as such it influences the enabled actors in a state; as a result, the number of decision states can differ between PSOSs. The size of the schedule-extended graph using DSM depends on the number of decision states in the given schedules. We use a list scheduling approach from [29] to determine PSOSs for the applications. We use two different variations to verify DSM in different situations. The first list schedule uses forward priorities (Lfp) and the second one uses reverse priorities (Lrp). Actors closer to the inputs of the graph have higher priority in the Lfp schedules compared to actors closer to the outputs of the graph, and vice versa in Lrp schedules.

The scheduling technique presented in [30] is used to derive SASs for our benchmark applications. The technique in [30] also minimizes the required buffer sizes when determining a SAS. However, the technique in [30] cannot directly be used for multi-processors. We have adapted the technique of [30] to find SASs for a multi-processor platform. Initially, the binding technique from [6] is used to bind the SDFG to a multi-processor platform. Then, the technique of [30] is applied to the SDFG to derive a SAS for all actors in the SDFG. This SAS is decomposed into smaller SASs using the binding information; each of the smaller SASs is a schedule for one processor in the platform. Consider an example SDFG with 5 actors denoted by a0–a4. Assume a0, a1 and a3 are bound to processor P1 and a2 and a4 are bound to processor P2. Applying the technique of [30] to this imaginary SDFG gives the SAS s0 = 〈(a0(a1²a2a3⁴)³)²a4⁵〉∗ for the whole SDFG. This SAS can be decomposed using the binding information to form a SAS for each of the processors in the platform. Considering only the actors bound to P1 in s0 results in s01 = 〈a0(a1²a3⁴)³〉∗, which is a SAS for the actors bound to P1. Similarly, a SAS s02 = 〈a2⁶a4⁵〉∗ can be extracted from s0 to order the actors bound to P2. This way, we utilize the technique of [30] for multi-processor platforms. The optimality of the generated schedules from the performance or buffer sizing perspective is debatable. However, we use this adapted SAS technique merely to provide some near-optimal inputs to evaluate our SASM schedule encoding technique versus the existing technique. Our techniques do not affect the quality of the scheduling result itself.
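The decomposition step above can be sketched as follows. This is an illustrative reimplementation under our own assumptions (looped schedules represented as nested ("loop", count, body) / ("fire", actor, count) tuples; this is not the data structure of SDF3 or [30]). The schedule s0 from the example is projected onto the actors bound to each processor.

```python
# s0 = <(a0 (a1^2 a2 a3^4)^3)^2 a4^5>* from the example, as a nested loop tree.
s0 = ("loop", 1, [
    ("loop", 2, [("fire", "a0", 1),
                 ("loop", 3, [("fire", "a1", 2), ("fire", "a2", 1), ("fire", "a3", 4)])]),
    ("fire", "a4", 5)])

def project(node, actors):
    """Keep only firings of `actors`; drop loops that become empty and
    collapse a loop around a single firing into one firing entry."""
    if node[0] == "fire":
        return node if node[1] in actors else None
    count = node[1]
    body = [n for n in (project(c, actors) for c in node[2]) if n is not None]
    if not body:
        return None
    if len(body) == 1 and body[0][0] == "fire":      # (a^k)^n  ->  a^(k*n)
        return ("fire", body[0][1], body[0][2] * count)
    return ("loop", count, body)

def firings(node):
    """Total firing count of each actor in one iteration of the schedule."""
    if node[0] == "fire":
        return {node[1]: node[2]}
    total = {}
    for child in node[2]:
        for a, c in firings(child).items():
            total[a] = total.get(a, 0) + c
    return {a: c * node[1] for a, c in total.items()}

# Note: the projection keeps the outer repetition count, so the P1 projection
# equals two iterations of the per-processor SAS s01 = <a0 (a1^2 a3^4)^3>*.
s01_projected = project(s0, {"a0", "a1", "a3"})
s02 = project(s0, {"a2", "a4"})                       # <a2^6 a4^5>*
```

The sketch confirms that restricting s0 to P2 yields the looped form 〈a2⁶a4⁵〉∗ with the same per-iteration firing counts as s0.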

B. Comparison on graph sizes

Table I contains the size of the schedule-extended graphs using the HSDFG-based and DSM techniques to model Lfp and Lrp schedules for a single core platform (see the first two rows of each application line). Using schedules generated by Lfp, the number of decision states is less than when Lrp is used, except in the channel equalizer and MP3 playback applications. By using Lfp scheduling, actors closer to inputs have higher priority compared to actors closer to outputs. This leads to consecutive execution of an actor followed by consecutive execution of another actor with lower priority, and so on. Thanks to our optimization in DSM, considering only one decision state before a context switch is then sufficient (e.g., decision state ω9 in Figure 7) and the number of decision states can be reduced significantly. Usually, SDFG actors closer to outputs depend on actors closer to inputs; this dependency can prevent an actor from being executed consecutively in a graph scheduled by Lrp. As a result, the number of context switches in a graph scheduled by Lrp will typically be larger compared to Lfp. Hence, the effectiveness of the decision state optimization in DSM decreases and extra elements are required to model the schedules in the graph. The exceptions in the channel equalizer and MP3 playback are due to the existence of a cycle in the SDFG; the cycle can increase the number of context switches in the schedule and, as a result, Lfp could result in the same or a higher number of decision states in DSM compared to Lrp. However, DSM always outperforms the HSDFG-based technique regardless of the input schedule in our experiments. The number of actors (channels) using DSM is 66% (71%) lower compared to the HSDFG-based technique on average and 22% (17%) lower in the worst case observed in our experiments. The average case refers to the mean value of the obtained results and the worst case reports the smallest graph size reduction (i.e., reduction in numbers of actors and channels compared to the HSDFG-based technique).


TECHNICAL REPORT, ESR-2013-01, EINDHOVEN UNIVERSITY OF TECHNOLOGY, MAY 2013 12

TABLE I
SIZE OF THE SCHEDULE-EXTENDED SDFGS

Benchmark             Orig. size          Schedule  # actors (# channels)                          Reduction compared to the HSDFG-based technique
                      # actors (# chan.)  type      HSDFG-based    DSM            SASM            DSM          SASM
H.263 dec. [10]       4 (6)               Lfp       1190 (2973)    6 (14)         NA              99% (99%)    NA
                                          Lrp       1190 (2972)    598 (1198)     NA              49% (59%)    NA
                                          SAS       1190 (2972)    598 (1198)     4 (12)          49% (59%)    99% (99%)
H.263 enc. [24]       5 (7)               Lfp       201 (596)      7 (16)         NA              96% (97%)    NA
                                          Lrp       201 (499)      105 (212)      NA              47% (57%)    NA
                                          SAS       201 (499)      106 (214)      5 (15)          47% (57%)    97% (97%)
MP3 dec. [10]         14 (21)             Lfp       911 (2849)     27 (61)        NA              97% (97%)    NA
                                          Lrp       911 (2327)     400 (807)      NA              56% (65%)    NA
                                          SAS       911 (2327)     400 (807)      14 (44)         56% (65%)    98% (98%)
modem [1]             16 (35)             Lfp       48 (115)       22 (63)        NA              54% (45%)    NA
                                          Lrp       48 (128)       35 (89)        NA              27% (30%)    NA
                                          SAS       48 (128)       35 (89)        16 (61)         27% (30%)    66% (52%)
samplerate conv. [1]  6 (11)              Lfp       612 (1639)     12 (29)        NA              98% (98%)    NA
                                          Lrp       612 (1784)     157 (319)      NA              74% (82%)    NA
                                          SAS       612 (1865)     238 (481)      12 (31)         61% (74%)    98% (98%)
satellite rec. [25]   22 (48)             Lfp       4515 (11638)   41 (108)       NA              99% (99%)    NA
                                          Lrp       4515 (12820)   1223 (2472)    NA              72% (80%)    NA
                                          SAS       4515 (15270)   3673 (7372)    24 (91)         18% (51%)    99% (99%)
MP3 playback [26]     4 (8)               Lfp       10601 (37531)  5298 (10600)   NA              50% (71%)    NA
                                          Lrp       10601 (37529)  5296 (10596)   NA              50% (71%)    NA
                                          SAS       10601 (37530)  5297 (10598)   6 (16)          50% (71%)    99% (99%)
bipartite [25]        4 (8)               Lfp       73 (341)       8 (20)         NA              89% (94%)    NA
                                          Lrp       73 (359)       26 (56)        NA              64% (84%)    NA
                                          SAS       73 (350)       17 (38)        9 (22)          76% (89%)    87% (93%)
channel eq. [27]      21 (40)             Lfp       41 (100)       32 (83)        NA              22% (17%)    NA
                                          Lrp       41 (95)        27 (73)        NA              34% (23%)    NA
                                          SAS       41 (93)        30 (79)        21 (65)         27% (15%)    48% (30%)

TABLE II
THE THROUGHPUT ANALYSIS TIME (IN MILLISECONDS)

                               SAS schedules                        Lfp schedules           Lrp schedules
Benchmark         # of proc.   HSDFG-based  DSM         SASM       HSDFG-based  DSM        HSDFG-based  DSM
H.263 dec.        2            36 540       < 1         < 1        118 720      < 1        120 710      < 1
H.263 enc.        2            380          < 1         < 1        690          < 1        400          < 1
MP3 dec.          3            13 900       320         10         17 660       < 1        13 980       1 380
modem             3            10           < 1         < 1        < 1          < 1        < 1          < 1
samplerate conv.  3            3 970        110         < 1        3 140        < 1        3 880        < 1
satellite rec.    2            n.f.         12 414 400  130        n.f.         280        n.f.         675 700
MP3 playback      2            n.f.         < 1         < 1        n.f.         10 780     n.f.         < 1
bipartite         2            30           < 1         < 1        40           < 1        50           < 1
channel eq.       3            < 1          < 1         < 1        < 1          < 1        < 1          < 1

(n.f. = analysis not finished in 3 days)

SASs are a suitable class of schedules that minimize code memory size. DSM is able to model any arbitrary schedule in an SDFG. A SAS can be modeled using DSM; however, it is possible to exploit the intrinsic property of SASs when modeling a SAS in an SDFG. Our second technique, SASM, uses the fact that each actor appears only once in the looped schedule form. SASM models a counter concept in the graph in order to force actors to be executed a specific number of times. The third row of each application line in Table I contains the size of the schedule-extended graphs using the HSDFG-based, DSM and SASM techniques to model SASs generated by the technique developed in [30].

SASM results in a schedule-extended SDFG with a limited number of extra actors and channels. For example, SASM only adds 2 (8) extra actors (channels) to the original graph of the MP3 playback application in order to model a SAS, while the HSDFG-based technique adds 10597 (37522) extra actors (channels) to model the same schedule. The graphs obtained by SASM have 88% (85%) and 48% (30%) fewer actors (channels) compared to the HSDFG-based technique on average and in the worst case among the benchmark applications. Using the DSM technique to model the same SASs results in 46% (57%) and 27% (15%) fewer actors (channels) compared to the HSDFG-based technique on average and in the worst case. Our results confirm that the techniques proposed in this paper achieve a more compact schedule-extended graph compared to the available technique.

C. Comparison on analysis times

The time required to perform an analysis on an SDFG depends on the size of the graph and the number of cycles in the graph. As an example, the throughput analysis of [9] is performed on the schedule-extended graphs obtained with our techniques and with the HSDFG-based technique. These experiments evaluate the impact of the graph size on the analysis time of a common analysis technique. Note that other techniques (e.g., YTO [31]) can be employed to calculate the throughput of HSDFGs, but the size of the schedule-extended HSDFGs is such that our conclusions remain the same. The benchmark graphs are mapped onto multi-processor platforms with two or three processors. Table II contains the throughput analysis times when SASs, list forward priority (Lfp) schedules and list reverse priority (Lrp) schedules are used as input schedules. The results show the superiority of SASM over DSM and the HSDFG-based technique. Note that the SDFG to HSDFG conversion is fast; the numbers reported for the HSDFG-based technique in Table II are related to the run-time of the throughput analysis from [9]. In our experiment, the run-time of a throughput calculation for HSDFGs is long independent of the analysis technique used (i.e., state-space [9], YTO [31], etc.).

Fig. 13. SDFG of the H.263 decoder application (actors vld, iq, idct and mc).

D. Comparison on buffer sizes

To further analyze the effectiveness of our techniques, the buffer sizing algorithm from [10] is applied to the schedule-extended SDFGs of the H.263 decoder and MP3 decoder. Figure 13 depicts the SDFG of the H.263 decoder. Besides the compactness of the schedule-extended graph, our techniques preserve the original structure of an SDFG (when ignoring the added actors and channels), allowing accurate buffer sizing, which is not guaranteed for the state-of-the-art technique. The H.263 decoder is mapped to a platform with two processors. The actors vld and iq are mapped to the first processor with the PSOS 〈vld(iq)⁵⁹⁴〉∗ and the actors idct and mc are mapped to the second processor with the PSOS 〈(idct)⁵⁹⁴mc〉∗. The analysis time for buffer sizing on the schedule-extended H.263 decoder is less than 1 ms when using DSM (or SASM) to model the schedules. The same analysis takes 1330 ms when using the technique from [12] to model the same schedules in the same graph. Figure 14(a) shows the Pareto space of throughput and buffer size when modeling the schedules with DSM (or SASM) and with the HSDFG-based technique [12]. In this experiment, the schedules are first modeled in the graph; then, the buffer sizing technique of [10] is applied. A single channel in an SDFG corresponds to a set of channels in the equivalent HSDFG. As a result, the buffer sizing technique cannot find the minimal buffer size when applied to the equivalent HSDFG. Our experiments show these inaccuracies. Applying buffer sizing on the graph that models the schedules using the technique from [12] results in 49% overestimation in required buffers compared to applying the same buffer sizing technique on the graph that models the schedules with one of our techniques. Note that the maximal achievable throughput is independent of the way schedules are encoded. The analysis results confirm this; only the computed buffer sizes differ. For instance, in both cases of Figure 14, the maximal throughput for the given schedules is always achievable, also when using the HSDFG-based schedule modeling technique; the latter suggests the need for larger buffers though. Figure 14(b) shows results for the MP3 decoder. We use the mapping and scheduling from [32] for a platform with 3 processors. The analysis time on the graph that models the schedule using one of our techniques is 594 ms, while 141610 ms is required to perform the same analysis on the graph using the technique from [12]. Using the technique from [12] results in 226% overestimation in buffer size compared to using our techniques.

Fig. 14. Pareto space (throughput versus buffer size in tokens) of schedule-extended graphs modeled by the DSM/SASM and HSDFG-based techniques (the scales of the two plots differ). (a) H.263 decoder: buffer sizes 1189–1190 (DSM/SASM) versus 1782 (HSDFG); throughput levels 1.59E-06 and 2.90E-06. (b) MP3 decoder: buffer size 594 (DSM/SASM) versus 1936 (HSDFG); throughput 2.35E-07.

Modeling a PSOS in an SDFG using DSM requires execution of one complete SDFG iteration. The number of states in one iteration could be exponential in the number of actors in the graph. However, for all real-world SDFGs used in our experiments, the execution time of DSM is below 1 ms. SASM also models SASs based on the structure of the schedules in their looped form; as each actor appears once in a SAS, the complexity of SASM depends on the number of actors in the graph. Similar to DSM, the execution time of SASM is always below 1 ms in our experiments. The complexity of our techniques relates to the length of the SDFG iteration and the number of processors in the platform (i.e., |P|). Hence, the complexity of our techniques is bounded by O(|P| · Σa∈A γ(a)).

IX. CONCLUSION

We present two techniques, DSM and SASM, to model PSOSs and SASs directly in an SDFG. The resulting graphs are much smaller (often much less than half the size) than graphs resulting from the state-of-the-art technique that first converts an SDFG to an HSDFG. This results in a speed-up of analysis techniques. Computing the trade-off between buffering and throughput for multi-processor platforms, for example, becomes several orders of magnitude faster. Moreover, properties like buffer sizes can be analyzed more accurately. The techniques have been integrated in the SDF3 tool set, available at http://www.es.ele.tue.nl/sdf3. This allows easy integration of the techniques in multi-processor design flows.

ACKNOWLEDGMENT

We thank the reviewers for their constructive comments, which helped improve the presentation of the paper. This work was supported in part by the Dutch Technology Foundation STW, project NEST 10346.

REFERENCES

[1] S. S. Bhattacharyya et al., “Synthesis of embedded software from synchronous dataflow specifications,” Journal of VLSI Signal Processing, vol. 21, pp. 151–166, 1999.

[2] S. Sriram et al., Embedded Multiprocessors: Scheduling and Synchronization, 2nd ed. CRC Press, 2009.

[3] P. Poplavko et al., “Task-level timing models for guaranteed performance in multiprocessor networks-on-chip,” CASES. ACM, 2003, pp. 63–72.

[4] M.-Y. Ko et al., “Compact procedural implementation in DSP software synthesis through recursive graph decomposition,” SCOPES. ACM, 2004, pp. 47–61.

[5] A. Bonfietti et al., “Throughput constraint for synchronous data flow graphs,” CPAIOR. Springer-Verlag, 2009, pp. 26–40.

[6] S. Stuijk et al., “Multiprocessor resource allocation for throughput-constrained synchronous dataflow graphs,” in DAC, 2007, pp. 777–782.

[7] W. Liu et al., “Efficient SAT-based mapping and scheduling of homogeneous synchronous dataflow graphs for throughput optimization,” RTSS. IEEE, 2008, pp. 492–504.

[8] Y. Yang et al., “Automated bottleneck-driven design-space exploration of media processing systems,” DATE. ACM, 2010, pp. 1041–1046.

[9] A. Ghamarian et al., “Throughput analysis of synchronous data flow graphs,” ACSD. IEEE, 2006, pp. 25–36.

[10] S. Stuijk et al., “Throughput-buffering trade-off exploration for cyclo-static and synchronous dataflow graphs,” IEEE Transactions on Computers, vol. 57, no. 10, pp. 1331–1345, 2008.

[11] M. Benazouz et al., “A new method for minimizing buffer sizes for cyclo-static dataflow graphs,” in ESTIMedia’10. IEEE, pp. 11–20.

[12] N. Bambha et al., “Intermediate representations for design automation of multiprocessor DSP systems,” Design Automation for Embedded Systems, vol. 7, no. 4, pp. 307–323, 2002.

[13] M. Damavandpeyma et al., “Modeling static-order schedules in synchronous dataflow graphs,” in DATE’12. ACM, pp. 775–780.

[14] A.-P. Wang et al., “Buffer optimization and dispatching scheme for embedded systems with behavioral transparency,” ACM Transactions on Design Automation of Electronic Systems, vol. 17, no. 4, pp. 41:1–41:26, Oct. 2012.

[15] M. Damavandpeyma et al., “Throughput-constrained DVFS for scenario-aware dataflow graphs,” in RTAS’13. IEEE.

[16] M. H. Wiggers et al., “Monotonicity and run-time scheduling,” EMSOFT. ACM, 2009, pp. 177–186.

[17] D. Thiele et al., “Optimizing performance analysis for synchronous dataflow graphs with shared resources,” in DATE’12. ACM, pp. 635–640.

[18] R. Henia et al., “System level performance analysis - the SymTA/S approach,” IEE Proceedings Computers and Digital Techniques, vol. 152, no. 2, pp. 148–166, Mar. 2005.

[19] L. Thiele et al., “Real-time calculus for scheduling hard real-time systems,” in ISCAS’00, vol. 4. IEEE, 2000, pp. 101–104.

[20] H. H. Wu et al., “A model-based schedule representation for heterogeneous mapping of dataflow graphs,” HCW. IEEE, 2011, pp. 66–77.

[21] S. Bhattacharyya et al., Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.

[22] E. Lee et al., “Synchronous data flow,” Proceedings of the IEEE, vol. 75, no. 9, pp. 1235–1245, 1987.

[23] M. Geilen et al., “Minimising buffer requirements of synchronous dataflow graphs with model checking,” DAC ’05. ACM, 2005, pp. 819–824.

[24] H. Oh et al., “Fractional rate dataflow model for efficient code synthesis,” Journal of VLSI Signal Processing, vol. 37, pp. 41–51, 2004.

[25] S. Ritz et al., “Scheduling for optimum data memory compaction in block diagram oriented software synthesis,” ICASSP. IEEE, 1995.

[26] M. H. Wiggers et al., “Efficient computation of buffer capacities for cyclo-static dataflow graphs,” DAC. ACM, 2007, pp. 658–663.

[27] A. Moonen et al., “Practical and accurate throughput analysis with the cyclo-static dataflow model,” MASCOTS. IEEE, 2007, pp. 238–245.

[28] S. Stuijk et al., “SDF3: SDF for free,” ACSD. IEEE, 2006, pp. 276–278.

[29] G. De Micheli, Synthesis and Optimization of Digital Circuits, 1st ed. McGraw-Hill, 1994.

[30] P. K. Murthy et al., “Joint minimization of code and data for synchronous dataflow programs,” Formal Methods in System Design, vol. 11, no. 1, pp. 41–70, 1997.

[31] N. E. Young et al., “Faster parametric shortest path and minimum balance algorithms,” CoRR, vol. cs.DS/0205041, 2002.

[32] M. Geilen et al., “Worst-case performance analysis of synchronous dataflow scenarios,” CODES+ISSS. ACM, 2010, pp. 125–134.

Morteza Damavandpeyma received his B.Sc. and M.Sc. degrees in computer engineering from the University of Tehran, Iran, in 2006 and 2008, respectively. He is currently a Ph.D. candidate in the Department of Electrical Engineering at the Eindhoven University of Technology (TU/e). His research interests include modeling, analysis and synthesis of embedded systems and multiprocessor Systems-on-Chip.

Sander Stuijk received his M.Sc. degree (with honors) in Electrical Engineering in 2002 and his Ph.D. degree in 2007 from the Eindhoven University of Technology. He is currently an assistant professor in the Department of Electrical Engineering at the Eindhoven University of Technology. His research interests include modeling methods and mapping techniques for the design, specification, analysis and synthesis of predictable hardware/software systems.

Twan Basten is a professor in the Electrical Engineering department at Eindhoven University of Technology (TU/e), the Netherlands, where he chairs the Electronic Systems group. He is also a senior research fellow of TNO Embedded Systems Innovation in the Netherlands. He holds an M.Sc. and a Ph.D. in Computing Science from TU/e. His research interests include embedded and cyber-physical systems, dependable computing and computational models.

Marc Geilen is an assistant professor in the Department of Electrical Engineering at Eindhoven University of Technology. He holds an M.Sc. in Information Technology and a Ph.D. from the Eindhoven University of Technology. His research interests include modeling, simulation and programming of multimedia systems, multiprocessor systems-on-chip, networked embedded systems and cyber-physical systems, and multi-objective optimization and trade-off analysis.

Henk Corporaal has gained an M.Sc. in Theoretical Physics from the University of Groningen, and a Ph.D. in Electrical Engineering, in the area of Computer Architecture, from Delft University of Technology. He is a professor in Embedded System Architectures at the Eindhoven University of Technology (TU/e) in The Netherlands. His current research projects are on low power single and multi-processor architectures, their programmability, and the predictable design of soft- and hard real-time systems.


APPENDIX A
CORRECTNESS OF DSM

This section discusses the correctness of DSM in modeling a single PSOS for a subset of the actors of the SDFG. The extra actors and channels added by our techniques to model a PSOS (e.g., si) are only placed between actors of that schedule (i.e., Ai), and this does not impose any restriction on the other actors of the SDFG (i.e., actors in A \ Ai). Hence, if we can model a single PSOS in the SDFG, then we can model multiple PSOSs by applying the algorithm multiple times.

DSM adds some actors, denoted by Asi, and channels, denoted by Csi, to model the PSOS si in SDFG G(A,C). G′(A′, C′) is the SDFG that models si in G using DSM, where A′ = A ∪ Asi and C′ = C ∪ Csi.

The existence of a repetition vector for an SDFG G(A,C) ensures a balance between production and consumption rates. Hence, a balance equation can be defined as follows for each channel c = (p, q) ∈ C, where p and q are ports of actors ap and aq respectively: Rate(p) · γ(ap) = Rate(q) · γ(aq). The existence of a repetition vector γ for an SDFG ensures that each balance equation related to a channel in the SDFG holds; under this condition the SDFG is consistent. The following proposition shows the consistency of the schedule-extended SDFG G′.

Proposition 1. The SDFG G′(A′, C′), which models PSOS si in the consistent SDFG G(A,C), is consistent.

Proof. The SDFG G is consistent. Hence, a non-trivial repetition vector γ exists for G. We need to show that a non-trivial repetition vector γ′ exists for G′. The balance equation related to each self-loop added by DSM (in line 1 of Algorithm 1) to remove auto-concurrency is always valid because the source and destination actor of the self-loop channel are identical, with production and consumption rates equal to 1. The rates of the other channels added by DSM (for decision states or inter-iteration execution) share the following properties: (1) the newly added channel c ∈ Csi, which is added by DSM (in lines 7, 8, 16 or 18 of Algorithm 1), is between an actor ap ∈ A(= A′ \ Asi) and an actor aq ∈ Asi; (2) the rate of the new channel on the side of the actor aq is equal to the count of the actor ap in one iteration of si (i.e., CNT(ap, si)); (3) the rate of the new channel on the side of the actor ap is equal to 1. From these properties we conclude that aq ∈ Asi, which is added by DSM (in lines 6 and 13 of Algorithm 1), fires only once in each iteration of si. Consider aq as the producer (see Figure 15(a)) of the newly added channel (i.e., (aq, ap) ∈ Csi); then, the only firing of aq in one iteration of si provides CNT(ap, si) tokens for all firings of ap in one iteration of si. Now reconsider actor aq as the consumer (see Figure 15(b)) of the newly added channel (i.e., (ap, aq) ∈ Csi); as a result, all firings of actor ap in one iteration of si provide CNT(ap, si) tokens for only one firing of aq.

Let set Ai be the set of the actors in the PSOS si. Consider that each actor ap ∈ Ai appears r · γ(ap) times in the PSOS si (r = u/v where u, v ∈ N), where the value r is identical for all actors in si. The appearance count of the actor ap ∈ Ai in si is represented by CNT(ap, si) and it is assumed to be equal to (u/v) · γ(ap). We can write the above statement as follows: CNT(ap, si) · v = γ(ap) · u. From this equation we can conclude that u iterations of the SDFG G cause v iterations of the PSOS si, leading to v firings of each of the actors added by DSM (i.e., actors from the set Asi). So, in u iterations of the SDFG G (or v iterations of the PSOS si) the following equation holds for each channel (ap, aq) ∈ Csi or (aq, ap) ∈ Csi (where ap ∈ Ai and aq ∈ Asi):

CNT(ap, si) · v = 1 · γ(ap) · u    (1)

where, for the new channel between ap and aq, the factor CNT(ap, si) equals Rate(aq) and the factor 1 equals Rate(ap), while v = γ′(aq) and γ(ap) · u = γ′(ap).

The entries of the repetition vector of the schedule-extended graph can be obtained using Equation 1. So a non-trivial repetition vector γ′ can be found for the schedule-extended SDFG G′. The relation between the repetition vector of the schedule-extended SDFG and the repetition vector of the original graph is as follows: γ′(aq) (where aq ∈ Asi) is equal to v and γ′(ap) (where ap ∈ A) is equal to γ(ap) · u.

The following proposition states that inter-iteration execution for actors of a PSOS si is impossible in the SDFG G′, which extends the original SDFG G with PSOS si using DSM.

Proposition 2. PSOS inter-iteration execution is impossible for any actor appearing in PSOS si = 〈α1α2 . . . αn〉∗ in the SDFG G′(A′, C′), which models PSOS si in the SDFG G(A,C) using DSM.

Proof. Let set Ai be the set of the actors in the PSOS si. In one iteration of a PSOS si, an actor ap ∈ Ai could be enabled more often than its designated number (i.e., CNT(ap, si) times). DSM prevents this by creating a dependency from the last actor appearing in si (i.e., actor aL = αn) to the first actor appearing in si (i.e., actor aF = α1) (see lines 4-8 in DSM). This dependency is created by inserting a new actor ai.end and two channels ci.pre and ci.pro. The source (destination) actor of the channel ci.pre is aL (ai.end) with rate 1 (CNT(aL, si)). So, one firing of ai.end needs CNT(aL, si) tokens available in ci.pre; for this purpose actor aL should fire CNT(aL, si) times. The destination actor of the channel ci.pro is aF; the consumption rate of aF from ci.pro is 1. The source actor of the channel ci.pro is ai.end; the production rate of ai.end to ci.pro is CNT(aF, si). So the firings of aF related to one iteration of PSOS si need CNT(aF, si) tokens available in ci.pro; for this purpose actor ai.end should fire once. The channel ci.pro is initialized with CNT(aF, si) tokens; this number of tokens provides enough tokens for actor aF to fire CNT(aF, si) times in one iteration of PSOS si. The subsequent firing of actor aF depends on the firing of the actor ai.end, and the firing of the actor ai.end demands CNT(aL, si) firings of actor aL. Hence, the firing of aF belonging to the subsequent iteration of si can be performed only after aL finishes all of its firings belonging to the current iteration of si (i.e., after completion of the current iteration of si). This holds because actor aF is the first actor which should be fired in an iteration of si and other actors in si cannot get enabled before the first firing of aF. This second fact is guaranteed by adding decision state constructs (i.e., lines 12-18 in DSM) for any possible decision state and Proposition 6 below.

Fig. 15. Extra actor aq added by DSM in different situations: (a) aq is the producer of the channel (aq, ap), with production rate CNT(ap, si) and consumption rate 1; (b) aq is the consumer of the channel (ap, aq), with production rate 1 and consumption rate CNT(ap, si).
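The token-count argument above can be replayed in a small sketch. The counts CNT(aF, si) = 2 and CNT(aL, si) = 3 are hypothetical values chosen only for illustration; the channel setup follows the ci.pre/ci.pro construction described in the proof.

```python
CNT_F, CNT_L = 2, 3          # assumed appearance counts of aF and aL in one iteration of si

class Channel:
    def __init__(self, tokens=0):
        self.tokens = tokens

c_pre = Channel(0)           # aL -> a_end: production rate 1, consumption rate CNT_L
c_pro = Channel(CNT_F)       # a_end -> aF: production rate CNT_F, consumption rate 1
                             # (c_pro is initialized with CNT_F tokens)

def fire_aF():
    assert c_pro.tokens >= 1, "aF is blocked until a_end fires"
    c_pro.tokens -= 1

def fire_aL():
    c_pre.tokens += 1        # each firing of aL produces one token on c_pre

def fire_a_end():
    assert c_pre.tokens >= CNT_L, "a_end is blocked until aL completes its firings"
    c_pre.tokens -= CNT_L
    c_pro.tokens += CNT_F

# One iteration of si: aF fires exactly CNT_F times on the initial tokens...
for _ in range(CNT_F):
    fire_aF()
aF_blocked = (c_pro.tokens == 0)   # ...after which aF cannot start the next iteration.
# Only after aL completes all CNT_L firings can a_end fire and re-enable aF.
for _ in range(CNT_L):
    fire_aL()
fire_a_end()
```

After the firing of a_end, c_pro again holds CNT_F tokens, so aF can start the next iteration of si, exactly as the proof requires.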

Even after eliminating inter-iteration execution from the SDFG, multiple actors from a schedule may be enabled in an SDFG iteration. We call such a state a decision state. The following proposition shows that analyzing only one SDFG iteration is enough to identify all possible decision states.

Proposition 3. Executing an SDFG G(A,C) for one iteration is enough to find all decision states within a PSOS si.

Proof. Let the set Ai be the set of the actors in si and Ao = A \ Ai the remaining actors in A. Consider inter-processor channels, which are defined as channels originating from the actors in Ao to the actors in Ai, denoted by Cipc = {(ap, aq) ∈ C | ap ∈ Ao ∧ aq ∈ Ai}. The execution of an actor ap ∈ Ao, where (ap, aq) ∈ Cipc, up to its entry in the repetition vector of the SDFG produces γ(ap) · Rate(ap) tokens in the corresponding channel (ap, aq) ∈ Cipc. Actor aq ∈ Ai consumes those produced tokens within one iteration of the normalized PSOS si (because γ(ap) · Rate(ap) = γ(aq) · Rate(aq)). Hence, actors in Ai can receive the required tokens from all inter-processor channels Cipc for one iteration of the normalized PSOS si. This means that all possible decision states related to PSOS si are detectable.

Actors in Ao could possibly fire more often than the number mentioned above (i.e., their corresponding value in vector γ) if the channel dependencies in the SDFG allow additional firings of these actors. This could cause more than enough tokens (for one iteration of the normalized PSOS si) in the channels Cipc. This could enable an actor in Ai more often than its designated number in one iteration of the normalized PSOS si. To avoid this undesirable actor enabling, some constructs are added to the graph to prevent inter-iteration execution (see Proposition 2). As a result, extra tokens produced by further firings of the actors in Ao cannot enable any actor in Ai more than its designated value in one iteration of the normalized PSOS si. So, executing actors in the SDFG up to their repetition vector entry (i.e., one SDFG iteration) is enough to find all decision states within si.

The identified decision states may be redundant. Proposition 4 discusses the proposed decision state reduction in DSM.

Proposition 4. Let σ be an execution of an SDFG (A,C) and a PSOS si which schedules actors Ai ⊆ A. In the execution σ, consider y consecutive decision states ωx+1, ωx+2, · · · , ωx+y. Assume that ao ∈ Ai is an opponent actor in each of these decision states but not the actor of choice in any of them. It is sufficient to only consider the last decision state ωx+y to postpone the firing of the opponent actor ao in those consecutive decision states to the state ωx+y+1 ∈ σ.

Proof. The purpose of the components added by DSM (i.e., lines 13-18 in Algorithm 1) in a decision state ωj ∈ Ω is to prevent any opponent actor ao which is not the actor of choice in decision state ωj from getting enabled in that state, and as such to postpone that firing to the state ωj+1. It is assumed that the opponent actor ao is enabled in the consecutive decision states ωx+1, ωx+2, · · · , ωx+y. Suppose that the opponent actor ao was fired e times before the first decision state (i.e., ωx+1), where 0 ≤ e < γ(ao). The components added by DSM in the last decision state ωx+y prevent the opponent actor ao from getting enabled for the (e+1)th time in decision state ωx+y. As a result, the opponent actor ao can also not be enabled in states ωx+1, ωx+2, · · · , ωx+y−1 after adding the DSM components for decision state ωx+y. So, the components added by DSM in the last decision state of a run of consecutive decision states are enough to prevent the firing of an opponent actor which is not the actor of choice in those consecutive decision states.

Decision state folding overlaps the consecutive repetitions of the designated PSOS in an SDFG iteration to reduce the number of decision states. The following proposition states that decision state folding does not dismiss any decision state.

Proposition 5. Consider PSOS si = 〈α1α2 . . . αn〉∗ for the subset of actors Ai from SDFG (A,C); assume si is repeated µi times to form the corresponding normalized PSOS (s′i = 〈(si)µi〉∗) to identify decision states related to PSOS si. After decision state folding, all decision states are preserved.

Proof. Normalization can be done by repeating si µitimes (µi is the normalization factor of si). Decision stateidentification is applied to the normalized PSOS s′i =〈(si)µi〉∗ = 〈α1α2 . . . αn︸ ︷︷ ︸

1st

α1α2 . . . αn︸ ︷︷ ︸2nd

· · ·α1α2 . . . αn︸ ︷︷ ︸µi

th

〉∗. De-

cision point constructs are added based on the given PSOS si.Folding groups the identified decision states of the normalizedPSOS s′i in the following manner: the state related to the firingof an actor αj in si is considered as decision state if a decisionstate is found at least in one of the µi states that αj fires inthe execution related to s′i.

The state related to the firing of the actor αj in si that is considered a decision state imposes a decision state on all µi corresponding states of actor αj in s′i. So, no decision state is lost by decision state folding; the only effect is the introduction of unnecessary decision state controlling. We need to show that this extra controlling does not affect the execution of the SDFG according to the schedule. The construct added in a decision state is used to guarantee the execution of the actor of choice of that decision state. Even if a state is not a decision state, the constructs added for that unnecessary decision state only ensure that the only enabled actor in that state (i.e., the actor of choice) can be fired. This does not violate the actor firing order according to the PSOS in that state.
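The folding rule above reduces to a per-position union. The sketch below is a hypothetical illustration (the function and parameter names are not from the paper's implementation), assuming a decision state is identified by its firing position in the normalized PSOS s′i of length n·µi.

```python
def fold_decision_states(decision_states, n, mu):
    """Map decision-state positions in the normalized PSOS s'_i = (s_i)^mu
    (length n * mu) back to positions in s_i (length n).

    A position j in s_i becomes a decision state if at least one of its
    mu occurrences in s'_i was identified as a decision state."""
    assert all(0 <= p < n * mu for p in decision_states)
    return {p % n for p in decision_states}

# Example: s_i has n = 4 firings, normalization factor mu = 3; decision
# states were identified at positions 1, 5 and 10 of s'_i. Positions 1 and 5
# are occurrences of firing 1 of s_i; position 10 is an occurrence of firing 2.
folded = fold_decision_states({1, 5, 10}, n=4, mu=3)
```

Folding with a union (rather than an intersection) is exactly why only unnecessary decision states can be introduced, never lost.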

DSM adds some components per decision state to enforce the firing of the enabled actor in a decision state that is in line with the given PSOS (i.e., the actor of choice). Proposition 6 explains how those components guarantee the firing of the actor of choice in the decision state.

TECHNICAL REPORT, ESR-2013-01, EINDHOVEN UNIVERSITY OF TECHNOLOGY, MAY 2013 17

Proposition 6. The PSOS si is a schedule for actors Ai ⊆ A from SDFG (A,C). Let ωj ∈ Ω be a decision state within PSOS si and ∆j ⊆ Ai the set of the opponent actors in decision state ωj. The actor of choice in decision state ωj (denoted by actor ac) is the only actor which can fire in decision state ωj among all actors in ∆j after applying DSM. Also, one iteration of the resulting schedule-extended SDFG resets the tokens to their initial positions (i.e., the periodic behavior of the original SDFG is preserved).

Proof. We need to show that the opponent actors ∆j \ ac cannot be enabled in the decision state ωj after applying DSM. Consequently, the actor of choice ac in the decision state ωj is the only actor which can fire in ωj among all actors in ∆j.

In the DSM technique, an actor ai.ωj is added for each decision state ωj ∈ Ω within the PSOS si. An opponent actor ak ∈ ∆j \ ac in the decision state ωj is dependent on the new actor ai.ωj because of the added channel ci.akωj; this channel is initialized with BEF(ak, pos[ωj], si) tokens. The rate of the channel ci.akωj is equal to one on the side of ak and equal to CNT(ak, si) on the side of ai.ωj. The new actor ai.ωj is in turn dependent on the actor of choice ac of the decision state ωj because of the added channel ci.acωj; this channel is initialized with AFT(ac, pos[ωj], si) tokens. The rate of the channel ci.acωj is equal to one on the side of ac and equal to CNT(ac, si) on the side of ai.ωj.

An opponent actor ak ∈ ∆j \ ac fires BEF(ak, pos[ωj], si) times before decision state ωj, and every time the opponent actor ak consumes one token from ci.akωj. So, the BEF(ak, pos[ωj], si) firings of ak before ωj consume all tokens which were available in channel ci.akωj. Hence, the opponent actor ak ∈ ∆j \ ac cannot fire in ωj. Firings of the opponent actor ak ∈ ∆j \ ac from decision state ωj onward depend on a firing of ai.ωj to provide the required tokens in channel ci.akωj. As mentioned before, ai.ωj depends on the actor of choice ac. So, the opponent actor ak ∈ ∆j \ ac cannot fire from decision state ωj onward unless the actor of choice ac of the decision state ωj fires. Hence, the actor of choice ac is the only actor among the opponent actors in ωj which can fire. The actor of choice ac has fired BEF(ac, pos[ωj], si) times by state ωj, which results in BEF(ac, pos[ωj], si) tokens being produced in channel ci.acωj; as channel ci.acωj is initialized with AFT(ac, pos[ωj], si) tokens, the number of tokens in this channel is BEF(ac, pos[ωj], si) + AFT(ac, pos[ωj], si) = CNT(ac, si) after firing ac in decision state ωj. So, there will be sufficient tokens (for one firing of ai.ωj) on the only channel leading to ai.ωj after the firing of the actor of choice ac in decision state ωj. Then, the firing of ai.ωj consumes all CNT(ac, si) tokens that are present in channel ci.acωj and it produces CNT(ak, si) tokens in channel ci.akωj (ak ∈ ∆j \ ac); therefore, the opponent actors ∆j \ ac are no longer dependent on ai.ωj in the remainder of the current iteration of si.

The firings of the opponent actor ak ∈ ∆j \ ac after decision state ωj consume AFT(ak, pos[ωj], si) tokens from channel ci.akωj; as a result, at the end of the PSOS iteration, the number of tokens on this channel returns to its initial value, which is BEF(ak, pos[ωj], si) (because BEF(ak, pos[ωj], si) = CNT(ak, si) − AFT(ak, pos[ωj], si)). The actor of choice ac fires AFT(ac, pos[ωj], si) times after decision state ωj, and the number of tokens in ci.acωj returns to AFT(ac, pos[ωj], si). This resetting of the initial tokens at the end of the PSOS iteration ensures the periodic behavior of the added components in each decision state. Thus, in a decision state only the actor of choice (which is in line with the given schedule) amongst all opponent actors of the decision state can fire, and this eliminates any non-determinism because of the decision state.
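The token bookkeeping in this proof reduces to simple arithmetic on one DSM channel. The sketch below replays it for the channel ci.akωj of a single opponent actor; the function name and the plain-integer token counter are assumptions for illustration (the paper's construction operates on full SDFG states).

```python
def check_dsm_channel_invariant(cnt, bef):
    """Token bookkeeping on the DSM channel c_i.ak_wj of an opponent actor ak.

    cnt = CNT(ak, s_i): total firings of ak in one PSOS iteration
    bef = BEF(ak, pos[wj], s_i): firings of ak before decision state wj
    aft = AFT(ak, pos[wj], s_i) = cnt - bef: firings of ak after wj"""
    aft = cnt - bef
    tokens = bef       # the channel is initialized with BEF tokens
    tokens -= bef      # the BEF firings before wj each consume one token
    assert tokens == 0  # ak is blocked in the decision state wj
    tokens += cnt      # one firing of a_i.wj produces CNT(ak, s_i) tokens
    tokens -= aft      # the AFT firings after wj each consume one token
    assert tokens == bef  # token count returns to its initial value -> periodic
    return tokens
```

The two assertions correspond to the two halves of the proof: blocking ak in ωj, and resetting the channel to BEF tokens at the end of the PSOS iteration.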

The following theorems state the correctness of DSM in modeling a single PSOS for a sub-set of the actors of the SDFG.

Theorem 1. Consider PSOS si as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using DSM. For any execution σ′ of G′(A′, C′) it holds that σ satisfies si, where it is assumed that σ is the execution of G(A,C) with orderList(σ,A) = orderList(σ′, A).

Proof. Proposition 6 states that in a decision state of si, the enabled actor of the decision state which is in line with si is the only actor able to fire in that state among all enabled actors in Ai. So, the order of si is the only possible order of actor firing for those actors of G′ in the set Ai. Proposition 2 implies that the next PSOS iteration cannot interfere. Hence, for any execution σ′ of G′(A′, C′), orderList(σ′, Ai) has the form (si)^κ with κ ∈ N (i.e., an infinite repetition of si). It is assumed that orderList(σ,A) = orderList(σ′, A); as Ai ⊆ A, we can conclude that orderList(σ,Ai) = orderList(σ′, Ai). Hence, orderList(σ,Ai) also has the form (si)^κ and this form satisfies si; in other words, σ satisfies si.
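The form (si)^κ used in this proof can be checked mechanically. The helpers below are hypothetical illustrations, not the paper's code; they assume firing orders and PSOSs are plain lists of actor names.

```python
def order_list(firings, actors):
    """Restrict a firing sequence to the actors in a given set (orderList)."""
    return [a for a in firings if a in actors]

def satisfies_psos(firing_order, s_i):
    """True iff the firing order is a prefix of the infinite repetition of
    s_i, i.e., it has the form (s_i)^kappa."""
    return all(a == s_i[k % len(s_i)] for k, a in enumerate(firing_order))
```

For example, with si = 〈a b b〉∗, the restricted order a b b a b satisfies si, while a a does not.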

Theorem 2. Consider PSOS si as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using DSM. For any execution σ of G that satisfies si it holds that there is exactly one σ′ that is an execution of G′(A′, C′) such that orderList(σ,A) = orderList(σ′, A).

Proof. It is assumed that the firing order of the actors belonging to the set A in execution σ′ is the same as in execution σ and that execution σ satisfies si. We should show that there is precisely one execution with the property of execution σ′ and that it is a valid execution for G′. More precisely, the actor firing order related to the set A in execution σ is a possible actor firing order for the original actors (i.e., the actors not added by DSM) of G′ when σ′ is an execution of G′. An actor ax from orderList(σ,A) belongs either to Ai or to Ao = A \ Ai (x ∈ N). The state transition ωx −ax→ ωx+1 in execution σ is related to the firing of ax. The state transition ω′y −ay→ ω′y+1 in execution σ′ is related to the firing of ay. The difference between states from execution σ and σ′ is only in the extra channels added by DSM (i.e., Csi). Any channel from Csi is connected to an actor from Ai; in other words, it is not connected to any actor from Ao. The incoming channels of an actor determine whether that actor can fire or not. Consider a state ω′y from execution σ′ that has the same content as state ωx from execution σ for all channels in C. Each ax ∈ Ao that fires in ωx of execution σ can then also fire in ω′y of execution σ′ (i.e., ay is ax), because the content of the state related to the channels Csi has no influence on actor enabling for any actor from Ao. But, when an actor ax belongs to Ai, the content of the state related to the channels C′ could manipulate the actor firing order. It is assumed that the firings of actor ax in execution σ satisfy the PSOS si; actor ax can then also fire in the corresponding state (i.e., ω′y) of execution σ′ (i.e., ay is ax), because the components added by DSM force the firing of the actor which is in line with the given si (i.e., ax) among all actors in Ai (see Proposition 6). So, the firing order of the actors from A′ \ Asi in execution σ′ follows the same firing order as indicated in execution σ. Each actor a ∈ Asi also has a single possible firing order in each PSOS iteration; if a is added to control the actor firing in a decision state, it fires before the actor of choice in the decision state (see Proposition 6), and if it is added for the inter-iteration prevention, it fires at the end of the PSOS iteration (see Proposition 2). Hence, there exists only one possible firing order in execution σ′ for each actor a ∈ Asi. So, all actors from A′ have exactly one firing order in execution σ′ where orderList(σ,A) = orderList(σ′, A).

The size of the schedule-extended graph (e.g., G′) depends on the number of decision states found in the given schedule (e.g., si). In this section, decision state identification for a sub-set of the actors of G (i.e., Ai ⊆ A) which belong to the schedule of interest (i.e., si) is explained regardless of other schedules that may exist for the rest of the actors in the SDFG (i.e., the actors in Ao = A \ Ai). In our implementation, we consider the other schedules designated for the rest of the actors (i.e., the actors in Ao) to reduce the number of decision states and, as a result, the size of the schedule-extended graph. As we explained in Section V-E2, actors which do not belong to si should fire according to their own schedule (if there is any) to perform their maximal execution. The firing of any actor ao ∈ Ao which belongs to another PSOS sj (j ≠ i) in the function maxExec of the DSM algorithm should satisfy the schedule to which ao belongs. In other words, ao in the function maxExec of DSM fires if (1) ao is enabled and (2) its firing satisfies sj. Without the second condition, a firing of ao could enable an actor from si and lead to unnecessary decision states. So, considering the other schedules in the function maxExec of the DSM algorithm removes such redundant decision states.
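Conditions (1) and (2) above can be sketched as a small filter. The function name and the data layout below are hypothetical (the paper's maxExec operates on the SDFG state directly); the sketch only shows which actors of Ao are allowed to fire in one maximal-execution step.

```python
def fireable_others(enabled, other_schedules, positions):
    """Actors from A_o that may fire during maxExec: an actor fires only when
    (1) it is currently enabled and (2) it is the next actor in the PSOS s_j
    it belongs to. Hypothetical data layout:

    enabled         -- set of actor names enabled in the current SDFG state
    other_schedules -- {schedule_id: list of actor names (one PSOS iteration)}
    positions       -- {schedule_id: index of the next firing in that PSOS}"""
    fireable = []
    for sid, sched in other_schedules.items():
        nxt = sched[positions[sid] % len(sched)]  # next actor required by s_j
        if nxt in enabled:                        # conditions (1) and (2) hold
            fireable.append(nxt)
    return fireable
```

Dropping condition (2) would amount to returning every enabled actor of Ao, which is exactly what can create the redundant decision states mentioned above.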

APPENDIX B
CORRECTNESS OF SASM

This section discusses the correctness of SASM in modeling a periodic static-order SAS for a sub-set of the actors of the SDFG. We can model multiple SASs by applying the algorithm multiple times.

SASM adds some actors, denoted by Asi, and channels, denoted by Csi, to model the SAS si in SDFG G(A,C). G′(A′, C′) is the SDFG that models si in G using SASM, where A′ = A ∪ Asi and C′ = C ∪ Csi.

Between any (nested) LSs αu and αv, where v = (u mod n) + 1, in a LS SAS si = (α1)^β1(α2)^β2 . . . (αn)^βn, SASM adds one counter actor acntu and two channels ccntu and climu. The following proposition shows how these components control the actor firings in αu and αv.

Proposition 7. For any (nested) LSs αu and αv, where v = (u mod n) + 1, in LS si = (α1)^β1(α2)^β2 . . . (αn)^βn, SASM ensures that αv can only be executed βv times after αu is executed βu times.

Proof. SASM creates a dependency from the right-most actor (RMA) in αu to the left-most actor (LMA) in αv by adding one counter actor acntu and two channels ccntu and climu. The channel ccntu is placed from the RMA in αu to acntu; the production rate of the RMA in αu on channel ccntu is set to one and the consumption rate of acntu on channel ccntu is set to the number of times that the RMA in αu needs to fire in βu executions of αu. Hence, actor acntu can fire once after the looped schedule αu is executed βu times (the 1st behavior). The channel climu is placed from acntu to the LMA in αv; the production rate of acntu on channel climu is set to the number of times that the LMA in αv needs to be fired in βv executions of αv, and the consumption rate of the LMA in αv on channel climu is set to one. Hence, the looped schedule αv can be executed βv times after one firing of acntu (the 2nd behavior). It is assumed that SASM is recursively applied to each (nested) looped schedule in si. Combining the 1st and 2nd behaviors ensures that αv can only be executed βv times after αu is executed βu times.
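The rate computation in this proof amounts to two multiplications. The sketch below is a hypothetical helper (names and data layout are not from the paper), assuming the RMA and LMA firing counts per single execution of αu and αv are given.

```python
def sasm_counter_rates(rma_per_exec, beta_u, lma_per_exec, beta_v):
    """Port rates of the counter actor a_cnt_u between looped schedules
    (alpha_u)^beta_u and (alpha_v)^beta_v.

    c_cnt_u: RMA(alpha_u) -1-> a_cnt_u, which consumes all RMA firings of
             beta_u executions of alpha_u in one firing (1st behavior).
    c_lim_u: a_cnt_u -> LMA(alpha_v), producing enough tokens for the LMA
             firings of beta_v executions, consumed one per firing (2nd)."""
    consume_on_cnt = beta_u * rma_per_exec  # consumption rate of a_cnt_u on c_cnt_u
    produce_on_lim = beta_v * lma_per_exec  # production rate of a_cnt_u on c_lim_u
    return consume_on_cnt, produce_on_lim
```

For instance, if the RMA of αu fires twice per execution and βu = 3, acntu consumes 6 tokens from ccntu, i.e., it fires exactly once per βu executions of αu.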

SASM places a specific number of initial tokens (i.e., the number of times that the LMA of α1 should fire in β1 executions of α1) on the last limiter channel (i.e., climn) only, to prevent deadlock in the execution; the following proposition explains how this prevents inter-iteration execution in the schedule-extended graph.

Proposition 8. Inter-iteration execution is impossible for any actor appearing in SAS si in the SDFG G′(A′, C′).

Proof. Considering the behavior described in Proposition 7 for all pairs of consecutive (nested) looped schedules in si ensures that the next schedule iteration cannot start before the current schedule iteration ends. The initial tokens placed on the last limiter channel (i.e., climn) only provide tokens for β1 executions of the looped schedule α1; the executions of α1 related to the current schedule iteration consume all tokens on the last limiter channel, and this prevents the next iteration from being executed until completion of the current iteration (i.e., until αn is executed βn times).
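Propositions 7 and 8 can be replayed on a concrete example. The walkthrough below assumes the SAS si = (a)^2(b)^3 with single-actor loop bodies (so RMA = LMA = the loop body actor) and tracks only the SASM channels as plain integer token counters; it is an illustrative sketch, not the paper's implementation.

```python
# SASM channels for s_i = (a)^2 (b)^3, n = 2:
c_cnt1 = 0  # a -> a_cnt1: a produces 1 per firing, a_cnt1 consumes beta_1 = 2
c_lim1 = 0  # a_cnt1 -> b: a_cnt1 produces beta_2 = 3, b consumes 1 per firing
c_cnt2 = 0  # b -> a_cnt2: b produces 1 per firing, a_cnt2 consumes beta_2 = 3
c_lim2 = 2  # a_cnt2 -> a: last limiter channel, initialized with beta_1 = 2 tokens

for _ in range(2):           # two schedule iterations
    assert c_lim1 == 0       # b cannot fire before a_cnt1 fires (Prop. 7)
    for _ in range(2):       # (a)^2
        assert c_lim2 >= 1
        c_lim2 -= 1; c_cnt1 += 1
    assert c_lim2 == 0       # the next SAS iteration cannot start early (Prop. 8)
    c_cnt1 -= 2; c_lim1 += 3  # a_cnt1 fires once
    for _ in range(3):       # (b)^3
        assert c_lim1 >= 1
        c_lim1 -= 1; c_cnt2 += 1
    c_cnt2 -= 3; c_lim2 += 2  # a_cnt2 fires once

# one full iteration resets every SASM channel to its initial token count
assert (c_cnt1, c_lim1, c_cnt2, c_lim2) == (0, 0, 0, 2)
```

The final assertion shows the periodic behavior: after each full schedule iteration the SASM channels return to their initial token counts.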

The following theorems state the correctness of SASM. These theorems are similar to the theorems given in Appendix A.

Theorem 3. Consider SAS si = (α1)^β1(α2)^β2 . . . (αn)^βn as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using SASM. For any execution σ′ of G′(A′, C′) it holds that σ satisfies si, where it is assumed that σ is the execution of G(A,C) with orderList(σ,A) = orderList(σ′, A).

Proof. Considering Proposition 7 for all pairs of consecutive (nested) LSs in si guarantees that the order specified in si is the only possible order of actor firing for those actors of the SDFG G′ in the set Ai. Proposition 8 implies that subsequent SAS iterations cannot interfere. Hence, for any execution σ′ of SDFG G′(A′, C′), orderList(σ′, Ai) has the form (si)^κ with κ ∈ N (i.e., an infinite repetition of si). It is assumed that orderList(σ,A) = orderList(σ′, A); as Ai ⊆ A, we can conclude that orderList(σ,Ai) = orderList(σ′, Ai). Hence, orderList(σ,Ai) also has the form (si)^κ and this form satisfies si; in other words, σ satisfies si.

Theorem 4. Consider SAS si as a schedule for actors Ai ⊆ A from SDFG G(A,C). Assume G′(A′, C′) is the SDFG that models si in G using SASM. For any execution σ of G(A,C) that satisfies si it holds that there is exactly one σ′ that is an execution of G′(A′, C′) such that orderList(σ,A) = orderList(σ′, A).

Proof. It is assumed that the firing order of the actors belonging to the set A in execution σ′ is the same as in execution σ and that σ satisfies si. We should show that there is precisely one execution with the property of execution σ′ and that it is a valid execution for G′. The actor firing order related to the set A in execution σ (i.e., orderList(σ,A)) is a possible actor firing order for the original actors (i.e., the actors not added by SASM) of G′ when σ′ is an execution of G′. An actor ax from orderList(σ,A) belongs either to Ai or to Ao = A \ Ai (x ∈ N). The state transition ωx −ax→ ωx+1 in execution σ is related to the firing of ax. The state transition ω′y −ay→ ω′y+1 in execution σ′ is related to the firing of ay. The difference between states from execution σ and σ′ is only in the channels added by SASM (i.e., Csi). Any channel from Csi is connected to an actor from Ai; in other words, it is not connected to any actor from Ao. The incoming channels of an actor determine whether that actor can fire or not. Consider a state ω′y from execution σ′ that has the same content as state ωx from execution σ for all channels in C. Each ax ∈ Ao that fires in ωx of execution σ can then also fire in ω′y of execution σ′ (i.e., ay is ax), because the content of the state related to the channels Csi has no influence on actor enabling for any actor from Ao. But, when an actor ax belongs to Ai, the content of the state related to the channels C′ could manipulate the actor firing order. As we assume that the firings of ax in execution σ satisfy si, ax can also fire in the corresponding state (i.e., ω′y) of execution σ′ (i.e., ay is ax), because the components added by SASM force the firing of the actor which is in line with the given SAS si among all actors in Ai (considering Proposition 7 for all pairs of consecutive nested LSs) and it is assumed that ax is in line with si. So, the firing order of the actors from A′ \ Asi in execution σ′ follows the same firing order as indicated in σ. Each counter actor acntx ∈ Asi also has a single possible firing order in each schedule iteration; it fires once after the LS αx is executed βx times (see the proof of Proposition 7). Hence, there exists only one possible firing order in σ′ for each actor a ∈ Asi. So, all actors from A′ have exactly one firing order in σ′ where orderList(σ,A) = orderList(σ′, A).