
Table of Contents

    1 INTRODUCTION

    1.1 The Synchronous Dataflow model

    1.1.1 Background

    1.1.2 Utility of dataflow for DSP

    1.2 Parallel scheduling

    1.2.1 Fully-static schedules

    1.2.2 Self-timed schedules

    1.2.3 Execution time estimates and static schedules

    1.3 Application-specific parallel architectures

    1.3.1 Dataflow DSP architectures

    1.3.2 Systolic and wavefront arrays

    1.3.3 Multiprocessor DSP architectures

    1.4 Thesis overview: our approach and contributions

    2 TERMINOLOGY AND NOTATIONS

    2.1 HSDF graphs and associated graph theoretic notation

    2.2 Schedule notation

    3 THE ORDERED TRANSACTION STRATEGY

    3.1 The Ordered Transactions strategy

    3.2 Shared bus architecture

    3.2.1 Using the OT approach

    3.3 Design of an Ordered Memory Access multiprocessor

    3.3.1 High level design description

    3.3.2 A modified design

    3.4 Design details of a prototype

    3.4.1 Top level design

    3.4.2 Transaction order controller

    3.4.2.1 Processor bus arbitration signals

    3.4.2.2 A simple implementation

    3.4.2.3 Presettable counter

    3.4.3 Host interface

    3.4.4 Processing element

    3.4.5 Xilinx circuitry

    3.4.5.1 I/O interface

    3.4.6 Shared memory

    3.4.7 Connecting multiple boards

    3.5 Hardware and software implementation

    3.5.1 Board design

    3.5.2 Software interface

    3.6 Ordered I/O and parameter control

    3.7 Application examples

    3.7.1 Music synthesis

    3.7.2 QMF filter bank

    3.7.3 1024-point complex FFT

    3.8 Summary

    4 AN ANALYSIS OF THE OT STRATEGY

    4.1 Inter-processor Communication graph (G_ipc)

    4.2 Execution time estimates

    4.3 Ordering constraints viewed as edges added to G_ipc

    4.4 Periodicity

    4.5 Optimal order

    4.6 Effects of changes in execution times

    4.6.1 Deterministic case

    4.6.2 Modeling run time variations in execution times

    4.6.3 Implications for the OT schedule

    4.7 Summary

    5 MINIMIZING SYNCHRONIZATION COSTS IN SELF-TIMED SCHEDULES

    5.1 Related work

    5.2 Analysis of self-timed execution

    5.2.1 Estimated throughput

    5.3 Strongly connected components and buffer size bounds

    5.4 Synchronization model

    5.4.1 Synchronization protocols

    5.4.2 The synchronization graph G_s

    5.5 Formal problem statement

    5.6 Removing redundant synchronizations

    5.6.1 The independence of redundant synchronizations

    5.6.2 Removing redundant synchronizations

    5.6.3 Comparison with Shaffer’s approach

    5.6.4 An example

    5.7 Making the synchronization graph strongly connected

    5.7.1 Adding edges to the synchronization graph

    5.7.2 Insertion of delays

    5.8 Computing buffer bounds from G_s and G_ipc

    5.9 Resynchronization

    5.10 Summary

    6 EXTENSIONS

    6.1 The Boolean Dataflow model

    6.1.1 Scheduling

    6.2 Parallel implementation on shared memory machines

    6.2.1 General strategy

    6.2.2 Implementation on the OMA

    6.2.3 Improved mechanism

    6.2.4 Generating the annotated bus access list

    6.3 Data-dependent iteration

    6.4 Summary

    7 CONCLUSIONS AND FUTURE DIRECTIONS

    8 REFERENCES

List of Figures

    Figure 1.1. Fully static schedule

    Figure 1.2. Fully-static schedule on five processors

    Figure 1.3. Steps in a self-timed scheduling strategy

    Figure 3.1. One possible transaction order derived from the fully-static schedule

    Figure 3.2. Block diagram of the OMA prototype

    Figure 3.3. Modified design

    Figure 3.4. Details of the “TA” line mechanism (only one processor is shown)

    Figure 3.5. Top-level schematic of the OMA prototype

    Figure 3.6. Using processor bus arbitration signals for controlling bus access

    Figure 3.7. Ordered Transaction Controller implementation

    Figure 3.8. Presettable counter implementation

    Figure 3.9. Host interface

    Figure 3.10. Processing element