Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs
Mainak Sen and Shuvra S. Bhattacharyya
Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies
University of Maryland at College Park
Maryland DSPCAD Research Group
http://www.ece.umd.edu/DSPCAD/home/dspcad.htm
November 22, 2005
Leiden University, The Netherlands
Design and Synthesis of Image Processing Systems, 2University of Maryland at College Park
Outline
• Dataflow-based model of computation for modeling the behavior of DSP applications
• Decidable dataflow models
  - Example: use of decidable dataflow as a model of computation for modeling the mapping of (decidable) dataflow behaviors onto embedded multiprocessors
• Structured reconfiguration of dataflow graphs
  - Examples of meta-modeling techniques that can be classified as structured, reconfigurable dataflow
  - Parameterized dataflow and its application to SDF
  - Homogeneous-parameterized dataflow and its application to SDF and CSDF
  - Experiments on a gesture recognition application
• Summary
Dataflow-based design for DSP (Example from Agilent ADS tool)
DSP-oriented Dataflow Models of Computation
• Used widely in design tools for DSP
• Application is modeled as a directed graph
  - Nodes (actors) represent functions
  - Edges represent communication channels between functions
  - Nodes produce and consume data from edges
  - Edges buffer data in FIFO (first-in first-out) fashion
• Data-driven execution model
  - A node can execute whenever it has sufficient data on its input edges
  - The order in which nodes execute is not part of the specification
  - The order is typically determined by the compiler, the hardware, or both
• Iterative execution
  - Body of loop to be iterated a large or infinite number of times
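The data-driven execution rule above can be illustrated with a minimal Python sketch. The class `Actor`, the helper `run`, and the two actors `scale` and `sum2` are hypothetical names chosen for illustration; they do not come from any tool mentioned in the talk. Edges are modeled as FIFO queues, and an actor fires only when each of its input edges holds enough tokens.

```python
from collections import deque


class Actor:
    """A dataflow actor that fires when every input edge holds enough tokens."""

    def __init__(self, name, func, inputs, outputs, consume=1):
        self.name = name
        self.func = func        # maps the list of consumed tokens to one output token
        self.inputs = inputs    # FIFO edges (deques) this actor reads
        self.outputs = outputs  # FIFO edges (deques) this actor writes
        self.consume = consume  # tokens required on each input edge per firing

    def can_fire(self):
        return all(len(edge) >= self.consume for edge in self.inputs)

    def fire(self):
        tokens = [edge.popleft() for edge in self.inputs for _ in range(self.consume)]
        result = self.func(tokens)
        for edge in self.outputs:
            edge.append(result)


def run(actors):
    """Fire enabled actors until none can fire (assumes every actor has an input edge)."""
    trace = []
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                trace.append(actor.name)
                fired = True
    return trace


def demo():
    e1, e2, e3 = deque([1, 2, 3, 4]), deque(), deque()
    scale = Actor("scale", lambda t: 2 * t[0], [e1], [e2])
    sum2 = Actor("sum2", sum, [e2], [e3], consume=2)  # multirate: consumes 2 per firing
    trace = run([sum2, scale])  # list order is arbitrary; data availability decides
    return list(e3), trace
```

Calling `demo()` shows that the firing order emerges from token availability rather than from the order in which the actors are listed, which is the point made on the slide: the schedule is not part of the specification.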
Dataflow Features and Advantages
• Exposes coarse-grain parallelism.
• Exposes high-level structure that facilitates analysis, verification, and optimization.
• Captures multi-rate behavior.
• Complementary to ongoing advances in DSP compiler technology for procedural languages, such as C and MATLAB.
• Encourages desirable software engineering practices: modularity and code reuse.
  - Amenable also to aspect-oriented design.
• Intuitive to DSP algorithm designers: signal flow graphs.
Evolution of Dataflow Models for DSP
• Synchronous dataflow: static multirate behavior
  - Agilent ADS, Cadence SPW, etc.
• Well-behaved dataflow: schemas for bounded dynamics
• Boolean/integer dataflow: Turing complete models
• Multidimensional synchronous dataflow: image and video
• Scalable synchronous dataflow: block processing
  - Synopsys COSSAP
• Cyclo-static dataflow: phased behavior
  - Synopsys El Greco, Eonic Systems Virtuoso Synchro, System Canvas
• Bounded dynamic dataflow: bounded dynamics
• The processing graph method: reconfigurable dynamic DF
  - US Naval Research Laboratory, MCCI Autocoding Toolset
• Parameterized dataflow: dynamically-reconfigurable static DF
• Blocked dataflow: image and video in terms of reconfigurable dataflow
Modeling Design Space
[Figure: dataflow models plotted against two axes, expressive power and verification / synthesis power. Point labels: C, BDF, DDF; SDF; CSDF; CSDF, SSDF; MDSDF, WBDF; PSDF; PCSDF.]
(Third dimension: simplicity and intuitive appeal)
Decidable Dataflow Models
• Modeling flow for representing static flowgraph behavior:
  - Cyclo-static dataflow (CSDF), multiphase modeling
  - Synchronous dataflow (SDF), multirate modeling
  - Homogeneous synchronous dataflow (HSDF)
  - Acyclic homogeneous synchronous dataflow (“task graphs”)
• These are in decreasing order of generality
• Designs represented in the more general models can be converted to equivalent representations in the less general ones
  - e.g., CSDF → SDF → HSDF → task graph
• HSDF: each actor (graph node) produces/consumes exactly one data value to/from each incident output/input edge
  - Suitable for exposing parallelism
  - Not the best model for minimizing memory requirements
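Converting an SDF graph to HSDF starts from the repetitions vector, that is, how many times each actor fires per graph iteration so that every edge is balanced. The following Python sketch solves the balance equations for a small graph. The function name `repetitions_vector`, the edge tuple layout, and the three-actor example are assumptions made for illustration, and the sketch assumes a connected graph.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for the smallest positive integers q.

    edges: list of (src, dst, produced, consumed). Assumes the graph is connected.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along edges
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    for src, dst, produced, consumed in edges:  # reject sample-rate inconsistent graphs
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent sample rates")
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {actor: int(r * scale) for actor, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {actor: n // common for actor, n in counts.items()}


# Hypothetical chain: A produces 2 / B consumes 3, B produces 1 / C consumes 2
q = repetitions_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
hsdf_node_count = sum(q.values())  # one HSDF node per actor invocation
```

In the equivalent HSDF graph each actor appears once per invocation, so the node count grows with the repetitions vector. This is one way to see why HSDF exposes parallelism well but is not the most compact representation.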
Synthesis Techniques for Decidable Models
• Static scheduling: low overhead, predictability
• Performance analysis through synchronization graphs
• Loop scheduling
  - Implicit repetition in the dataflow graph (through changes in sample rate) needs to be translated into explicit repetition in the form of loops on the execution target.
  - Complex design space exists for such translation
  - Complementary to procedural language techniques for nested loop compilation
• Loop scheduling techniques
  - Simulation speedup (minimization of scheduling complexity)
  - Code/data minimization
  - Hierarchical parallel scheduling
  - Block processing
• Task scheduling for latency/throughput optimization
• Probabilistic design: exploiting tolerances to deadline misses
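The code/data trade-off in loop scheduling can be made concrete with a small Python sketch. It expands a looped schedule into a firing sequence and records the peak number of tokens on each edge. The schedule encoding (actor names and `(count, body)` pairs), the function names, and the two-actor graph are hypothetical and chosen only for illustration.

```python
def expand(schedule):
    """Flatten a looped schedule made of actor names and (count, body) pairs."""
    for item in schedule:
        if isinstance(item, tuple):
            count, body = item
            for _ in range(count):
                yield from expand(body)
        else:
            yield item


def peak_tokens(schedule, edges):
    """Return the peak token count per edge; edges maps (src, dst) to (produced, consumed)."""
    tokens = {edge: 0 for edge in edges}
    peak = dict(tokens)
    for actor in expand(schedule):
        for (src, dst), (produced, consumed) in edges.items():
            if dst == actor:  # consume inputs first
                if tokens[(src, dst)] < consumed:
                    raise ValueError(f"{actor} fired without enough input tokens")
                tokens[(src, dst)] -= consumed
        for (src, dst), (produced, consumed) in edges.items():
            if src == actor:  # then produce outputs
                tokens[(src, dst)] += produced
                peak[(src, dst)] = max(peak[(src, dst)], tokens[(src, dst)])
    return peak


edges = {("A", "B"): (2, 3)}              # A produces 2 tokens, B consumes 3
single_appearance = [(3, ["A"]), (2, ["B"])]   # (3A)(2B): compact code
interleaved = [(2, ["A"]), "B", "A", "B"]      # (2A)BAB: more code, less buffering
peak_single = peak_tokens(single_appearance, edges)
peak_interleaved = peak_tokens(interleaved, edges)
```

Both schedules fire A three times and B twice per iteration, but they differ in buffer requirement, which illustrates the design space that the loop scheduling techniques on this slide explore.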
Example: Intermediate representations for synthesis from decidable dataflow models
• Consider a decidable dataflow behavior that is to be implemented on a self-timed, embedded multiprocessor
  - Natural way to implement DSP multiprocessors from decidable dataflow
  - Actor assignment and ordering are performed statically
  - Invocation (dispatch) of actors is performed dynamically, through synchronization
• Candidate mappings of the behavior onto the architecture can be represented through an intermediate representation that also has decidable dataflow semantics
  - This representation is useful for understanding the performance, communication overhead, and synchronization structure associated with the candidate mapping
  - Facilitates the separation of communication and synchronization functionality
  - This is a useful modeling methodology for design space exploration
Interprocessor Communication Graph (Gipc)
[Figure: self-timed schedule and its IPC graph for an application graph with actors 1 to 9, mapped onto Proc 1, Proc 2, and Proc 3; additional labels in the figure: 2r1, 4s1, 4s2, 4s3, 5s1, 7r1, 8r1, 9r1]
IPC graph: every edge (vi, vj) induces a precedence constraint between vi and vj
Self-Timed Schedule
Proc 1: (1, 2, 3, 4, 6)
Proc 2: (5, 7, 8)
Proc 3: (9)
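A rough Python sketch of how an IPC graph might be assembled from a self-timed schedule follows. The processor orderings are the ones listed on the slide; the application edges in `dataflow_edges` are hypothetical, since the figure's edges are not recoverable from the transcript. The sketch also simplifies the construction by omitting explicit send and receive actors and by assuming all application edges are delay-free.

```python
def build_ipc_graph(proc_orders, dataflow_edges):
    """Return IPC graph edges as (src, dst, delay, kind) tuples.

    proc_orders: processor -> ordered list of actors (the self-timed schedule).
    dataflow_edges: (src, dst) pairs from the application graph (assumed delay-free).
    """
    proc_of = {actor: proc for proc, order in proc_orders.items() for actor in order}
    edges = []
    for order in proc_orders.values():
        # consecutive actors on one processor execute in schedule order
        for a, b in zip(order, order[1:]):
            edges.append((a, b, 0, "intra"))
        # the last actor precedes the first actor of the next iteration (one delay)
        edges.append((order[-1], order[0], 1, "intra"))
    for src, dst in dataflow_edges:
        if proc_of[src] != proc_of[dst]:  # data crosses a processor boundary
            edges.append((src, dst, 0, "ipc"))
    return edges


schedule = {"Proc 1": [1, 2, 3, 4, 6], "Proc 2": [5, 7, 8], "Proc 3": [9]}
# Hypothetical application edges, for illustration only
dataflow_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 7), (6, 8), (8, 9)]
ipc_graph = build_ipc_graph(schedule, dataflow_edges)
ipc_edges = [e for e in ipc_graph if e[3] == "ipc"]
```

Only the edges whose endpoints sit on different processors become IPC edges; edges within one processor are already ordered by the static schedule.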
The synchronization graph Gs
• Derived from the interprocessor communication graph
• Synchronization edges are distinguished from interprocessor communication (IPC) edges
  - Synchronization edges represent precedence constraints that are enforced by synchronization protocols
  - IPC edges represent data transfers
• Interprocessor connections
  - Coincident synchronization and IPC edges: communication together with synchronization protocol (conventional approach)
  - IPC edge only: communication without synchronization protocol
  - Synchronization edge only: synchronization protocol only
Applications of Synchronization Graphs
• Simulation
• Throughput estimation through cycle mean analysis
• Removal of redundant synchronizations
• Resynchronization
• Conversion to more efficient synchronization protocols (strongly connected synchronization graphs)
• Statically determining and minimizing the sizes of interprocessor communication buffers
• All are post-processing methods that can be applied to improve a wide range of existing task graph scheduling techniques on a wide range of multiprocessor architectures.
• These techniques benefit from good execution time estimates, but do not depend on exact execution time values to deliver useful results.
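Throughput estimation through cycle mean analysis can be sketched in Python as follows. For each cycle in the synchronization graph, the cycle mean is the sum of the actor execution times on the cycle divided by the sum of the edge delays on the cycle; the maximum over all cycles estimates the iteration period. The graph, the execution time estimates, and the function names are hypothetical, and the brute-force cycle enumeration used here is only suitable for tiny graphs; it stands in for the more efficient algorithms a real tool would use.

```python
def simple_cycles(edges):
    """Enumerate simple cycles as edge lists. edges: (src, dst, delay) tuples."""
    out = {}
    for edge in edges:
        out.setdefault(edge[0], []).append(edge)
    cycles = []

    def dfs(start, node, path, visited):
        for edge in out.get(node, []):
            nxt = edge[1]
            if nxt == start:
                cycles.append(path + [edge])
            elif nxt not in visited and nxt > start:
                # only visit nodes larger than the root so each cycle is found once
                dfs(start, nxt, path + [edge], visited | {nxt})

    for start in sorted(out):
        dfs(start, start, [], {start})
    return cycles


def max_cycle_mean(exec_time, edges):
    """Maximum over cycles of (total execution time) / (total delay)."""
    best = 0.0
    for cycle in simple_cycles(edges):
        total_delay = sum(delay for _, _, delay in cycle)
        if total_delay == 0:
            raise ValueError("delay-free cycle: the graph would deadlock")
        total_time = sum(exec_time[src] for src, _, _ in cycle)
        best = max(best, total_time / total_delay)
    return best


# Hypothetical two-processor example: Proc 1 runs A then B, Proc 2 runs C
exec_time = {"A": 2, "B": 3, "C": 4}          # estimated execution times
edges = [("A", "B", 0), ("B", "A", 1),        # Proc 1 ordering and loop-back
         ("C", "C", 1),                       # Proc 2 loop-back
         ("A", "C", 0), ("C", "A", 1)]        # synchronization and feedback
period = max_cycle_mean(exec_time, edges)     # estimated iteration period
throughput = 1.0 / period
```

In this example the cycle through A and C is the bottleneck, so removing or relaxing a synchronization on that cycle is where resynchronization effort would pay off.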
Beyond Decidable Models
• Limited expressive power: DSP applications increasingly employ high-level dynamics in their behavior
  - User interface functionality
  - Mode changes
  - Adaptive algorithms
  - Reconfiguration of processing resources/parameters
• However, key subsystems still exhibit large amounts of “quasi-static” structure: structure that stays fixed across significant windows of time.
• Various dynamic dataflow models have been proposed that address the limitation above by abandoning most or all restrictions related to decidable dataflow
• However, these methods are correspondingly limited in their ability to exploit the quasi-static structure described above
Parameterized Dataflow: Structured Control of Dynamic Parameters
• The key discipline that is imposed on reconfiguration is that each subsystem must have a consistent view of each of its actors (hierarchical or primitive) throughout any given iteration of that subsystem.
Parameterized Dataflow
• Hierarchical modeling
[Figure: a subsystem inside its parent graph, made up of init, subinit, and body graphs; the subsystem has parameter n, ..., with "writes n" and "reads n" annotations]
• A parameterized DF subsystem is composed of 3 parameterized DF graphs: init, subinit, body
• Subsystem parameters are configured in init/subinit and used in body
• Dynamically reconfigurable
Meta-modeling with parameterized dataflow
• Parameterized dataflow can be applied to any dataflow model of computation (“base model”) to augment that model with dynamic reconfiguration capabilities in a structured way
  - Provides for efficient quasi-static scheduling
  - Enables execution to be viewed in terms of a sequence of dataflow graphs in the base model
  - Parameterized dataflow + XYZ = “Parameterized XYZ”
• Examples of parameterized dataflow models of computation that we are developing and experimenting with
  - parameterized synchronous dataflow (PSDF)
  - parameterized cyclo-static dataflow (PCSDF)
Parameterized Synchronous Dataflow (PSDF)
• “Local synchrony” conditions can be formulated and checked in a quasi-static fashion to ensure that bounded token production and consumption along with bounded delays lead to bounded memory requirements overall.
  - This is not true of unstructured dynamic dataflow models, such as general dynamic dataflow, Boolean dataflow, and bounded dynamic dataflow
• Techniques for construction of streamlined looped schedules for synchronous dataflow graphs have natural and efficient extensions to the construction of parameterized looped schedules for PSDF graphs.
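PSDF Example: CD to DAT Conversion
[Figure: PSDF specification with params i1, d1, …, i4, d4. The init graph contains actor setFac (sets i1, …, d4). The body graph is the chain CD → PF1 → PF2 → PF3 → PF4 → DAT with (production, consumption) rates CD→PF1 (1, d1), PF1→PF2 (i1, d2), PF2→PF3 (i2, d3), PF3→PF4 (i3, d4), PF4→DAT (i4, 1). The parameterized schedule that follows has a preamble and a body.]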
repeat 5 times {
  fire setFac  /* sets i1, d1, i2, d2, i3, d3, i4, d4 */
  int _g1 = gcd(i1, d2);
  int _g2 = gcd((i2 x i1)/_g1, d3);
  int _g3 = gcd((i3 x i2 x i1)/(_g2 x _g1), d4);
  repeat (d4/_g3) times {
    repeat (d3/_g2) times {
      repeat (d2/_g1) times {
        repeat (d1) times {fire CD}
        fire PF1
      }
      repeat (i1/_g1) times {fire PF2}
    }
    repeat ((i2 x i1)/(_g2 x _g1)) times {fire PF3}
  }
  repeat ((i3 x i2 x i1)/(_g3 x _g2 x _g1)) times {
    fire PF4
    repeat (i4) times {fire DAT}
  }
}
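The parameterized looped schedule can be checked numerically with a short Python sketch that counts firings for one setting of the parameters and verifies token balance on every edge. The sketch assumes the chain CD → PF1 → PF2 → PF3 → PF4 → DAT with rates (1, d1), (i1, d2), (i2, d3), (i3, d4), (i4, 1), and it assumes DAT fires i4 times for each PF4 firing, which is what token balance on the last edge requires. The parameter values are an assumed factorization of 160/147 (48 kHz / 44.1 kHz) and are not taken from the slide.

```python
from collections import Counter
from math import gcd


def cd_dat_firings(i1, d1, i2, d2, i3, d3, i4, d4):
    """Count actor firings in one pass of the parameterized looped schedule."""
    g1 = gcd(i1, d2)
    g2 = gcd((i2 * i1) // g1, d3)
    g3 = gcd((i3 * i2 * i1) // (g2 * g1), d4)
    fired = Counter()
    for _ in range(d4 // g3):
        for _ in range(d3 // g2):
            for _ in range(d2 // g1):
                fired["CD"] += d1
                fired["PF1"] += 1
            fired["PF2"] += i1 // g1
        fired["PF3"] += (i2 * i1) // (g2 * g1)
    pf4_count = (i3 * i2 * i1) // (g3 * g2 * g1)
    fired["PF4"] += pf4_count
    fired["DAT"] += pf4_count * i4  # assumption: i4 DAT firings per PF4 firing
    return fired


def is_balanced(f, i1, d1, i2, d2, i3, d3, i4, d4):
    """Check that tokens produced equal tokens consumed on every edge."""
    return (f["CD"] * 1 == f["PF1"] * d1
            and f["PF1"] * i1 == f["PF2"] * d2
            and f["PF2"] * i2 == f["PF3"] * d3
            and f["PF3"] * i3 == f["PF4"] * d4
            and f["PF4"] * i4 == f["DAT"] * 1)


# Assumed stage ratios 2/1, 4/3, 4/7, 5/7 (product 160/147)
params = dict(i1=2, d1=1, i2=4, d2=3, i3=4, d3=7, i4=5, d4=7)
firings = cd_dat_firings(**params)
balanced = is_balanced(firings, **params)
```

With these values the schedule fires CD 147 times and DAT 160 times per pass, matching the 147:160 sample ratio. Because the loop bounds are expressions in the parameters, the same schedule text remains valid when setFac changes the factors between iterations.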
PSDF Example: Speech Compression
PCSDF Version of Speech Compression
Homogeneous Parameterized Dataflow (HPDF)
• Parameterized dataflow model that can encapsulate dynamicity of application.
• Meta-modeling technique. Hierarchical actors can have any other underlying dataflow model (SDF, CSDF, PSDF, etc.)
• Data production and consumption rates, though dynamic, are equal across an edge for a large number of applications, thus the name homogeneous.
• Reconfiguration can be performed without introducing hierarchy when more natural to do so (advantage over parameterized dataflow).
• Parameterized dataflow is a more powerful technique and thus can be used to represent a wider set of applications.
Applications
• Applications with dynamic run-time data and aggregated final-stage processes benefit especially from HPDF over SDF semantics.
• Many applications in image and speech processing seem well suited for our model.
• We applied the model to two applications:
  - A real-time video processing algorithm for a smart camera, developed at Princeton
  - A face detection algorithm developed at CFAR labs at UMD.
Application characteristics
[Figure: chain of actors labeled A, B, M, N; the edges carry a dynamic but balanced amount of data, and the chain ends in an aggregating final stage]
• This structure seems to be abundant in many audio/video applications.
• Our HPDF model is a natural fit for applications with the above structure.
Gesture recognition algorithm
• Real-time video processing for gesture recognition.
• Does low-level (red oval) and high-level processing.
• Low-level processing recognizes body parts and identifies movements.
• High-level processing recognizes actions.
• We concentrate on low-level processing.
Ref: W. Wolf, B. Ozer, and T. Lv. Smart cameras as embedded systems. IEEE Computer, 35(9):48-53, September 2002.
HPDF model of gesture recognition algorithm
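[Figure: HPDF model with actors Region finding, Contour following, Ellipse Fitting, and Graph Matching. Two edges are labeled "Dynamic data", with rates (n, n) and (p, p); the figure also marks the aggregating final stage.]
Ptolemy II implementation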
Modeling with HPDF/CSDF
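[Figure: HPDF/CSDF model with actors VIDEO INPUT, REGION EXTRACTION, CONTOUR FOLLOWING, ELLIPSE FITTING, and MATCH. Edge annotations: (s × 1) on the edges around REGION EXTRACTION; (X_i, Y_i) at CONTOUR FOLLOWING; (1 × 0, 1 × k_i) and (n × 1) at ELLIPSE FITTING; p and (p_i × 1, q_i × 0) at MATCH.]
#phases = #pixels = s
p phases with 1 token and (n-p) phases with 0 token production
$\sum_i p_i = p$ and $\sum_i q_i = n - p$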
Integrating HPDF and CSDF
• Number of phases in a fundamental period can vary dynamically.
• Number of tokens produced or consumed in a given phase can also vary dynamically.
• HPDF constraint: the total number of tokens produced by a source actor of a given edge in a given invocation (a fundamental period) must equal the total number of tokens consumed by the sink in its corresponding invocation.
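The HPDF constraint on an edge can be expressed as a simple check over per-phase token counts. The Python sketch below is illustrative only: the function name `first_violation` and the example phase patterns are hypothetical. Each invocation is described by the tokens the source produces in each phase and the tokens the sink consumes in each phase; both the number of phases and the per-phase counts may differ from one invocation to the next.

```python
def first_violation(invocations):
    """Return the index of the first invocation that breaks the HPDF constraint, or None.

    invocations: list of (produced_per_phase, consumed_per_phase) pairs,
    one pair per corresponding source/sink invocation on a single edge.
    """
    for index, (produced, consumed) in enumerate(invocations):
        if sum(produced) != sum(consumed):
            return index
    return None


# Hypothetical edge: source has n phases, p of which emit 1 token; sink consumes p tokens
consistent = [
    ([1, 1, 0, 1, 0], [3]),        # n = 5 phases, p = 3 tokens
    ([0, 1, 0, 0, 1, 1, 1], [4]),  # n = 7 phases, p = 4 tokens
]
inconsistent = consistent + [([1, 0, 1], [3])]  # 2 tokens produced, 3 consumed

ok_result = first_violation(consistent)
bad_result = first_violation(inconsistent)
```

The check compares totals per invocation, not per phase, so the phase structure is free to vary dynamically as long as the totals match.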
Finer granularity and Input modeling
• Each frame has 384x240 pixels, so we model the input as a CSDF actor with s = 92160 phases.
• Model captures pixel level parallelism present in Region.
• It also captures the frame level parallelism through the number of phases in Input (s).
[Figure: VIDEO INPUT and REGION EXTRACTION with edges annotated (s × 1); #phases = #pixels = s]
Modeling dynamicity - Contour
• 2 phases for Contour
  - First one scans until it finds a contour. Output = 0 tokens
  - Second one follows this contour and all the overlapping ones. Output = k_i tokens, each token is a list of pixels from a contour
• Homogeneous condition remains: $\sum_i (X_i + Y_i) = s$
Scheduling
VRCEM
(s V)(s R)(2I C)(n E)M
(s VR)(2I C)(n E)M
(V = Video input, R = Region extraction, C = Contour following, E = Ellipse fitting, M = Match)
Results
• We also applied HPDF to successfully model a face detection algorithm.
• We developed a TI DSP implementation of the HPDF model of the gesture recognition algorithm.
• The application was run on a TMS320C64xx fixed point processor.
• When implemented with our HPDF model, the runtime was 21405671 cycles.
• With a 40 ns cycle period, execution time for the application was 0.86 sec.
Results (contd.)
• Scheduling overhead was minimal, as a highly streamlined quasi-static schedule was obtained.
• Worst case buffer size was 642 Kb when the input images were 384x240 pixels. HPDF modeling suggested buffer reuse between the edges.
• Original C code had a runtime of 27741882 cycles; execution time was 1.11 sec with the same clock period of 40 ns.
• HPDF improved runtime by 23%.
• Efficient hardware code generation is being investigated using a hardware synthesis framework developed in our research group.
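The reported figures can be cross-checked with a few lines of Python arithmetic. The cycle counts, the 40 ns cycle period, and the frame size are the values quoted on the results slides; the variable names are only for illustration.

```python
CYCLE_NS = 40                      # cycle period quoted on the slides
hpdf_cycles = 21405671             # runtime of the HPDF-based implementation
original_cycles = 27741882         # runtime of the original C code

hpdf_seconds = hpdf_cycles * CYCLE_NS * 1e-9
original_seconds = original_cycles * CYCLE_NS * 1e-9
improvement_percent = 100 * (original_cycles - hpdf_cycles) / original_cycles

pixels_per_frame = 384 * 240       # equals the number of CSDF phases s of the input actor
```

Rounded to two decimals the execution times are 0.86 sec and 1.11 sec, and the improvement rounds to 23%, consistent with the numbers stated in the talk.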
Summary
• A dataflow-based model of computation is attractive for modeling the behavior of DSP applications
• Decidable dataflow models are useful for exposing and exploiting static structure in synthesis tools for DSP
• Decidable dataflow models in conjunction with structured reconfiguration techniques allow for efficient handling of application dynamics
• Examples of structured, reconfigurable dataflow techniques that we discussed:
  - Parameterized dataflow and its application to SDF
  - Homogeneous-parameterized dataflow and its application to SDF and CSDF
  - Experiments on a gesture recognition application
• Other examples include dynamic configuration of graph topologies, and blocked dataflow modeling.
References
B. Bhattacharya and S. S. Bhattacharyya. Parameterized dataflow modeling for DSP systems. IEEE Transactions on Signal Processing, 49(10):2408-2421, October 2001.
S. S. Bhattacharyya, R. Leupers, and P. Marwedel. Software synthesis and code generation for DSP. IEEE Transactions on Circuits and Systems --- II: Analog and Digital Signal Processing, 47(9):849-875, September 2000.
G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete. Cyclo-static dataflow. IEEE Transactions on Signal Processing, 44(2):397-408, February 1996.
D. Ko and S. S. Bhattacharyya. Dynamic configuration of dataflow graph topology for DSP system design. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-69-V-72, Philadelphia, Pennsylvania, March 2005.
E. A. Lee and D. G. Messerschmitt. Static scheduling of synchronous dataflow programs for digital signal processing. IEEE Transactions on Computers, February 1987.
S. Neuendorffer and E. Lee. Hierarchical reconfiguration of dataflow models. In Proceedings of the International Conference on Formal Methods and Models for Codesign, June 2004.
M. Sen, S. S. Bhattacharyya, T. Lv, and W. Wolf. Modeling image processing systems with homogeneous parameterized dataflow graphs. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-133-V-136, Philadelphia, Pennsylvania, March 2005.