Final Presentation: Annual Project (Part A)
Winter semester 5772 (2011/12)
Students: Dan Hofshi, Shai Shachrur Supervisor: Mony Orbach
INS/GPS navigation system using RPF
Implemented with Bluespec HDL.
Using Xilinx Virtex5 FPGA
Intro
1. Abstract
2. Algorithm reminder
3. Previous projects background
4. Solution approaches
5. Detailed information on the final implementation
6. Summary
Abstract
This project is part of a continuing effort to implement an RPF-based navigation system in the High Speed Digital Systems Laboratory at the Technion. The project and the algorithm were originally written by Professor Yaakov Oshman and Mark Koifman of the Faculty of Aerospace Engineering.
Before our project, another group of students tested and simulated the algorithm in a C++ environment and verified its functionality [1].
Later, a group of several students designed the algorithm blocks to run on several Altera FPGAs simultaneously, because the hardware resource requirements exceeded the capacity of a single FPGA.
1. "Gps computer program", by Neta Galil and Moti Perets, Winter 2010.
Reminder: the algorithm's principle of operation
[Figure: a visual demonstration of particle filter navigation, excluding the data correction process. Labeled step: Measurement update]
Previous projects: information and conclusions
We retrieved information on the timing and space complexity of each algorithm block and parameter (data bus widths, number of particles, mathematical implementation of certain blocks).
Implementing a particle filter on a single FPGA requires fundamental rethinking: either parallelizing the algorithm or reducing its mathematical complexity.
A particle filter project is too big to be designed by a single person or group without a proper structural design in advance.
The space complexity requires the use of an external memory.
First approach for solution
Trying to reduce mathematical complexity: failed

Algorithm phase                  Quaternion                   Euler
                                 Trig. calc.   Multipliers    Trig. calc.   Multipliers
Initialization                   8             12             2             0
Propagation                      0             24             6             13
Measurement update               This phase is identical for both
State vector revaluation         17 3
N-effective calculation          This phase is identical for both
Covariance matrix calculation    3             8              0             0
Re-sampling                      This phase is identical for both
Regularization                   8             12             0             0
Re-weight                        This phase is identical for both
First approach for solution
At first glance this looks very convincing: using Euler angles throughout the algorithm run would save 53*N multipliers and 11*N trigonometric calculations. A closer look at the algorithm's calculations, however, reveals many singularity cases that Euler angles cannot resolve without causing the algorithm to diverge.
We therefore chose to continue the project with the current, verified algorithm that uses quaternion calculations.
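The singularity problem can be illustrated with a small Python sketch. It assumes the common ZYX (yaw, pitch, roll) Euler convention, which the slides do not specify, and the function name is hypothetical. At a pitch of 90 degrees only the difference between yaw and roll matters, so two different Euler triples collapse onto the same quaternion and the individual angles can no longer be recovered.

```python
import math

def euler_to_quaternion(roll, pitch, yaw):
    """ZYX Euler angles (radians) to a unit quaternion (w, x, y, z).

    The ZYX convention is an assumption made for this illustration.
    """
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )

# Two different (roll, yaw) pairs with the same yaw - roll, at pitch = 90 degrees.
q_a = euler_to_quaternion(roll=0.3, pitch=math.pi / 2, yaw=0.5)
q_b = euler_to_quaternion(roll=0.0, pitch=math.pi / 2, yaw=0.2)

# True: both triples describe the same attitude, so roll and yaw are ambiguous.
same_attitude = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(q_a, q_b))
print(same_attitude)
```

Away from the singular pitch the same two (roll, yaw) pairs give different quaternions, which is why the quaternion form stays well behaved where the Euler form does not.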
Second approach for solution: from sequential to parallel implementation
Initialization: creates a new set of N particles
Propagation: uses the INS data to propagate the particles in time
Measurement update: uses the GPS data to give a weight to each particle
Normalization: normalizes all particle weights to a total sum of 1
Covariance matrix calculation
Re-sampling
Regularization
Effective number of particles check
Re-weight
[Flowchart labels: Good, Bad, Routine operation, Data correction, State vector revaluation, To user]
Second approach for solution
With proper parallelization of the algorithm, the number of sequential blocks can be reduced from 9 to 5, which makes an implementation on the desired single FPGA genuinely feasible.
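The choice between routine operation and data correction depends on the effective number of particles check. A minimal Python sketch of such a check follows. It assumes the widely used estimate N_eff = 1 / sum(w_i^2) over normalized weights; the slides do not state the exact criterion, and the threshold ratio is a hypothetical tuning value.

```python
def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def effective_particles(normalized_weights):
    """Common estimate N_eff = 1 / sum(w_i^2) (assumed here, not taken from the slides)."""
    return 1.0 / sum(w * w for w in normalized_weights)

def needs_data_correction(weights, threshold_ratio=0.5):
    # threshold_ratio is a hypothetical tuning value for illustration only.
    normalized = normalize(weights)
    return effective_particles(normalized) < threshold_ratio * len(normalized)

print(needs_data_correction([1.0, 1.0, 1.0, 1.0]))  # evenly spread weights
print(needs_data_correction([1.0, 0.0, 0.0, 0.0]))  # one particle carries all the weight
```

With evenly spread weights the check passes and the filter stays in routine operation; when a single particle dominates, the check fails and the data correction branch would run.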
Tools and hardware
Taking the Xilinx Virtex5 FPGA as the target board for this task, we defined the rest of the working tools:
Bluespec HDL
Bluespec GUI (compiler, simulator)
DDR2 SDRAM external memory
XUPV5-110T development environment
Project goals
Learn Bluespec and identify the language's advantages and drawbacks.
Design, build and simulate the top-level design of the complete algorithm infrastructure, allowing each algorithm block to be designed later by an individual group.
Describe clearly the future tasks needed to complete the project.
Why Bluespec
The Bluespec language syntax fits today's large-scale digital system design methodologies, with particular attention to parallel design.
It is an interesting new design methodology.
Introduction to Bluespec
Bluespec SystemVerilog, or Bluespec for short, is a relatively new high-level HDL. The language is designed to express high-level hardware constructs in an easy and highly parameterized way. Its syntax lets you concentrate on the high-level details of the design and brings the way you write closer to the way you think. "Methods" define an abstract, user-defined interface that is translated into Verilog inputs and outputs. "Rules" define a group of abstract operations that is translated into combinational logic.
Introduction to Bluespec
Atomicity: a Bluespec rule is considered an atomic operation, meaning that once a rule has fired, its operation cannot be interrupted until the rule has finished its logic.
Introduction to Bluespec
[Figure: structure of Bluespec modules. Each module exposes methods and contains rules, registers, FIFOs, memory components and other submodules.]
The parallelized algorithm
Stage 5: normalization, using the same module as stage 2
Stage 1
Initialization: sequentially randomizes N particles according to the GPS and INS data.
Only writes to the main memory.
[Figure labels: Normalization, Particle Memory]
Stage 2
Propagation: $X_{k+1}^i = f(X_k^i, \mathrm{Ins})$
Measurement update: $X_{k+1}^i = f(X_k^i, \mathrm{Gps})$
State vector revaluation: $S_k = f(X_k^i : 1 \le i \le N)$
Each of the three modules above performs a sequential calculation that needs a single particle at a time, so all three modules can work in parallel.
Each particle is first propagated and then cascaded to the other two modules in parallel.
The measurement update rules fire only when the GPS data is valid.
This stage already reads all N particles from the memory, so to save memory accesses the measurement update module also prepares the total weight for the next stage.
[Figure: stage 2 block diagram. Blocks: Memory, Propagation, State Vector Revaluation, Measurement Update]
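The single pass of stage 2 can be modelled in a few lines of Python. This is a behavioural sketch only, not the Bluespec design: the particle state is reduced to one scalar, and `propagate` and `likelihood` are hypothetical stand-ins for the real propagation and measurement update blocks. It shows each particle being propagated once and then fed to both downstream calculations, while the total weight for the next stage is accumulated in the same pass.

```python
def stage2_pass(particles, weights, propagate, likelihood, gps_valid):
    """One sequential pass over all particles (scalar states for simplicity)."""
    new_particles, new_weights = [], []
    total_weight = 0.0  # prepared here for the normalization stage
    state_sum = 0.0     # accumulator of the state vector revaluation
    prior_sum = 0.0
    for x, w in zip(particles, weights):
        x = propagate(x)            # propagation
        state_sum += w * x          # state vector revaluation uses the stored weight
        prior_sum += w
        if gps_valid:               # measurement update fires only on valid GPS data
            w = w * likelihood(x)
        total_weight += w
        new_particles.append(x)
        new_weights.append(w)
    return new_particles, new_weights, total_weight, state_sum / prior_sum

result = stage2_pass([1.0, 3.0], [0.5, 0.5], lambda x: x + 1.0, lambda x: x, True)
print(result)
```

Because the total weight leaves this pass together with the particles, the normalization stage does not need an extra read of all N particles just to compute it.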
Stage 3
Normalization: $X_k^i = f(X_k^i, W_{tot})$
Resampling: $X_{k+1}^i = f(X_k^m : 0 < m \le i)$
Covariance matrix: $S_k = \sum_i f(X_k^i)$
Covariance matrix square root: $D_k = f(S_k)$
The covariance matrix calculation is too large to meet the timing constraints if it is performed in sequence after normalization.
In any case, the normalization module cascades the data so that the covariance matrix square root is prepared in parallel with the re-sampling module.
At the end of normalization, if the data correction process is not needed, the process stops and the data is flushed.
[Figure: stage 3 block diagram. Blocks: Memory, Normalization, Resampling, Covariance calculation, Memory 1, Memory 2, Matrix Memory]
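The resampling step can be sketched as a single sequential pass over the normalized weights, which suits a memory that is read particle by particle. The Python sketch below assumes systematic resampling; the slides do not name the scheme actually used, and the `offset` argument stands for the single random draw, passed in explicitly so the example stays deterministic.

```python
def systematic_resample(particles, normalized_weights, offset):
    """Systematic resampling in one pass (an assumed scheme, for illustration).

    offset is a number in [0, 1) that would normally be drawn at random.
    """
    n = len(particles)
    resampled = []
    m = 0
    cumulative = normalized_weights[0]
    for i in range(n):
        target = (i + offset) / n
        # Advance through the source particles until the cumulative weight covers the target.
        while cumulative < target and m < n - 1:
            m += 1
            cumulative += normalized_weights[m]
        resampled.append(particles[m])
    return resampled

print(systematic_resample(["a", "b", "c", "d"], [0.1, 0.6, 0.2, 0.1], offset=0.5))
```

Heavily weighted particles are copied several times and lightly weighted ones are dropped, while the source index only ever moves forward.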
Stage 4
Regularization: $X_{k+1}^i = f(X_k^i, D_k, R)$, where $R$ is a randomized vector
Reweight: $X_{k+1}^i = f(X_k^i, \mathrm{Gps})$
Regularization uses the pre-prepared covariance matrix square root together with the resampled data, and cascades its results to Reweight.
As in stage 2, in order to save memory access cycles, Reweight prepares the total weight for stage 5, Normalization.
[Figure: stage 4 block diagram. Blocks: Memory, Normalization, Resampling, Regularization, Memory 2, Matrix Memory]
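A common form of the regularization step adds a random jitter, shaped by the covariance matrix square root, to each resampled particle. The Python sketch below assumes the form X + h * D * R with standard normal draws for R; the slides only state that the step is a function of the particle, D and a randomized vector, so the exact form, the distribution of R and the `bandwidth` parameter h are all assumptions.

```python
import random

def regularize(particle, sqrt_cov, rng, bandwidth=1.0):
    """Return particle + bandwidth * D * R (assumed form, for illustration).

    sqrt_cov is the covariance matrix square root D as a list of rows,
    rng is a random.Random instance, bandwidth is a hypothetical tuning value.
    """
    n = len(particle)
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the randomized vector R
    return [
        particle[row] + bandwidth * sum(sqrt_cov[row][col] * r[col] for col in range(n))
        for row in range(n)
    ]

jittered = regularize([1.0, 2.0], [[0.1, 0.0], [0.05, 0.2]], random.Random(0))
print(jittered)
```

Each output element needs one full row of D, which matches the row-wise read of the covariance matrix square root memory.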
Stage 5
Normalization: the same module as in stage 2 operates here.
Quaternion to Euler and back: these conversions are implemented as a separate module that can be cascaded into the data path wherever it is needed.
A word about timing
Roughly choosing a 150 MHz clock (6.66 ns period).

Unit                      Throughput               Number of clocks
Propagation               1                        30,000
Measurement update        1
Normalization             1                        17 x 30,000
Re-sampling               follows Normalization
Covariance calculation    1/17
Regularization            1/17                     17 x 30,000
Re-weight                 1
Normalization             1                        30,000

Total time = 6.66 ns x (1 + 17 + 17 + 1) x 30,000 = 6.66 ns x 36 x 30,000 = 7.2 ms
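The estimate can be reproduced, and re-run for other particle counts or clock rates, with a short Python calculation. The function name is hypothetical, and the reading that the 30,000 clock counts correspond to 30,000 particles is an assumption based on the table; the four entries are the per-particle clock counts of the four passes listed there.

```python
def total_time_ms(particles, clock_hz, clocks_per_particle):
    """Total run time in milliseconds for the sequential passes of one filter cycle."""
    total_clocks = sum(clocks_per_particle) * particles
    return total_clocks / clock_hz * 1e3

# Passes: stage 2, stage 3, stage 4, stage 5 (1 + 17 + 17 + 1 = 36 clocks per particle).
slide_estimate = total_time_ms(particles=30_000, clock_hz=150e6,
                               clocks_per_particle=[1, 17, 17, 1])
print(round(slide_estimate, 2))
```

The two 17-clock passes account for 34 of the 36 clocks per particle, so the covariance and regularization stages dominate the cycle time.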
Particle memories
Bluespec enables the user to wrap Verilog code with Bluespec methods.
The particle memory controller was designed with 3 separate memory spaces:
1. Main memory: for the normal quaternion particles used in the routine operation of the algorithm.
2. Second and third memories: designed to hold particles in their Euler-angle form for the data correction process.
Particle memories
The particle memory is accessed as N sequential entries and is address independent: a start signal is asserted at the beginning of each stage, and the inner design of the controller generates the addresses.
Read commands are issued in advance by the memory controller to avoid data acquisition delay. The data is stored in a local FIFO.
The main memory controller is available at the top-level design.
Note: an optional DDR2 burst read/write mode transfers 4*128 bits of data, which should fit the above tasks.
Covariance matrix memory
The covariance matrix calculation is a set of 17^2 multiplications per particle, whose products are accumulated into the complete matrix.
To meet the timing constraints, a full matrix row is calculated at a time, which requires a data bus of 17*56 bits to the memory.
For this reason, the covariance matrix memory has an "add" method instead of a write method: it adds an entire row to the matrix sum.
The read command is done element by element.
The covariance matrix memories are available only within the covariance matrix top module.
The covariance matrix memory is built from the Virtex5 internal block RAMs.
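The behaviour of the "add" method can be modelled with a small Python class. This is a behavioural sketch only: the class and function names are hypothetical, fixed-point widths are ignored, and treating the per-particle input as a vector whose outer product is accumulated is an assumption consistent with the 17^2 multiplications per particle.

```python
class CovarianceMemory:
    """Matrix memory with a row-wide add and an element-wise read."""

    def __init__(self, size=17):
        self.size = size
        self.rows = [[0.0] * size for _ in range(size)]

    def add_row(self, row, values):
        # One wide access: add a whole row of products to the running sum.
        for col in range(self.size):
            self.rows[row][col] += values[col]

    def read(self, row, col):
        # Reads are done element by element.
        return self.rows[row][col]

def accumulate_particle(memory, vector):
    """Add the outer product vector * vector^T, one matrix row per step."""
    for row in range(memory.size):
        memory.add_row(row, [vector[row] * v for v in vector])

memory = CovarianceMemory(size=2)
accumulate_particle(memory, [1.0, 2.0])
accumulate_particle(memory, [3.0, -1.0])
print([[memory.read(r, c) for c in range(2)] for r in range(2)])
```

With the default size of 17, each call to `accumulate_particle` performs 17 row additions of 17 products each, which is the 17^2 multiplications per particle mentioned above.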
Covariance matrix square root memory
The same as the covariance matrix memory, except:
The write method is done element by element [row, col].
The read method is done per matrix row (17*56 bits).
Modules design
The user receives an empty module with predesigned interfaces (methods) and inner FIFOs and registers that contain all the data needed for the single-particle calculation of the relevant algorithm phase.
In some cases, when a timing constraint dictates a certain register size or data flow, the data arrives already at the correct size and in the correct sequence.
Modules design
[Figure: single module block, for future individual design. Blocks: In FIFO, Data registers, Operational rule or inner modules, Out FIFO]
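The empty module skeleton can be modelled in Python to show how the pieces relate. This is a behavioural sketch, not Bluespec code: the class and method names are hypothetical, and `operation` stands for the operational rule that a future group would design for its algorithm phase.

```python
from collections import deque

class ParticleModule:
    """In FIFO -> data register -> operational rule -> out FIFO."""

    def __init__(self, operation):
        self.in_fifo = deque()
        self.out_fifo = deque()
        self.data_register = None
        self.operation = operation  # supplied by the designer of the block

    def put(self, particle):
        """Input method: enqueue one particle."""
        self.in_fifo.append(particle)

    def get(self):
        """Output method: dequeue one result."""
        return self.out_fifo.popleft()

    def step(self):
        """Fire the rule once if its guard holds (input available); report whether it fired."""
        if not self.in_fifo:
            return False
        self.data_register = self.in_fifo.popleft()
        self.out_fifo.append(self.operation(self.data_register))
        return True

module = ParticleModule(lambda particle: particle * 2)
module.put(3)
module.step()
print(module.get())
```

Only the `operation` changes from block to block, so the surrounding infrastructure can be simulated before the inner design of any block exists.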
Future tasks
Project B:
Understanding and creating a Bluespec wrapper that encapsulates a DDR2 memory controller for the Xilinx Virtex5 FPGA.
Writing the Bluespec memory controller for the sequential particle memory.
Simulating the controller.
Future generations:
Writing all of the algorithm's inner modules according to the necessary constraints described in the final report.
The modules can be written in Verilog and wrapped in Bluespec.
Summary
Top-down design processes are easier to implement in Bluespec because no time scheduling is required.
Bluespec's encapsulation capabilities allow fast parallelization, simulation and test benches of large systems, even if they are already written in Verilog.
The current BSV structure operates properly; the inner design of each module can be done and simulated with the same code.
The number of particles in the algorithm can be changed without harming the algorithm's operation.
The End