Sub-Nyquist Sampling
DSP & SCD Modules
Presented by: Omer Kiselov, Daniel Primor
Supervised by: Ina Rivkin, Moshe Mishali
Winter 2010
High Speed Digital Systems Lab
Electrical Engineering Faculty
Technion – Israel Institute of Technology
Outline
• Overview – Goals and discussion
• Algorithm review
• Implementation in hardware
• Changes for Adaptation to hardware
• Evaluation
• Possible Optimization & Future Work
Overview
• The Goal system
• The module's Objectives
• Interface
[Block diagram: Analog Back-end (Realtime), Expand 1:q, Detector, CTF (Support recovery), Memory, Delay FIFO, Support & Matrix, DSP (Baseband)]
$Y = AZ$, where $Z_i(f) = X(f + (i - L_0 - 1) f_p)$, $1 \le i \le L$
$Z = A^{\dagger} Y$
DSP & Support Change Detector interface
• A matrix vector: 432 bits
• Support Analysis vector: 101 bits
• First Beta (for QR decomposition): 36 bits
• Samples Bundle: 432 bits
• Support Changed: 1 bit
• Valid Supports: 1 bit
• A Matrix Address: 9 bits
• Valid samples: 1 bit
Algorithm Review
• Pseudo-Inverse
  – Matrix Decomposition
  – Matrix Inversion
  – Matrix Multiplication
• Support Change Detection
  – Support threshold evaluation attempt
[Block diagram: Pseudo Inverse, Real Time Vector Multiplier, Support Change Detector]
Algorithm Review – Pseudo Inverse
• Matrix Decomposition
• QR Decomposition
• Using Householder Reflections
$A^{\dagger} = (A^T A)^{-1} A^T$
$A = QR$
$A^{\dagger} = R^{-1} Q^T$
$A^{\dagger}_{n \times m} = R^{-1}_{n \times n} Q^T_{n \times m}$
$Q = Q_1 \cdots Q_k$
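As an illustration of the decomposition step, here is a minimal floating-point sketch of QR by Householder reflections in plain Python. It is not the module's VHDL: the function name `householder_qr` is hypothetical, and `beta = 2 / (v^T v)` follows the common textbook convention, which may or may not match the module's "Beta" signal.

```python
import math

def householder_qr(a):
    """Thin QR of a real m x n matrix (m >= n) using Householder reflections.

    Returns (q, r): q is m x n with orthonormal columns, r is n x n upper
    triangular, and a = q * r up to rounding.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]            # working copy, becomes R
    qt = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q^T
    for k in range(n):
        # Sub-column of r below (and including) the diagonal.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue                                    # column already zero
        # Reflection vector v = x - alpha * e1, sign chosen to avoid cancellation.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        beta = 2.0 / sum(t * t for t in v)              # textbook beta (assumption)
        # Apply H = I - beta * v * v^T to rows k..m-1 of both r and qt.
        for mat in (r, qt):
            for j in range(len(mat[0])):
                dot = sum(v[i] * mat[k + i][j] for i in range(len(v)))
                for i in range(len(v)):
                    mat[k + i][j] -= beta * dot * v[i]
    # Thin factors: first n columns of Q, top n x n block of R.
    q = [[qt[j][i] for j in range(n)] for i in range(m)]
    r_thin = [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return q, r_thin

# Usage: factor a small 3 x 2 matrix.
q, r = householder_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Each pass through the outer loop corresponds to one reflection $Q_i$, and the accumulated product of reflections gives $Q^T$ directly, which is the factor the pseudo-inverse needs.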
Algorithm Review – Pseudo Inverse
• Matrix Inversion – Gaussian Elimination
• Matrix Multiplication
[Block diagram: Matrix Multiplier and Vector Multiplier, sharing the Matrix Multiplier's Common Interface]
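Since R is upper triangular, Gaussian elimination reduces to back substitution. The sketch below shows that idea in plain Python under that assumption; the function name `invert_upper_triangular` is hypothetical and the code is floating point, unlike the fixed-point hardware.

```python
def invert_upper_triangular(r):
    """Invert an n x n upper triangular matrix with a non-zero diagonal.

    Column `col` of the inverse solves R * x = e_col by back substitution,
    working from the last row up to the first.
    """
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            rhs = 1.0 if i == col else 0.0
            # Subtract the contribution of the entries already solved (rows below i).
            acc = rhs - sum(r[i][j] * inv[j][col] for j in range(i + 1, n))
            inv[i][col] = acc / r[i][i]      # one division per row and column
    return inv

# Usage: invert a 2 x 2 upper triangular matrix.
r_inv = invert_upper_triangular([[2.0, 1.0], [0.0, 4.0]])
```

The inverse is itself upper triangular, so the entries below the diagonal stay at zero and need no computation.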
Algorithm Review - SCD
• The support change detector is a vector multiplier: it multiplies one row of the pseudo-inverted A matrix by the signal samples and checks whether the result carries energy that is not noise.
• Threshold generation attempt:
  – The noise amplitude follows from the SNR: $A_{noise} = A_{signal} \cdot 10^{-SNR_{dB}/20}$. The threshold is derived from the minimal sample amplitude in range and from $A_{noise}$.
  – If there was no support change, the detector output power over a frame of 24 samples, $P = \sum_{i=1}^{24} (y_i W_i)^2$, contains noise only.
  – If we replace W with the average: $P \approx W_{avg}^2 \sum_{i=1}^{24} y_i^2 \le 24\, W_{avg}^2 \max_i y_i^2$, which bounds the noise power and gives the threshold $P_T$.
  – The generated value doesn't show any false alarms, but it may miss detections in several cases where the SNR is low.
• Eventually, the threshold was defined as a user input.
• Our estimated threshold for the AM demo is 000001000110010100 (approximately 0.3).
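The detection step can be sketched as a dot product followed by a threshold comparison. This is a minimal Python illustration, not the module's implementation: the function names are hypothetical, and comparing the squared magnitude of the product with the threshold is an assumption about how "energy" is measured.

```python
def noise_amplitude(signal_amplitude, snr_db):
    """Noise amplitude implied by an amplitude SNR given in dB."""
    return signal_amplitude * 10.0 ** (-snr_db / 20.0)

def support_changed(pinv_row, samples, threshold):
    """Multiply one pseudo-inverse row by the samples and test the energy.

    Returns True when the squared magnitude of the product exceeds the
    threshold (assumed energy measure).
    """
    value = sum(w * y for w, y in zip(pinv_row, samples))
    return abs(value) ** 2 > threshold

# Usage: a row that picks up a strong sample flags a change, a quiet one does not.
quiet = support_changed([1.0, 0.0], [0.1, 5.0], threshold=0.3)
loud = support_changed([0.0, 1.0], [0.1, 5.0], threshold=0.3)
```

A lower SNR raises `noise_amplitude`, which pushes the noise-only energy closer to the threshold and explains why misdetections appear in low-SNR cases.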
DSP & SCD system operation
[Block diagram. Pseudo-inverse path: QR Decomposition (auxiliary multiplications, reflections creation, reflection multiplication; outputs R and Q'), Upper triangular matrix inverse (output: R inversed), Matrix multiplier (output: A dagger). Other blocks: Delay FIFO, A Matrix RAM, Ping-Pong Buffer (RAM), Real Time Matrix-Samples Multiplier, Support Change Detector. Signals: Support indexes, A_s, Samples From Expand, Control Vector, Reconstructed Signal, '1'.]
Implementation In Hardware
• Block (Entities) Definition – Pseudo Inverse
[Block diagram: QR Decomposition, Matrix Inversion (inverting an upper triangular matrix), Matrix Multiplier]
Implementation In Hardware
• Block (Entities) Definition – Pseudo Inverse
• QR Decomposition
[Block diagram: Phase 1, Phase 2, Aux 2, 24 Multipliers, Beta calculation unit]
Implementation In Hardware
• Block (Entities) Definition – Pseudo Inverse
• Matrix Inversion Unit
[Block diagram: Vector Inversion Unit, Vector Inverter, FIFO for Original R Matrix]
Implementation In Hardware
[Block diagram: Matrix Multiplier, RAM, SCD, Real Time Mult]
Outline - Adaptation to Hardware
• Overview – Goals and discussion
• Algorithm review
• Implementation in hardware
• Adaptation to hardware
  – Complex Enhance
  – Normalizing the Input
  – Resolution (Overflow) discussion
  – SCD – running average
  – Timing issues
• Evaluation
• Possible Optimization & Future Work
Complex Enhance
• To avoid all complex multiplications, we changed the structure of the matrix.
• The matrix is 4 times bigger. Every complex vector multiplication can still be done as an ordinary multiplication of one real vector by another, and it gives the correct result.
For every $a_{i,j} \in A$ ($i$ = row number, $j$ = column number):
$a_{i,j} \rightarrow \begin{pmatrix} \mathrm{real}(a_{i,j}) & -\mathrm{imag}(a_{i,j}) \\ \mathrm{imag}(a_{i,j}) & \mathrm{real}(a_{i,j}) \end{pmatrix}$
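A small Python sketch can make the real-valued expansion concrete. It assumes each complex entry becomes a 2x2 real block with one negated imaginary term, and that vectors are stored as interleaved real and imaginary parts; the function names are hypothetical.

```python
def real_embedding(a):
    """Replace each complex entry z with the real block [[re, -im], [im, re]]."""
    out = []
    for row in a:
        top, bottom = [], []
        for z in row:
            top += [z.real, -z.imag]
            bottom += [z.imag, z.real]
        out += [top, bottom]
    return out

def stack_real(v):
    """Interleave real and imaginary parts: [re0, im0, re1, im1, ...]."""
    out = []
    for z in v:
        out += [z.real, z.imag]
    return out

def matvec(m, v):
    """Ordinary matrix-vector product using only multiply and add."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Usage: the real product equals the stacked complex product.
a = [[1 + 2j, 3 - 1j], [0.5j, 2 + 0j]]
v = [1 - 1j, 2 + 3j]
real_result = matvec(real_embedding(a), stack_real(v))
complex_result = stack_real(matvec(a, v))
```

The expanded matrix has twice the rows and twice the columns, which is the factor of 4 in size, and the multiplier only ever sees real operands.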
Normalizing the Input
• Accuracy falls with a smaller mantissa.
• Matrices can be normalized before the inversion, and the result rescaled after it.
• Motivation
  – The real data differed from the synthetic data given, so 18 bits are not enough (we need to represent both the number and 1 divided by the number).
  – Normalizing the matrix allows us to play with the fraction to minimize error and underflow.
• Hence, for a diagonal matrix $D$:
$z = A^{\dagger} y$
$z_2 = (A D)^{\dagger} y$
$z_2 = D^{-1} A^{\dagger} y$
$z_2 = D^{-1} z$
$z = D z_2$
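The effect of normalization can be checked numerically. The sketch below assumes the normalization scales the columns of A by a diagonal D (so the normalized matrix is A D) and that D is chosen to bring each column's largest magnitude to 1; both choices, and all function names, are illustrative assumptions. For brevity the pseudo-inverse is computed from the normal equations for a two-column matrix rather than by QR.

```python
def pinv_two_columns(a):
    """(A^T A)^{-1} A^T for a real m x 2 matrix with independent columns."""
    g00 = sum(r[0] * r[0] for r in a)
    g01 = sum(r[0] * r[1] for r in a)
    g11 = sum(r[1] * r[1] for r in a)
    det = g00 * g11 - g01 * g01
    inv = [[g11 / det, -g01 / det], [-g01 / det, g00 / det]]
    return [[inv[i][0] * r[0] + inv[i][1] * r[1] for r in a] for i in range(2)]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def solve_normalized(a, y):
    """Compute z = pinv(A) y through the normalized matrix A D."""
    # Assumed choice of D: scale each column so its largest magnitude is 1.
    d = [1.0 / max(abs(row[j]) for row in a) for j in range(2)]
    a2 = [[row[j] * d[j] for j in range(2)] for row in a]   # A_2 = A D
    z2 = matvec(pinv_two_columns(a2), y)                     # z_2 = pinv(A D) y
    return [d[j] * z2[j] for j in range(2)]                  # z = D z_2

# Usage: columns with very different magnitudes give the same z either way.
a = [[1000.0, 0.002], [2000.0, 0.001], [1500.0, 0.004]]
y = [1.0, 2.0, 3.0]
z_direct = matvec(pinv_two_columns(a), y)
z_via_normalization = solve_normalized(a, y)
```

In floating point both paths agree; the benefit in a fixed-point design is that the entries of the normalized matrix and of its inverse stay within a narrower range.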
Support Change Detection – with running average
[Block diagram. Blocks: Vector multiplier, Cycle counter, Control vector RAM, MUX, registers REG1 to REG8, adder (+), comparator (>). Signals: Samples, Threshold, Detection.]
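A running average over recent multiplier outputs can be sketched as follows. The window of 8 is an assumption taken from the eight registers in the diagram, and averaging squared products is an assumed energy measure; the class name is hypothetical.

```python
from collections import deque

class RunningAverageDetector:
    """Compare the mean energy of the last `window` products with a threshold."""

    def __init__(self, threshold, window=8):
        self.threshold = threshold
        self.history = deque([0.0] * window, maxlen=window)  # plays the role of the registers
        self.total = 0.0

    def push(self, product):
        energy = product * product
        self.total += energy - self.history[0]   # add the newest value, drop the oldest
        self.history.append(energy)              # maxlen evicts the oldest entry
        return self.total / len(self.history) > self.threshold

# Usage: feed vector-multiplier outputs one per cycle.
detector = RunningAverageDetector(threshold=0.3, window=4)
flags = [detector.push(p) for p in (0.1, 0.1, 0.1, 0.1, 2.0)]
```

Keeping a running total means each new sample costs one addition and one subtraction, regardless of the window length.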
Timing
• Deep pipeline
  – We incorporated a deeper pipeline so that the module works at the desired high frequency. Quartus currently reports that the module can run only up to the given frequency. The frequency can be raised by adding pipeline stages at the bottlenecks found in the design.
• Clocks
  – Main clock: 20 MHz, may rise to 70 MHz
  – Working clock for the pseudo-inverse: 100 MHz, currently not flexible
• Hardware reuse
  – The matrix multiplier and the inverse unit reuse a single vector-sized unit over many iterations, hence they form the bottlenecks.
Bottlenecks in the design
• Matrix Inverse
• Matrix Multiplier
• Beta calculation in the QR: heavy arithmetic operations take place.
• If we replace the arithmetic units within these entities with more deeply pipelined units (currently the division takes 23 cycles, the square root 11 cycles and the multiplier 2), the maximal frequency will rise.
• There is no real reason to run at a higher clock unless on-chip memory for the delay FIFO is lacking or speed is an actual necessity.
Resource Consumption
• Total numbers taken from Stratix III FPGA EP3SE260F1152C2

Resource                     Alone     With architecture   Total on FPGA   Usage alone   Usage with architecture   Architecture consumption   Architecture share of total usage
Combinational ALUTs          51,940    62,913              203,520         25.52%        30.91%                    5.39%                      17.44%
Memory ALUTs                 0         640                 101,760         0.00%         0.63%                     0.63%                      100.00%
Logic registers              17,788    48,820              203,520         8.74%         23.99%                    15.25%                     63.56%
Memory bits                  100,224   1,240,808           15,040,512      0.67%         8.25%                     7.58%                      91.92%
DSP block 18-bit elements    752       752                 768             97.92%        97.92%                    0.00%                      0.00%
PLLs                         0         5                   8               0.00%         62.50%                    62.50%                     100.00%
DLLs                         0         2                   4               0.00%         50.00%                    50.00%                     100.00%
DSP – Runtime Analysis
• Worst-case pseudo-inverse timing (for 11 support vectors) is a delay of 0.5 milliseconds. Hence an appropriately sized delay FIFO is required.
• The SCD and the reconstruction multiplier work in real time (1 cycle, 50 ns).
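The required FIFO depth follows from the two figures above. This short Python calculation assumes the FIFO stores one entry per 50 ns cycle for the full 0.5 ms worst-case delay; the function name is hypothetical, and the width of each entry is left out because it is not stated here.

```python
def delay_fifo_depth(latency_ns: int, cycle_ns: int) -> int:
    """Entries needed to cover a latency when one entry arrives per cycle."""
    return -(-latency_ns // cycle_ns)   # ceiling division on integers

depth = delay_fifo_depth(latency_ns=500_000, cycle_ns=50)  # 0.5 ms at 50 ns per cycle
print(depth)  # 10000
```

Integer nanoseconds are used so that the division is exact and no rounding error inflates the result.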
Outline
• Overview – Goals and discussion
• Algorithm review
• Implementation in hardware
• Changes for Adaptation to hardware
• Evaluation
  – Testing method
  – Results
  – Discussion
  – Conclusions
• Possible Optimization & Future Work
Evaluation - Testing
[Block diagram, Logical Testing: Matlab (fixed point) is compared (=) with the VHDL through input and output text files. Inputs: expanded samples, CTF output support. VHDL test bench blocks: A matrix memory, Status parser, Functional module (DSP, SCD).]
Evaluation - Testing
[Block diagram, On Chip Testing: input and output text files, with analysis and comparison to Modelsim. Inputs: expanded samples, CTF output support. Debug Environment blocks: A matrix RAM, CTF model & FIFO ctrl, Functional module (DSP, SCD).]
Evaluation - Results
• Results of the run on FPGA with the following signals
  – Fm259_252_sin824_809
  – Fm259_252_am872.697
  – Am_872.697_sin824
• SCD test
Evaluation - Results
[Figure: two groups of spectrum plots, labeled "FPGA output" and "Matlab simulation". Panels in each group: Reconstructed sequence #1, Reconstructed sequence #2, Reconstructed sequence fixed point modelsim #1, Reconstructed sequence fixed point modelsim #2. Axes: Frequency (MHz), 0 to 90, against Power/frequency (dB/Hz).]
Evaluation - Results
[Figure: two groups of spectrum plots, labeled "FPGA output" and "Matlab simulation". Panels in each group: Reconstructed sequence #1, Reconstructed sequence #2, Reconstructed sequence fixed point modelsim #1, Reconstructed sequence fixed point modelsim #2. Axes: Frequency (MHz), 0 to 90, against Power/frequency (dB/Hz).]
Evaluation - Results
[Figure: two groups of spectrum plots, labeled "FPGA output" and "Matlab simulation". Panels in each group: Reconstructed sequence #1, #2 and #3, and Reconstructed sequence fixed point hardware #1, #2 and #3. Axes: Frequency (MHz), 0 to 80, against Power/frequency (dB/Hz).]
Evaluation - Results
[Figure: Support Change experiment, with the "Support changed" indication marked.]
Evaluation - Discussion
• Correctness was inspected by comparison to Matlab using the following:
  – Maximal MSE of the calculated pseudo-inverse matrix values
  – Maximal and average values of the difference between the results of the Matlab simulation and the actual results
  – Visual inspection of the differences
• The SCD experiment was composed of two sample bundles with different supports, put together to inspect correctness and to conclude further about the support threshold.
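The comparison metrics listed above can be sketched in a few lines of Python. The function name and the dictionary keys are hypothetical, and the inputs stand for any two equal-length sequences, such as a Matlab reference and the values read back from the hardware.

```python
def error_metrics(reference, measured):
    """MSE, maximal and average absolute difference between two sequences."""
    diffs = [abs(r - m) for r, m in zip(reference, measured)]
    squares = [d * d for d in diffs]
    return {
        "mse": sum(squares) / len(squares),
        "max_abs": max(diffs),
        "mean_abs": sum(diffs) / len(diffs),
    }

# Usage: compare a reference vector with a slightly different measurement.
metrics = error_metrics([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
```

Using `abs` before squaring keeps the same function valid for complex-valued results.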
Evaluation – conclusions
• The MSE inspected for the inverted matrix is 10^-3.
• The MSE for the reconstructed signal:
  – Maximal: 0.04
  – Average: ~10^-6
• No actual conclusions were reached about the support change function: its behavior is predictable only at support changes.
Future Work
• Possible Optimizations
  – Modifying the inversion algorithm for higher parallelism.
  – Scaling the hardware to increase performance.
• Possibly changing the resolution of the calculations to 22 or more bits for better accuracy, at a great cost in hardware.
• Integration
Summary
• We successfully ran the DSP and SCD module on the FPGA and obtained satisfactory results.
• We introduced an algorithm for calculating the support threshold.
• We changed most of the architecture to support pipelining and to use minimal hardware (vector-level resolution).
• We changed the debug environment to support a different FPGA.