Readout Processing and Noise Elimination Firmware for the Fermilab Beam Loss Monitor System
Jinyuan Wu, C. Drennan, R. Thurman-Keup, Z. Shi, A. Baumbaugh and J. Lewis
Fermilab, April 2007
The Digitizer Card for the Fermilab Beam Loss Monitor System
• Beam loss input signals from ion chambers are integrated and digitized.
• Sliding sums are accumulated and compared with pre-loaded thresholds.
• Over-threshold conditions in several places cause a beam abort, based on pre-defined settings.
• Beam loss signals are filtered and “de-rippled” for display purposes.
• The sequence is controlled by the “Seq128” block.
[Block diagram: Ion Chamber Input; ADC (21 µs/sample); RAM; Immediate, Fast, Slow and Very Slow Sliding Sums, each compared (A>B) with its threshold (Threshold I, F, S, V); Abort Logic; CIC Sums; De-ripple Process; Seq128.]
The Problem: 3-Phase 60 Hz AC
• Rectification noise from power supplies running on 3-phase 60 Hz AC is picked up by the input cables lying in the accelerator tunnel.
[Figure: the noise in the time domain and in the frequency domain; spectrum of amplitude versus frequency (Hz), 0 to 3600 Hz in steps of 360 Hz.]
Filter Functions
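Sliding Sum
$$ s[m] = \sum_{k=m-(K-1)}^{m} x[k] $$
Cascaded Integrator-Comb (CIC) Sum of 2nd Order
$$ y[n] = \sum_{k=n-(2K-1)}^{n} h[k]\, x[k] $$
(h[k] is the weight applied to sample x[k]; the weights are triangular for the 2nd-order CIC sum.)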
• The CIC sum is a sliding sum of sliding sums.
• The frequency response of the CIC sum is a sinc²(x) function, which has 2nd-order zeros and better stop-band suppression than the sinc(x) response of the sliding sum.
[Figure: sinc(x) and sinc²(x) versus x, for x from 0 to 20. Annotations: first zero at 360 Hz; 21 µs/sample, 124 samples.]
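The relation between the two responses can be checked numerically. The sketch below evaluates the magnitude response of a length-K sliding sum and of the 2nd-order CIC sum directly from their tap weights. K = 124 is taken from the slide; the function names and the two test frequencies are illustrative assumptions, not part of the firmware.

```python
import cmath
import math

def sliding_taps(K):
    """Tap weights of a length-K sliding sum: K ones."""
    return [1.0] * K

def cic2_taps(K):
    """Tap weights of a sliding sum of sliding sums: a triangle of length 2K-1."""
    box = sliding_taps(K)
    taps = [0.0] * (2 * K - 1)
    for i, a in enumerate(box):
        for j, b in enumerate(box):
            taps[i + j] += a * b
    return taps

def gain(taps, f_norm):
    """Magnitude response at f_norm (cycles per sample), normalized to the DC gain."""
    acc = sum(h * cmath.exp(-2j * math.pi * f_norm * k) for k, h in enumerate(taps))
    return abs(acc) / sum(taps)

K = 124
f_zero = 1.0 / K   # first zero: exactly one period fits in the window
f_lobe = 1.5 / K   # middle of the first side lobe

box_zero = gain(sliding_taps(K), f_zero)
cic_zero = gain(cic2_taps(K), f_zero)
box_lobe = gain(sliding_taps(K), f_lobe)
cic_lobe = gain(cic2_taps(K), f_lobe)
print(f"first side lobe: sliding sum {box_lobe:.3f}, CIC sum {cic_lobe:.3f}")
```

Both filters share the same zeros, but the CIC gain is the square of the sliding-sum gain, so everything between the zeros is suppressed more strongly.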
Filtering Works, But Partially
• Noise above 360 Hz, the dominant portion, is filtered out by both filter functions.
• The CIC sum is much smoother than the sliding sum.
• But small signals are still buried under the 60 Hz and 180 Hz ripples.
[Figure: the sliding sum and the CIC sum of the same data, with the signals marked.]
Why Not Filtering Further?
• Filtering is an averaging process over many periods, and there is not much time for it after a reset.
• The noise before and after the accelerator ramp has different amplitudes and shapes.
• A “De-Ripple” algorithm has been developed.
[Figure: the noise around the accelerator ramp, with the ramping interval marked.]
De-ripple Process (1.1): Waveform Extraction, Storage and Validation
[Diagram: Waveform Buffer Page 0 and Page 1, each with its Waveform Mean.]
• The CIC sum is stored into the waveform buffer and accumulated for the waveform mean.
De-ripple Process (1.2): Waveform Extraction, Storage and Validation
• When it shows a good periodic property, the waveform becomes valid.
De-ripple Process (1.3): Waveform Extraction, Storage and Validation
• If the data is non-periodic, the waveform becomes invalid.
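The extraction and validation steps can be sketched as follows. The slides only say that the waveform becomes valid when it shows "a good periodic property"; the test used here (each sample differs from the sample one period earlier by at most a tolerance) is an assumption, and `extract_waveform`, `period` and `max_dy` are hypothetical names.

```python
def extract_waveform(y, period, max_dy):
    """Store the latest full period of the CIC sum, accumulate its mean,
    and judge whether it repeats the period before it.

    Assumed validity rule: |y[n] - y[n-period]| <= max_dy for the whole period.
    """
    if len(y) < 2 * period:
        return None, None, False          # not enough data to compare two periods
    current = list(y[-period:])
    previous = y[-2 * period:-period]
    valid = all(abs(c - p) <= max_dy for c, p in zip(current, previous))
    mean = sum(current) / period
    return current, mean, valid

ripple = [0, 3, 1, -2, -3, 1]                 # one period of ripple, zero mean
steady = [10 + r for r in ripple] * 3          # DC level 10 plus periodic ripple
ramping = steady[:12] + [v + 5 for v in steady[12:]]   # level shifts in the last period

wf_ok, mean_ok, valid_ok = extract_waveform(steady, len(ripple), max_dy=1)
wf_bad, mean_bad, valid_bad = extract_waveform(ramping, len(ripple), max_dy=1)
print(valid_ok, valid_bad)
```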
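De-ripple Process (2): Waveform Subtraction
[Diagram: Waveform Buffer Page 0 and Page 1 with their Waveform Means, two subtractions, and the output labeled "The De-rippled Sum".]
The waveform mean is subtracted from the stored waveform before the waveform is subtracted from the CIC sum, so the DC component is preserved in the final result.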
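A minimal sketch of the subtraction step, assuming the stored waveform covers exactly one ripple period and starts in phase with the data. The function name and the sample values are illustrative only.

```python
def deripple(y, waveform, waveform_mean):
    """De-rippled sum: y[n] - (WF - WM), with the waveform repeated every period."""
    period = len(waveform)
    return [v - (waveform[i % period] - waveform_mean) for i, v in enumerate(y)]

ripple = [0, 3, 1, -2, -3, 1]                  # one ripple period, zero mean
waveform = [10 + r for r in ripple]            # stored waveform: DC level 10 plus ripple
waveform_mean = sum(waveform) / len(waveform)  # 10.0

y = [10 + r for r in ripple] * 3               # three periods of CIC-sum data
y[8] += 2                                      # a small loss signal, smaller than the ripple
y[9] += 2

clean = deripple(y, waveform, waveform_mean)
print(clean)
```

Because only the ripple shape (waveform minus its mean) is removed, the DC level of 10 survives and the small signal stands out as the only deviation from it.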
Results of De-ripple Process
• Those otherwise hard-to-see small signals now become visible.
• DC and very slow signals are also preserved.
Filter Implementation
Recursive != IIR
Filter type | Non-Recursive Implementation | Recursive Implementation
Finite Impulse Response (FIR) | Yes | Possible, Resource Friendly
Infinite Impulse Response (IIR) | No | Yes
[Diagram: the sliding sum in non-recursive form (x[n] in, s[n] out) and in recursive form, s[n] = s[n-1] + x[n] - x[n-K].]
The non-recursive implementation needs:
• 124 memory fetches,
• 124 additions and
• more ops for longer sum lengths.
The recursive implementation needs:
• 1 memory fetch,
• 2 add/sub operations,
• regardless of sum length.
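The two implementations can be compared in a short sketch. It assumes integer samples, as delivered by an ADC, and treats samples before the start of the data as zero; the function names are illustrative.

```python
from collections import deque

def sliding_sum_direct(x, K):
    """Non-recursive form: re-add the last K samples for every output."""
    return [sum(x[max(0, n - K + 1): n + 1]) for n in range(len(x))]

def sliding_sum_recursive(x, K):
    """Recursive form: s[n] = s[n-1] + x[n] - x[n-K]."""
    delay = deque([0] * K, maxlen=K)   # holds x[n-K] .. x[n-1]
    s = 0
    out = []
    for sample in x:
        oldest = delay[0]              # the single memory fetch: x[n-K]
        s += sample - oldest           # one addition and one subtraction
        delay.append(sample)           # the new sample overwrites the oldest one
        out.append(s)
    return out

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
direct = sliding_sum_direct(x, 4)
recursive = sliding_sum_recursive(x, 4)
print(direct == recursive)
```

The recursive form is still an FIR filter: each input sample leaves the sum after K steps. It relies on exact integer arithmetic, since a rounding error in the accumulator would never be flushed out.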
Recursive Implementation of CIC Sum
The non-recursive implementation needs:
• 248 memory fetches,
• 248 multiplications,
• 248 additions and
• more ops for longer sum lengths.
The CIC sum constructed as a sliding sum of sliding sums needs:
• 2 memory fetches,
• 0 multiplications,
• 4 add/sub ops for any sum length.
The re-formulated CIC sum uses the raw data buffer rather than a separate buffer.
[Diagram: three forms of the CIC sum. Non-recursive: x[n] weighted by h1, h2, ..., h[K] and summed into y[n]. Sliding sum of sliding sums: s[n] = s[n-1] + x[n] - x[n-K], y[n] = y[n-1] + s[n] - s[n-K]. Re-formulated: u[n] = u[n-1] + x[n] - 2x[n-K] + x[n-2K], y[n] = y[n-1] + u[n].]
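The equivalence of the three forms can be checked with a short sketch. It assumes integer samples and zero data before the start of the record; the function names are illustrative, and the list lookups stand in for the memory fetches of the firmware.

```python
def cic2_direct(x, K):
    """Sliding sum of sliding sums, computed from the definition."""
    def past(seq, i):
        return seq[i] if i >= 0 else 0
    s = [sum(past(x, n - i) for i in range(K)) for n in range(len(x))]
    return [sum(past(s, n - j) for j in range(K)) for n in range(len(x))]

def cic2_cascade(x, K):
    """Two cascaded recursive sliding sums; needs a second buffer for s."""
    s = y = 0
    s_hist, out = [], []
    for n, sample in enumerate(x):
        s += sample - (x[n - K] if n >= K else 0)    # s[n] = s[n-1] + x[n] - x[n-K]
        s_hist.append(s)
        y += s - (s_hist[n - K] if n >= K else 0)    # y[n] = y[n-1] + s[n] - s[n-K]
        out.append(y)
    return out

def cic2_raw_buffer(x, K):
    """Re-formulated form: only the raw data buffer is read."""
    u = y = 0
    out = []
    for n, sample in enumerate(x):
        x_k = x[n - K] if n >= K else 0              # fetch x[n-K]
        x_2k = x[n - 2 * K] if n >= 2 * K else 0     # fetch x[n-2K]
        u += sample - 2 * x_k + x_2k                 # u[n] = s[n] - s[n-K]
        y += u                                       # y[n] = y[n-1] + u[n]
        out.append(y)
    return out

x = [(7 * i * i + 3 * i) % 11 for i in range(40)]    # arbitrary integer test data
a = cic2_direct(x, 5)
b = cic2_cascade(x, 5)
c = cic2_raw_buffer(x, 5)
print(a == b == c)
```

The re-formulated version works because u[n] = s[n] - s[n-K], so the intermediate sliding sum never has to be stored.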
Process Sequencing
[Diagram: the processing steps Sum1, Sum2, Sum3, Sum4, CIC1, CIC2, WF SUB and WF E,S,V for channels CH0 to CH3, drawn once as a flat design and once as a sequenced design.]
• A flat design is fast but uses a lot of logic elements.
• Sequencing the process saves a significant number of logic elements.
• A partially flat, partially sequenced design is sometimes a better arrangement in an FPGA.
BLM DC Process Sequencing
• The processes of calculating sliding sums and CIC sums are fully sequenced.
• The de-ripple processor is flat along the process path, but it operates sequentially for the 4 channels.
[Diagram: the fully sequenced part (Sliding Sums 1 to 4 and the CIC recurrences for u[n], y[n] and their delayed copies u[n-L], y[n-L]) and the partially flat de-ripple part (waveform buffer pages PG=0 and PG=1, MaxDY comparison, decimation counter). Annotations: "If |y[n]-y[n-L]|>MaxDY for entire period, then PG++."; WF-WM; DR = y[n]-(WF-WM).]
FPGA Process Sequencing Options
Option | Program Type | Program Length (CLK cycles) | Reprogram | Resource Usage
Finite State Machine (FSM) | Fixed, Wired | 10 | Hard | Small
Enclosed Loop Micro-Sequencer (ELMS) | Memory Stored Program | 10-1000 | Easy | Small
Microprocessor (MP) | Memory Stored Program | >1000 | Easy | Large
ELMS – Enclosed Loop Micro-Sequencer
[Diagram: Program Counter addressing a ROM (128 x 36 bits) that outputs the Control Signals; Conditional Branch Logic; Loop & Return Logic + Stack (marked "Special in ELMS: supports FOR loops at machine code level"); Reset and CLK inputs.]
PC | Control Signals | Operation
00 | 000000000000000 |
01 | 001000100011010 | LD R1, #n
02 | 000010001000000 | LD R2, #addr_a
03 | 000000000000100 | LD R3, #addr_X
04 | 000000010001000 | LD R7, #0
05 | 000000000100001 | BckA1 LD R4, (R2)
06 | 000100000010000 | INC R2
07 | 000001000100000 | LD R5, (R3)
08 | 000100010000001 | INC R3
09 | 001001000100000 | MUL R6, R4, R5
0a | 000000010001000 | EndA1 ADD R7, R7, R6
0b | 000010000010000 | DEC R1
0c | 000000100000100 | BRNZ BckA1
• PC+ROM is a good sequencer in FPGA.
• Adding Conditional Branch Logic allows the program to loop back.
• Loop & Return Logic + Stack is a special feature in ELMS that supports FOR loops at machine code level.
[Callout on the Conditional Branch Logic: allows jump back as in microprocessors.]
ELMS – Detailed Block Diagram
[Diagram: PC with +1 increment and Reset; ROM (128 x 36 bits) producing the User Control Signals; conditional jump (Cond JMP, JMP IF); Loop & Return Registers + Stack (128 words) with Push and Pop; registers CNT, endA, bckA and desA; Compare block generating LoopBack, DEC, RTN and LastPass. Example register contents: RUN at 04, cnt, EndA, BckA.]
LoopBack = DEC = (PC==endA) && (CNT!=0)
LastPass = (PC==endA) && (CNT==1)
      FOR BckA1 EndA1 #n
      LD R2, #addr_a
      LD R3, #addr_X
      LD R7, #0
BckA1 LD R4, (R2)
      INC R2
      LD R5, (R3)
      INC R3
      MUL R6, R4, R5
EndA1 ADD R7, R7, R6
      LD R8, R7
The Stack supports nested loops, up to 128 layers.
Software: Using a Spreadsheet as Compiler
What’s Good About ELMS: FOR Loops at Machine Code Level
• The looping sequence in this example is known before entering the loop.
• Regular microprocessors treat the sequence as unknown.
• ELMS supports FOR loops with pre-defined iteration counts at machine code level.
• Execution time is saved, and the micro-complexities associated with conditional branches (branch penalty, pipeline bubble, etc.) are avoided.
$$ Y = \sum_{i=0}^{n} a_i X_i $$
Microprocessor:
      LD R1, #n
      LD R2, #addr_a
      LD R3, #addr_X
      LD R7, #0
BckA1 LD R4, (R2)
      INC R2
      LD R5, (R3)
      INC R3
      MUL R6, R4, R5
EndA1 ADD R7, R7, R6
      DEC R1
      BRNZ BckA1
The ELMS:
      FOR BckA1 EndA1 #n
      LD R2, #addr_a
      LD R3, #addr_X
      LD R7, #0
BckA1 LD R4, (R2)
      INC R2
      LD R5, (R3)
      INC R3
      MUL R6, R4, R5
EndA1 ADD R7, R7, R6
[Figure: execution time of the microprocessor versus the ELMS; the conditional branch portion is labeled 25%.]
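The 25% figure can be reproduced by counting executed instructions for the two listings. The sketch below assumes one clock cycle per instruction and n >= 1, and it models the sequencer with a simplified loop-back rule (loop back while more than one pass remains); the actual LoopBack and LastPass logic is the one given on the detailed block diagram slide. All function names are illustrative.

```python
SETUP = ["LD R2, #addr_a", "LD R3, #addr_X", "LD R7, #0"]
BODY = ["LD R4, (R2)", "INC R2", "LD R5, (R3)", "INC R3",
        "MUL R6, R4, R5", "ADD R7, R7, R6"]

def count_conventional(n):
    """Cycles for the DEC/BRNZ loop: the counter is handled by program instructions."""
    program = ["LD R1, #n"] + SETUP + BODY + ["DEC R1", "BRNZ BckA1"]
    bck = program.index(BODY[0])
    pc, r1, cycles = 0, n, 0
    while pc < len(program):
        op = program[pc]
        cycles += 1
        if op == "DEC R1":
            r1 -= 1
        if op == "BRNZ BckA1" and r1 != 0:
            pc = bck                 # taken branch back to the top of the loop
        else:
            pc += 1
    return cycles

def count_elms(n):
    """Cycles for the FOR loop: the sequencer counts passes in its own register."""
    program = ["FOR BckA1 EndA1 #n"] + SETUP + BODY
    bck = program.index(BODY[0])
    end = program.index(BODY[-1])
    pc, cnt, cycles = 0, n, 0
    while pc < len(program):
        cycles += 1
        if pc == end and cnt > 1:    # assumed rule: more passes remain, loop back
            cnt -= 1
            pc = bck
        else:
            pc += 1
    return cycles

n = 100
conventional = count_conventional(n)
elms = count_elms(n)
saving = 1 - elms / conventional
print(f"{conventional} vs {elms} cycles, saving {saving:.1%}")
```

Under these assumptions each pass costs 8 instructions with DEC and BRNZ and 6 with the FOR loop, so the saving approaches 2/8 = 25% as n grows.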
Conclusion
• The de-ripple algorithm is a useful alternative method for eliminating low-frequency periodic noise.
• The ELMS is a handy sequence controller for FPGAs that uses only a small amount of resources.
The End
Thanks
What’s Good about ELMS: No ALU => Small Resource Usage
[Diagram: three architectures. Princeton Architecture: one Program/DATA Memory, Program Control, ALU. Harvard Architecture: Program Memory, Program Control, ALU, DATA Memory. Fermilab Architecture(?): Program Memory, Sequencer (ELMS), Data Processor, DATA Memory.]
• The Princeton Architecture is more suitable at system level, while the Harvard Architecture is better suited at micro-structure level.
• Regular microprocessors cannot run a looped program without an ALU.
• The ALU takes a large amount of resources and may not be efficiently utilized for data processing tasks in an FPGA.
• The ELMS can run nested-loop programs without an ALU.
• Further separation of Program and data is therefore possible.
• The ELMS is kept small.