Feature Extraction
IceCube Collaboration meeting in Berkeley, March 2005
Dmitry Chirkin, LBNL
What is Feature Extraction
Given an ATWD or FADC waveform, determine arrival times of all photons which contributed:
• hit series• FEInfo: combination or leading edge, width, charge (or amplitude)
Also applicable to AMANDA TWR
Feature Extraction last fall (DFL data)
Fitted function: p0+A0 exp(-(t-t0)/s0)(1-exp(-(t-t0)2/s02))
New “features” discovered since
• undershooting: 1 mV for 50 mV pulse (Christopher Wendt): +~(exp(-(t-t0)/dt)-1)
• possible extra late pulse (PMT anode configuration artifact) (Shigeru Yoshida) fit independetly
• pedestal drift (corrected for by the fat-reader)
Multi-peak fit
• find first peak using the existing algorithm (refined with the root fit)
• construct difference with the fitted function, weight by 1/F(waveform) to emphasize all peaks (big and small)
• find maximum and add SPE fit function with t0 close to it
Multi-peak fit (cont.)
• Fit the sum of two SPE functions to the waveform
• repeat for all SPE terms with amplitude above the threshold until the quality of the fit stops improving
In-Ice Fits (low PEs)
In-Ice Fits (high PEs)
IceTop Fits
Other feature extraction “features”
• other fitting functions were tried: log-normal (by Tom McCauley) provides a different description of the rising leading
edge
• the undershooting is now fitted, so higher ATWD channels should be used not only for saturated values, but also for values close to 0
zero-suppression road grader algorithm needs be modified to suppress the “most-repeated value” instead of 0
• the higher ATWD channels are narrower, creating extra “mismatch” peak at the trailing edge.
higher-channel peaks need to be widened before they are combined with channel 0
Other feature extraction “features”• a “slewing” correction (shift of the leading edge proportional to width) may need to be made to the leading edge to describe electronics delays
Laser DFL or flasher in-ice calibration?
• another correction proportional to high-voltage needs must made to describe high-voltage-dependent delay of the developing signal in the PMT
Laser DFL calibration should be sufficient?
IceTray FE implementationFeatureExtractor is a project on glacier, a part of:
• OFFLINE-SOFTWARE• FATDATA
example script is in the fat-reader/resources/ directory you can control:• MaxNumHits: maximum number of separate SPE functions to be fit, if necessary (default 20)• through the “DataOptions” of the fat-reader select hits that only pass a certain fraction of SPE threshold (--thrs)
At this time hidden in the source code:• maximum SPE waveform width – reduce it to split up large pulses into smaller ones (default 6 bins)• fixed parameters for the description of undershooting
FeatureExtractor usage and dataclasses
• ATWDChannelMerger must be plugged in to produce the CombinedATWD waveform used by the FeatureExtractor
• I3DOMCalibration class was modified to accommodate calibration and combining of the ATWD channels of different size:
now Set methods set by ATWD bin “name”, 0-127 in reversed time order, as before
now Get methods get by the time-ordered ATWD bin number, 0-127 in correct time order this changed
need not worry about this if only combined ATWD traces or Feature-Extracted hits are used
Conclusions
• possibility to fit multi-peak waveforms was a highly-anticipated feature, which should be considered a major improvement
• precision of the multi-peak fits for complicated waveforms is proportional to the time one is willing to spend on extracting features from waveforms: from a few milliseconds for 2-3 peaks to a few seconds for 10 to a few dozen seconds for 20.
• ATWDChannelMerger and I3DOMCalibration class were modified to accommodate for hits with different ATWD-channel sizes (e.g., currently for in-ice: 128, 32, 32)
• FeatureExtractor is a part of both OFFLINE-SOTWARE, and FATDATA. For the FeatureExtractor development the FATDATA provides a more versatile environment, allowing for a fast selection of the high- or low-PE events.
Road-grader zero-suppression
Common SPE-like waveform:• pedestal is shifted down compared to the value expected from calibration. This is a well-known (by now) effect and is corrected by the fat-reader• ok to use road-grader as is
Road-grader zero-suppression
A large-amplitude, saturated waveform:
• undershoot is not recorded by the current road-grader implementation, but is a part of the waveform “features”
Road-grader zero-suppression
Highly-saturated muti-PE waveform:
• the undershooting and small pulses on top of the undershot tail are all suppressed by the road-grader. Both amount of the undershooting and small pulses are features of the waveform and are used/reconstructed by the FeatureExtractor
Road-grader proposed modifications
• find the “most-repeated” value, and compress all values above and below it (no more than a threshold-setting away)
this requires one pass over the incoming waveform and a small (~256 byte) memory buffer
the zero-suppressed value itself should be encoded into the compressed data
• to make word length more uniform (11 bits all the time), prepend the 10-bit number of the next zero-suppressed words with “1”, and all other (10-bit) values with “0”. This is more uniform (and possibly efficient) than the current road-grader + Huffman-encoding algorithm
Modified road-grader compression ratio
Dawn’s MPE set Dawn’s SPE set
Top Related