UNIVERSITA DEGLI STUDI DI BOLOGNA
FACOLTA DI SCIENZE MATEMATICHE FISICHE E NATURALI
DOTTORATO DI RICERCA IN FISICA XIV ciclo
HARDWARE IMPLEMENTATION OF
DATA COMPRESSION ALGORITHMS
IN THE ALICE EXPERIMENT
PhD thesis
by:
Dott. Davide Falchieri
Supervisors:
Prof. Maurizio Basile
Prof. Enzo Gandolfi
Coordinator:
Prof. Giovanni Venturi
Keywords: ALICE, data compression, CARLOS, wavelets, VHDL
Academic Year 2000/2001
Contents

Introduction

1 The ALICE experiment
1.1 The Inner Tracking System
1.1.1 Tracking in ALICE
1.1.2 Physics of the ITS
1.1.3 Layout of the ITS
1.2 Design of the drift layers
1.3 The SDDs (Silicon Drift Detectors)
1.4 SDD readout system
1.4.1 Front-end module
1.4.2 Event-buffer strategy
1.4.3 End-ladder module
1.4.4 Choice of the technology

2 Data compression techniques
2.1 Applications of data compression
2.2 Remarks on information theory
2.3 Compression techniques
2.3.1 Lossless compression
2.3.2 Lossy compression
2.3.3 Measures of performance
2.3.4 Modelling and coding
2.4 Lossless compression techniques
2.4.1 Huffman coding
2.4.2 Run Length encoding
2.4.3 Differential encoding
2.4.4 Dictionary techniques
2.4.5 Selective readout
2.5 Lossy compression techniques
2.5.1 Zero suppression
2.5.2 Transform coding
2.5.3 Subband coding
2.5.4 Wavelets
2.6 Implementation of compression algorithms

3 1D compression algorithm and implementations
3.1 Compression algorithms for SDD
3.2 1D compression algorithm
3.3 1D algorithm performances
3.3.1 Compression coefficient
3.3.2 Reconstruction error
3.4 CARLOS v1
3.4.1 Board description
3.4.2 CARLOS v1 design flow
3.4.3 Functions performed by CARLOS v1
3.4.4 Tests performed on CARLOS v1
3.5 CARLOS v2
3.5.1 The firstcheck block
3.5.2 The barrel shifter block
3.5.3 The fifo block
3.5.4 The event-counter block
3.5.5 The outmux block
3.5.6 The feesiu (toplevel) block
3.5.7 CARLOS-SIU interface
3.6 CARLOS v2 design flow
3.7 Tests performed on CARLOS v2

4 2D compression algorithm and implementation
4.1 2D compression algorithm
4.1.1 Introduction
4.1.2 How the 2D algorithm works
4.1.3 Compression coefficient
4.1.4 Reconstruction error
4.2 CARLOS v3 vs. the previous prototypes
4.3 The final readout architecture
4.4 CARLOS v3
4.5 CARLOS v3 building blocks
4.5.1 The channel block
4.5.2 The encoder block
4.5.3 The barrel15 block
4.5.4 The fifonew32x15 block
4.5.5 The channel-trigger block
4.5.6 The ttc-rx-interface block
4.5.7 The fifo-trigger block
4.5.8 The event-counter block
4.5.9 The outmux block
4.5.10 The trigger-interface block
4.5.11 The cmcu block
4.5.12 The pattern-generator block
4.5.13 The signature-maker block
4.6 Digital design flow for CARLOS v3
4.7 CARLOS layout features

5 Wavelet based compression algorithm
5.1 Wavelet based compression algorithm
5.1.1 Configuration parameters of the multiresolution algorithm
5.2 Multiresolution algorithm optimization
5.2.1 The Wavelet Toolbox from Matlab
5.2.2 Choice of the filters
5.2.3 Choice of the dimensionality, number of levels and threshold value
5.3 Choice of the architecture
5.3.1 Simulink and the Fixed-Point Blockset
5.3.2 Choice of the architecture
5.4 Multiresolution algorithm performances
5.5 Hardware implementation

Conclusions

Bibliography
Introduction
This thesis work has been aimed at the hardware implementation of data
compression algorithms to be applied to High Energy Physics experiments.
The amount of data that will be produced by the LHC experiments at CERN
is of the order of 1 GByte/s. Cost constraints on magnetic tapes and on the
data acquisition systems (optical fibres, readout boards) require on-line data
compression to be applied in the front-end electronics of the different detec-
tors. This leads to the search for compression algorithms that achieve a high
compression ratio while keeping the reconstruction error low. In fact a high
compression coefficient can only be achieved at the expense of some loss of
physical data.
The thesis contains a description of the hardware implementation of com-
pression algorithms applied to the ALICE experiment, in particular to the
SDD (Silicon Drift Detector) readout chain. The total amount of data pro-
duced by the SDDs is 32.5 MBytes per event, while the space reserved on
magnetic tape for permanent storage is 1.5 MBytes. This means that the
compression coefficient has to be at least 22. Besides that, since the p-p
interaction rate is 1 kHz, the data compression hardware has to complete its
job within 1 ms. This leads to the search for high-performance compression
algorithms in terms of both compression ratio and execution speed.
The thesis contains a description of the design and implementation of three
prototypes of the ASIC CARLOS (Compression And Run Length encOd-
ing Subsystem), which deals with on-line data compression, packing and
transmission to the standard ALICE data acquisition system. CARLOS v1
and v2 contain a uni-dimensional compression algorithm based on threshold,
run length encoding, differential encoding and Huffman coding techniques.
CARLOS v3 was meant to contain a bi-dimensional compression algorithm
that obtains a better compression ratio than the 1D one, with a lower loss
of physical data. Nevertheless, for schedule reasons, the design of CARLOS
v3 sent to the foundry contains a simple 1D look-up table based compression
algorithm. The 2D algorithm will be implemented in the next prototype,
which should be the final version of CARLOS. The first two prototypes have
been tested with good results; the third one is currently being fabricated and
its tests will begin in February 2002.
Besides that, the thesis contains a detailed study of a wavelet-based compres-
sion algorithm, which obtains encouraging results in terms of both compres-
sion ratio and reconstruction error. The algorithm may find a suitable appli-
cation as a second-level compressor on SDD data, should it become necessary
to switch off the compression algorithm implemented on CARLOS.
The thesis is structured in the following way:
• Chapter 1 contains a description of the ALICE experiment, with
emphasis on the SDD readout architecture.
• Chapter 2 contains an introduction to standard compression algorithms.
• Chapter 3 contains a description of the 1D algorithm developed at the
INFN Section of Torino and the two prototypes CARLOS v1 and v2.
• Chapter 4 focuses on the 2D compression algorithm and on the design
and implementation of the prototype CARLOS v3.
• Chapter 5 contains a description of a wavelet-based compression algo-
rithm specifically tuned to achieve high performance on SDD data, and
its possible application as a second-level compressor in the counting room.
Chapter 1
The ALICE experiment
ALICE (A Large Ion Collider Experiment) [1] is an experiment at the Large
Hadron Collider (LHC) [2] optimized for the study of heavy-ion collisions,
at a centre-of-mass energy of 5.5 TeV per nucleon pair. The main aim of
the experiment is to study in detail the behaviour of nuclear matter at
high densities and temperatures, in view of probing deconfinement and
chiral symmetry restoration.
The detector [1, 3] consists essentially of two main components: the central
part, composed of detectors mainly devoted to the study of hadronic signals
and dielectrons, and the forward muon spectrometer, devoted to the study
of quarkonia behaviour in dense matter. The layout of the ALICE set-up
is shown in Fig. 1.1.

Figure 1.1: Longitudinal section of the ALICE detector
A major technical challenge is imposed by the large number of particles cre-
ated in the collisions of lead ions. There is a considerable spread in the
currently available predictions for the multiplicity of charged particles pro-
duced in a central Pb-Pb collision. The design of the experiment has been
based on the highest value, 8000 charged particles per unit of rapidity, at
midrapidity. This multiplicity dictates the granularity of the detectors and
their optimal distance from the colliding beams. The central part, which
covers ±45° (|η| ≤ 0.9) over the full azimuth, is embedded in a large magnet
with a weak solenoidal field. Outside of the Inner Tracking System (ITS),
there are a cylindrical TPC (Time Projection Chamber) and a large area PID
array of time-of-flight (TOF) counters. In addition, there are two small-area
single-arm detectors: an electromagnetic calorimeter (Photon Spectrometer,
PHOS) and an array of RICH counters optimized for high-momentum inclu-
sive particle identification (HMPID).
My thesis work has been focused on data coming from one of the three de-
tectors forming the ITS, the Silicon Drift Detector (SDD).
1.1 The Inner Tracking System
The basic functions of the ITS [4] are:
• determination of the primary vertex and of the secondary vertices nec-
essary for the reconstruction of charm and hyperon decays;
• particle identification and tracking of low-momentum particles;
• improvement of the momentum and angle measurements of the TPC.
1.1.1 Tracking in ALICE
Track finding in heavy-ion collisions at the LHC presents a big chal-
lenge, because of the extremely high track density. In order to achieve
a high granularity and a good two-track separation, ALICE uses three-
dimensional hit information, wherever feasible, with many points on
each track and a weak magnetic field. The ionization density of each
track is measured for particle identification. The need for a large num-
ber of points on each track has led to the choice of a TPC as the main
tracking system. In spite of its drawbacks, concerning speed and data
volume, only this device can provide reliable performance for a large
volume at up to 8000 charged particles per unit of rapidity. The min-
imum possible inner radius of the TPC (rin = 90 cm) is given by the
maximum acceptable hit density. The outer radius (rout = 250 cm)
is determined by the minimum length required for a dE/dx resolution
better than 10 %. At smaller radii, and hence larger track densities,
tracking is taken over by the ITS.
The ITS consists of six cylindrical layers of silicon detectors. The num-
ber and position of the layers are optimized for efficient track finding
and impact parameter resolution. In particular, the outer radius is
determined by the track matching with the TPC, and the inner one
is the minimum compatible with the radius of the beam pipe (3 cm).
The silicon detectors feature the high granularity and excellent spatial
precision required.
Because of the high particle density, up to 90 cm⁻², the four inner-
most layers (r ≤ 24 cm) must be truly two-dimensional devices. For
this task, silicon pixel and silicon drift detectors were chosen. The
outer two layers at r = 45 cm, where the track densities are below
1 cm⁻², are equipped with double-sided silicon micro-strip detectors.
With the exception of the two innermost pixel planes, all layers have
analog readout for particle identification via a dE/dx measurement
in the non-relativistic region. This gives the inner tracking system a
stand-alone capability as a low-pt particle spectrometer.
1.1.2 Physics of the ITS
The ITS will contribute to the track reconstruction by improving the
momentum resolution obtained by the TPC. This will be beneficial for
practically all physics topics which will be addressed by the ALICE ex-
periment. The global event features will be studied by measuring the
multiplicity distributions and the inclusive particle spectra. For the
study of resonance production (ρ, ω and φ), and, more importantly, the
behaviour of the mass and width of these mesons in the dense medium,
the momentum resolution is even more important. We have to achieve
a mass precision comparable to, or better than, the natural width of
the resonances in order to observe changes of their parameters caused
by chiral symmetry restoration. Also the mass resolution for heavy
states, like D mesons, J/ψ and Υ, will be better, thus improving the
signal-to-background ratio in the measurement of the open charm pro-
duction, and in the study of heavy-quarkonia suppression. Improved
momentum resolution will enhance the performances in the observa-
tion of another hard phenomenon, the jet production and predicted jet
quenching, i.e. the energy loss of partons in strongly interacting dense
matter.
The low-momentum particles (below 100 MeV/c) will be detectable
only by the ITS. This is of interest in itself, because it widens the mo-
mentum range for the measurement of particle spectra, which allows
collective effects associated with the large length scales to be studied.
In addition, a low-pt cut-off is essential to suppress the soft gamma
conversions and the background in the electron-pair spectrum due to
Dalitz pairs. Also the PID capabilities of the ITS in the non-relativistic
(1/β²) region will therefore be of great help.
In addition to the improved momentum resolution, which is necessary
for the identical particle interferometry, especially at low momenta, the
ITS will contribute to this study through an excellent double-hit reso-
lution enabling the separation of tracks with close momenta. In order
to be able to study particle correlations in the three components of
their relative momenta, and hence to get information about the space-
time evolution of the system produced in heavy-ion collisions at the
LHC, we need sufficient angular resolution in the measurement of the
particle’s direction. Two of the three components of the relative mo-
mentum (the side and longitudinal ones) are crucially dependent on
the precision with which the particle direction is known. The angular
resolution is determined by the precise ITS measurements of the pri-
mary vertex position and of the first points on the tracks. The particle
identification at low momenta will enhance the physics capability by
allowing the interferometry of individual particle species as well as the
study of non-identical particle correlations, the latter giving access to
the emission time of different particles.
The study of strangeness production is an essential part of the ALICE
physics program. It will allow the level of chemical equilibration and
the density of strange quarks in the system to be established. The mea-
surement will be performed by charged kaon identification and hyperon
detection, based on the ITS capability to recognize secondary vertices.
The observation of multi-strange hyperons (Ξ− and Ω−) is of particular
interest, because they are unlikely to be produced during the hadronic
rescattering due to the high-energy threshold for their production. In
this way we can obtain information about the strangeness density of
the earlier stage of the collision.
Open charm production in heavy-ion collisions is of great physics in-
terest. Charmed quarks can be produced in the initial hard parton
scattering and then only at the very early stages of the collision, while
the energy in parton rescattering is above the charm production thresh-
old. The charm yield is not altered later. The excellent performance of
the ITS in finding the secondary vertices close to the interaction point
gives us the possibility to detect D mesons, by reconstructing the full
decay topology.
Figure 1.2: ITS layers
1.1.3 Layout of the ITS
A general view of the ITS is shown in Fig. 1.2. The system consists
of six cylindrical layers of coordinate-sensitive detectors, covering the
central rapidity region (|η| ≤ 0.9) for vertices located within the length
of the interaction diamond (2σ), i.e. 10.6 cm along the beam direction
(z). The detectors and front-end electronics are held by lightweight
carbon-fibre structures. The geometrical dimensions and the main fea-
tures of the various layers of the ITS are summarized in Table 1.1.
The granularity required for the innermost planes is achieved with
silicon micro-pattern detectors with true two-dimensional readout: Sil-
icon Pixel Detectors (SPD) and Silicon Drift Detectors (SDD). At larger
radii, the requirements in terms of granularity are less stringent, there-
fore double-sided Silicon Strip Detectors (SSD) with a small stereo
angle are used. Double-sided microstrips have been selected rather
than single-sided ones because they introduce less material in the ac-
tive volume. In addition they offer the possibility to correlate the pulse
height read out from the two sides, thus helping to resolve ambiguities
inherent in the use of detectors with projective readout. The main
parameters of each of the three detector types (spatial precision,
two-track resolution, cell size, readout channels per module and total
number of electronic channels) are shown in Table 1.1.
Parameter                                Pixel       Drift        Strip
Spatial precision rφ (µm)                12          38           20
Spatial precision z (µm)                 70          28           830
Two-track resolution rφ (µm)             100         200          300
Two-track resolution z (µm)              600         600          2400
Cell size (µm²)                          50 x 300    150 x 300    95 x 40000
Active area (mm²)                        13.8 x 82   72.5 x 75.3  73 x 40
Readout channels per module              65536       2 x 256      2 x 768
Total number of modules                  240         260          1770
Total number of readout channels (k)     15729       133          2719
Total number of cells (M)                15.7        34           2.7
Average occupancy, inner layer (%)       1.5         2.5          4
Average occupancy, outer layer (%)       0.4         1.0          3.3

Table 1.1: Main features of ITS detectors
The large number of channels in the layers of the ITS requires a large
number of connections from the front-end electronics to the detector
and to the data acquisition system. The requirement for a minimum of
material within the acceptance does not allow the use of conventional
copper cables near the active surfaces of the detection system. There-
fore Tape Automated Bonding (TAB) aluminium multilayer microcables
are used.
The detectors and their front-end electronics produce a large amount
of heat which has to be removed while keeping a very high degree of
temperature stability. In particular, the SDDs are sensitive to temper-
ature variations in the 0.1 °C range. For these reasons, particular care
was taken in the design of the cooling system and of the temperature
monitoring. A water cooling system at room temperature is the chosen
solution for all ITS layers, but the use of other liquid coolants is still
being considered. For the temperature monitoring dedicated integrated
circuits are mounted on the readout boards and specific calibration de-
vices are integrated in the SDDs.
Figure 1.3: SDD prototype: 1) active area, 2) guard area.

The outer four layers of the ITS detectors are assembled onto a mechan-
ical structure made of two end-cap cones connected by a cylinder
placed between the SSD and the SDD layers. Both the cones and the
cylinder are made of lightweight sandwiches of carbon-fibre plies and
Rohacell™. The carbon-fibre structure also includes the appropri-
ate mechanical links to the TPC and to the SPD layers. The latter
are assembled in two half-cylinder structures, specifically designed for
safe installation around the beam pipe. The end-cap cones provide the
cabling and cooling connection of the six ITS layers with the outside
services.
1.2 Design of the drift layers
SDDs (a picture is shown in Fig. 1.3) have been selected to equip the
two intermediate layers of the ITS, since they couple a very good multi-
track capability with dE/dx information. At least three measured
samples per track, and therefore at least four layers carrying dE/dx
information are needed. The SDDs, 7.25 × 7.53 cm² active area each,
will be mounted on linear structures called ladders, each holding six
detectors for layer 3 and eight detectors for layer 4 (see Fig. 1.4).

Figure 1.4: Longitudinal section of ITS layer 3 and layer 4

The layers will sit at average radii of 14.9 and 23.8 cm from the beam
pipe and will be composed of 14 and 22 ladders respectively.
The front-end electronics will be mounted on rigid heat-exchanging hy-
brids, which in turn will be connected onto cooling pipes running along
the ladder structure. The connections between the detectors and the
front-end electronics, and between both and the ends of the ladders will
be assured with flexible Al microcables, TAB bonded, which will carry
both data and power supply lines. Each detector will be first assembled
together with its front-end electronics and high-voltage connections as
a unit, hereafter called a module, which will be fully tested before it
is mounted on the ladder.

Figure 1.5: Working mode of a SDD detector
1.3 The SDDs (Silicon Drift Detectors)
SDDs, like gaseous drift detectors, exploit the measurement of the
transport time of the charge deposited by a traversing particle to
localize the impact point in two dimensions, thus enhancing resolution
and multi-track capability at the expense of speed. They are therefore
well suited to this experiment, in which very high particle multiplicities
are coupled with relatively low event rates (up to a few kHz). A linear
SDD, shown schematically in Fig. 1.5, has a series of parallel implanted
p+ field strips, connected to a voltage divider on both surfaces of the
high-resistivity n-type silicon wafer. The voltage divider is integrated
on the detector substrate itself. The field strips provide the bias voltage
to fully deplete the volume of the detector and they generate an elec-
trostatic field parallel to the wafer surface, thus creating a drift region
(see Fig. 1.6). Electron-hole pairs are created by the charged parti-
cles crossing the detector. The holes are collected by the nearest p+
electrode, while the electrons are focused into the middle plane of the
detector and driven by the drift field towards the edge of the detector
where they are collected by an array of anodes composed of n+ pads.

Figure 1.6: Potential energy of electrons (negative electric potential) on
the y-z plane of the device
The electron charge cloud thus drifts from the impact point to the
anode region: the cloud has a bell-shaped Gaussian distribution that,
owing to diffusion and mutual repulsion, becomes lower and wider during
the drift [5] (see Fig. 1.7). In this way a charge cloud can be
collected by one or more anodes, depending on the charge released by
the ionizing particle and on the impact position with respect to the
anode region. The small size of the anodes, and hence their small
capacitance (50 fF), implies low noise and good energy resolution.
The coordinate perpendicular to the drift direction is given by the cen-
troid of the collected charge. The coordinate along the drift direction is
measured by the centroid of the signal in the time domain, taking into
account the amplifier response. A space precision, averaged over the
full detector surface, better than 40 µm in both coordinates has been
obtained during beam tests of full-size prototype detectors. Each SDD
module is divided into two half-detectors: each half-detector contains,
on its external side, 256 anodes with a 300 µm pitch. Each SDD module
therefore has 2 x 256 readout channels: since layers 3 and 4 together
contain 260 SDD modules, the total number of SDD readout channels is
around 133k.
Figure 1.7: Charge distribution evolution scheme
1.4 SDD readout system
The system requirements for the SDD readout system derive from both
the features of the detector and the ALICE experiment in general. The
following points are crucial in the definition of the final readout system:
– The signal generated by the SDD is a Gaussian-shaped current
signal with variable sigma and charge (5-30 ns and 4 to 32 fC),
which can be collected by one or more anodes. Therefore the front-
end electronics should be able to handle analog signals in a wide
dynamic range, and the system noise should be very low while
large signals are still handled.
– The amount of data generated by the SDD is very large: each half
detector has 256 anodes and for each anode 256 time samples have
to be taken in order to cover the full drift length.
– The small space available on the ladder and the constraints on
material impose an architecture which minimizes cabling.
– The radiation environment in which the front-end electronics has
to work imposes the choice of a radiation tolerant technological
library for the implementation of the electronics.

Figure 1.8: SDD ladder electronics
The chosen SDD readout electronics, shown in Fig. 1.8, consists of
front-end modules and end-ladder modules. The front-end module per-
forms analog data acquisition, A/D conversion and buffering, while the
end-ladder module contains high voltage and low voltage regulators and
a chip for data compression and interfacing the ALICE DAQ system.
Figure 1.9: The front-end readout unit
1.4.1 Front-end module
The front-end modules, one per half-detector, are distributed along the
ladders together with the SDD modules. Each front-end module con-
tains four PASCAL (Preamplifier, Analog Storage and Conversion from
Analog to digitaL) - AMBRA (A Multievent Buffer Readout Archi-
tecture) chip pairs, as shown in Fig. 1.9. The PASCAL chips are
TAB-bonded directly on the SDD output anodes, while the AMBRA
chips are connected to CARLOS (Compression And Run Length en-
cOding Subsystem) via an 8-bit bus.
Each PASCAL chip contains three functional blocks (see Fig. 1.10):
– 64 low-noise preamplifiers, one per anode;
– an analog memory working at a 40 MHz clock frequency (64 × 256
cells);
– 64 10-bit analog-to-digital converters (ADCs), one per channel.
Figure 1.10: PASCAL chip architecture

During the write phase, i.e. when no trigger signal has been received,
the preamplifiers continuously write the samples into the analog memory
cells at 40 MHz, while the ADCs are in stand-by mode. When PAS-
CAL receives a trigger signal from CARLOS (which receives it from the
Central Trigger Processor, CTP), a control logic module on the PAS-
CAL chip stops the analog memory write phase, freezes its contents
and starts the read phase, performed in two steps: in the first step the
ADCs are set to sample mode and the analog memory reads out the
first sample for each anode row; after the memory settling time, the
ADCs switch to the conversion mode and analog data are converted
to digital through a successive approximation technique. When the
conversion is finished, the control logic module on PASCAL starts the
readout of the next sample from the analog memory and, at the same
time, sends the 64 digital words to the AMBRA chip using a 40-bit
wide bus. The read phase goes on until all the analog memory content
has been converted to digital values or an abort signal comes from
CARLOS (again receiving it from the CTP), meaning that the event
has to be discarded.

Input range   Output codes       Code mapping   Bits lost
0-127         from 128 to 128    0xxxxxxx       0
128-255       from 128 to 32     100xxxxx       2
256-511       from 256 to 32     101xxxxx       3
512-1023      from 512 to 64     11xxxxxx       3

Table 1.2: Digital compression from 10 to 8 bits
The AMBRA chip has mainly two functions: first, AMBRA has to
compress data from 10 to 8 bits per sample; then it has to store the
input data stream into a digital buffer. The principle used for compres-
sion is to decrease the resolution for larger signals with a logarithmic
or square-root law, using the mapping shown in Table 1.2. Since the
larger signals have a better signal-to-noise ratio than the smaller ones,
the accuracy of the measurement is not affected.
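As an illustration, a minimal software sketch of the 10-to-8 bit mapping
of Table 1.2 might look as follows (the bit-level behaviour shown here,
with plain truncation of the discarded bits, is an assumption; the actual
AMBRA logic is not specified in this text):

    def compress_10_to_8(sample: int) -> int:
        """Map a 10-bit ADC value (0-1023) to 8 bits following Table 1.2."""
        if sample < 128:                                 # 0xxxxxxx, 0 bits lost
            return sample
        if sample < 256:                                 # 100xxxxx, 2 bits lost
            return 0b10000000 | ((sample - 128) >> 2)
        if sample < 512:                                 # 101xxxxx, 3 bits lost
            return 0b10100000 | ((sample - 256) >> 3)
        return 0b11000000 | ((sample - 512) >> 3)        # 11xxxxxx, 3 bits lost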
Each AMBRA chip is a static RAM able to contain 256 KBytes, and can
therefore temporarily store four complete half-SDD events (one event
corresponds to 256 × 256 Bytes = 64 KBytes). Read and write stages are
allowed at the same time: while the PASCAL chips are transferring data
to the AMBRA ones, the AMBRA chips can send data belonging to another
event to the CARLOS chip. Since the four AMBRA chips have to transmit
data over a single 8-bit bus, an arbitration mechanism has been
implemented.
1.4.2 Event-buffer strategy
The dead time due to the SDD readout system is around 358.4 µs: this
is, in fact, the time needed for reading one cell of the analog memory
and converting it into a digital word, 1.4 µs, multiplied by the number
of cells, 256. This means that a new trigger signal will not be accepted
before 358.4 µs have passed after the previous event. Every 1.4 µs each
detector produces 512 bytes of data, so at least ten 8-bit buses per
detector working at 40 MHz would be required for data transfer, as the
check below shows. Unfortunately the space on the ladder is very limited,
and managing 80 data lines for each detector (for a total of 320 for the
half-ladder) is a very serious problem, especially for the input
connections to the end-ladder readout units.
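These figures can be checked with a few lines of arithmetic (a sketch
using only the numbers quoted in the text):

    import math

    dead_time_us = 256 * 1.4                  # 256 cells x 1.4 us = 358.4 us
    rate_mb_s = 512 / 1.4e-6 / 1e6            # 512 bytes per 1.4 us ~ 366 MByte/s
    bus_mb_s = 40                             # one 8-bit bus at 40 MHz
    buses = math.ceil(rate_mb_s / bus_mb_s)   # -> 10 buses, i.e. 80 data lines
    print(dead_time_us, rate_mb_s, buses)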
The adopted solution, inserting a digital multi-event buffer on the front-
end readout unit between PASCAL and CARLOS, allows data to be sent
towards the end-ladder unit at a lower speed: if another event arrives
while data are being transmitted from AMBRA to CARLOS, another digital
buffer on AMBRA is ready to accept the data coming from PASCAL. Data
are transferred from AMBRA to CARLOS over an 8-bit bus in 1.65 ms
(25 ns x 64 Kwords) while other events are processed by PASCAL and sent
to AMBRA. For an average Pb-Pb event rate of 40 Hz and using a
double-event digital buffer, our simulations indicate that the dead time
due to buffer overrun is only 0.1 % of the total time.
This is the amount of time during which AMBRA is transferring data
to CARLOS and the other buffer in AMBRA is full: in this situation
a BUSY signal is asserted towards the CTP, meaning that no further
trigger can be accepted. In order to reach an even smaller dead time
at higher event rates, the decision was taken to make the AMBRA device
four buffers deep.
In order to allow the full testability of the readout electronics at the
board and system levels, the ASICs embody a JTAG standard interface.
In this way it is possible to test each chip after the various assembly
stages and during the run phase in order to check correct functionality.
Layer   Ladders   Detectors/ladder   Data/ladder   Total data
3       14        6                  768 KBytes    10.5 MBytes
4       22        8                  1 MByte       22 MBytes
Both                                               32.5 MBytes

Table 1.3: Total amount of data produced by SDDs
The same interface is used to download control information into the
chips.
Radiation tolerant deep-submicron processes (0.25 µm) have been used
for the final versions of the ASICs. These technologies are now available
and allow us to reduce size and power consumption with no degradation
of the signal processing speed. Moreover, it has been shown that, when
specific layout techniques are used, they have a better resistance to
radiation than commercially available technologies.
1.4.3 End-ladder module
The end-ladder modules are located at both ends of each ladder (2
per ladder); they receive data from the front-end modules, perform
data compression with the CARLOS chip and send data to the DAQ
through an optical fibre link.
Besides that, the end-ladder board will host the TTCrx device, a
chip receiving the global clock and trigger signals from the CTP and
distributing them to PASCAL, AMBRA and CARLOS, as well as the power
regulators for the complete ladder system.
CARLOS receives 8 data streams coming from 8 half-detectors, i.e.
from one half-ladder, for a total volume of data of 64 KBytes × 8 =
512 KBytes, at a rate of 320 MByte/s in input. Taking into account the
number of ladders and detectors per ladder (see Table 1.3), the total
volume of data produced by all the SDD modules amounts to around
32.5 MBytes per event, while the space reserved on disk for permanent
storage is 1.5 MBytes. This requires a compression algorithm with a
compression coefficient of at least 22 and a reconstruction error as low
as possible, in order to minimize the loss of physical information.
Moreover, since the trigger rate in proton-proton interactions amounts to
1 kHz, each event should be compressed and sent to the DAQ system within
1 ms. Actually, thanks to the buffering provided by the AMBRA chips, this
processing time doubles to 2 ms, thus relaxing the timing constraint on
the CARLOS chip.
These constraints led us to the design and implementation of a first
prototype of CARLOS. The desire for better compression performance and
changes in the readout architecture due to the presence of radiation
then led us to the design and implementation of two further CARLOS
prototypes. We are now going to design CARLOS v4, which is intended to
be the final version of the compression ASIC. The first three prototypes
of CARLOS are described in detail in chapters 3 and 4, while chapter 2
contains a review of existing compression techniques.
1.4.4 Choice of the technology
The effects of radiation on electronic circuits can be divided into total
dose effects and single event effects [6]. Total dose modifies the
thresholds of MOS transistors and increases leakage currents. This is of
particular concern in leakage-sensitive analog circuits, like analog
memories. For instance, assuming a value of 1 pF for the storage
capacitors in the memory, a leakage current as small as 1 nA would
change the value of the stored information by 0.2 V in 200 µs. This is
of course unacceptable.
Radiation-tolerant layout practices prevent this risk and their use in
analog circuits is therefore recommended. These design techniques
become extremely effective in deep-submicron CMOS technologies. Single
event effects can trigger latch-up phenomena or can change the value of
digital bits (Single Event Upset). Latch-up can be prevented with the
systematic use of guard rings in the layout. A single event upset can
be a problem especially when occurring in the digital control logic and
can be prevented by layout techniques or by redundancy in the sys-
tem. Radiation tolerant layouts have of course area penalties. It can
be estimated that in a given technology a minimum size inverter with
radiation tolerant layout is 70% bigger than the corresponding inverter
with standard layout. Nevertheless, a radiation tolerant inverter in a
quarter micron technology is about eight times smaller than a standard
inverter in a 0.8 µm technology. The radiation dose which will be re-
ceived by the readout electronics will be quite low, below 100 krad in
10 years. This value is probably below the limit of what a standard
technology can withstand; however, conservative considerations suggested
the use of radiation tolerant techniques for critical parts of the circuit.
These techniques have been proven to work up to 30 Mrad and allow
a lower area penalty and a lower cost compared with radiation-hard
processes. Therefore the library chosen for the implementation of the
PASCAL, AMBRA and CARLOS chips is the 0.25 µm IBM technology
with standard cells designed at CERN to be radiation tolerant.
Chapter 2
Data compression techniques
Data compression [7] is the art or science of representing information in
a compact form. These compact representations are created by identify-
ing and exploiting structures that exist in the data. Data can be
characters in a text file, numbers that are samples of speech or image
waveforms, or sequences of numbers that are generated by physical
processes.
Data compression plays an important role in many fields, for example
in digital television signal transmission. If we wanted to transmit an
HDTV (High Definition TeleVision) signal without any compression, we
would need to transmit about 884 Mbits/s. Using data compression,
we need to transmit less than 20 Mbits/s along with audio information.
Compression is now very much a part of everyday life. If you use com-
puters you are probably using a variety of products that make use of
compression. Most modems now have compression capabilities that allow
data to be transmitted many times faster than would otherwise be
possible. File compression utilities, which let us store more on our
disks, are now commonplace.
This chapter contains an introduction to data compression with a de-
scription of the most commonly used compression algorithms, with the
aim of finding the most suitable compression technique for the physical
data coming out of the SDD.
2.1 Applications of data compression
An early example of data compression is the Morse code, developed
by Samuel Morse in the mid-19th century. Letters sent by telegraph
are encoded with dots and dashes. Morse noticed that certain letters
occurred more often than others. In order to reduce the average time
required to send a message, he assigned shorter sequences to letters that
occur more frequently such as a (· −) and e (·) and longer sequences to
letters that occur less frequently such as q (− − · −) or j (· − − −).
What is being used to provide compression in the Morse code is the
statistical structure of the message to be compressed, i.e. the message
contains letters that occur with a higher probability than others. Most
compression techniques exploit the statistical structure of the input to
provide compression, but this is not the only kind of structure that
exists in the data.
There are many other kinds of structure in data of different types that
can be exploited for compression. Let us take speech as an example.
When we speak, the physical construction of our voice box dictates the
kinds of sounds that we can produce, that is the mechanics of speech
production impose a structure on speech. Therefore, instead of trans-
mitting the sampled speech itself we could send information about the
conformation of the voice box, which could be used by the receiver to
synthesize the speech. An adequate amount of information about the
conformation of the voice box can be represented much more compactly
than the sampled values of the speech. This compression approach is
being used currently in a number of applications, including transmis-
sion of speech over mobile radios and the synthetic voice in toys that
speak.
Data compression can also take advantage of some redundant structure
of the input signal, that is, a structure containing more information
than needed. For example, if a sound has to be transmitted to be heard
by a human being, all frequencies below 20 Hz and above 20 kHz can be
eliminated (thus providing compression), since these frequencies cannot
be perceived by humans.
2.2 Remarks on information theory
Without going into details, we just want to recall Shannon's theorem [8].
It defines the information content of a message in the following way:
given a message made up of N characters in total and containing n
different symbols, the information content of the message, measured in
bits, is:
I = N Σ_{i=1}^{n} (−p_i log2(p_i)) (2.1)
where pi is the occurrence probability of symbol i.
What is regarded as a symbol depends on the application: it might be
an ASCII code, a 16- or 32-bit word, a word in a text, and so on.
A practical illustration of Shannon's theorem is the following: let us
assume we measure a charge or any other physical quantity using an
8-bit digitizer. Very often the measured quantities will be distributed
approximately exponentially. Let us assume that the mean value of the
statistical distribution is one tenth of the dynamic range, i.e. 25.6.
Each value between 0 and 255 is regarded as a symbol. Applying
Shannon's formula with n = 256 and p_i = e^{−(i+0.5)/25.6}/25.6 we obtain a mean
information content I/N of 6.11 bits per measured value, which is
almost 25% less than the 8 bits needed to save the data as a sequence
of bytes. Even if we had increased the dynamic range by a factor of 4
using a 10-bit ADC, it turns out that the mean information content,
expressed as the number of bits per measurement, would have been
virtually the same, and hence the possible compression gain even higher
(39%). This might be surprising, but considering that an exponential
distribution delivers a value beyond ten times the mean only once every
e^{10} ≈ 22026 samples, it is clear that even using a quite long code
for such measurements cannot have an appreciable influence on the
compression rate. Considering that in all likelihood in a realistic
architecture we would have had to expand the 10 bits to 16, the gain is
an impressive 62% in the latter case.
The exponential distribution is a good approximation of the raw data in
many cases, and in particular of the data coming out of the SDD. Com-
paring various probability distributions with the same RMS, it seems
that the exponential distribution is particularly hard to compress. For
instance, a discrete spectrum distributed according to a Gaussian with
the same RMS as the above exponential only has an information content
of 4.75 bits.
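The worked example above is easy to reproduce numerically. The following
sketch builds the discretized exponential distribution just described and
evaluates Shannon's formula for it:

    import math

    mean = 25.6
    p = [math.exp(-(i + 0.5) / mean) / mean for i in range(256)]
    total = sum(p)                     # renormalize the truncated tail
    p = [q / total for q in p]
    i_per_n = sum(-q * math.log2(q) for q in p)
    print(f"I/N = {i_per_n:.2f} bits per symbol")             # ~6.11 bits
    print(f"possible gain vs 8 bits: {1 - i_per_n / 8:.0%}")  # ~24%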
2.3 Compression techniques
When we speak of a compression technique or a compression algorithm
we actually refer to two algorithms: the first one takes an input X
and generates a representation XC that requires fewer bits; the sec-
ond one is a reconstruction algorithm that operates on the compressed
representation XC to generate the reconstruction Y . Based upon the
requirements of reconstruction, data compression schemes can be di-
vided into two broad classes:
– lossless compression schemes, in which Y is identical to X;
– lossy compression schemes, which generally provide much higher
compression than lossless ones, but force Y to be different from
X.
In fact Shannon showed that the best performance achievable by a
lossless compression algorithm is to encode a stream with an average
number of bits per symbol equal to the value I/N. Lossy algorithms, on
the contrary, have no upper bound on the compression ratio.
2.3.1 Lossless compression
Lossless compression techniques involve no loss of information. If data
have been losslessly compressed, the original data can be recovered
exactly from the compressed data. Lossless compression is generally
used for discrete data, such as text, computer-generated data and some
kinds of image and video information. There are many situations that
require compression where we want the reconstruction to be identical
to the original. There are also a number of situations in which it is
possible to relax this requirement in order to get more compression: in
these cases lossy compression techniques have to be used.
2.3.2 Lossy compression
Lossy compression techniques involve some loss of information, and data
that have been compressed using lossy techniques generally cannot be
recovered or reconstructed exactly. In return for accepting distortion in
the reconstruction, we can generally obtain much higher compression
ratios than is possible with lossless compression. Whether the distortion
introduced is acceptable or not depends on the specific application: for
instance, if the input source X contains physical information plus noise,
while the output Y contains only the physical signal, the distortion
introduced is completely acceptable.
2.3.3 Measures of performance
A compression algorithm can be evaluated in a number of different
ways. We could measure the relative complexity of the algorithm, the
memory required to implement the algorithm, how fast the algorithm
performs on a given machine or on dedicated hardware, the amount of
compression and how closely the reconstruction resembles the original.
The last two features are the most important ones for our application
to SDD data.
A very logical way of measuring how well a compression algorithm com-
presses a given set of data is to look at the ratio of the number of bits
required to represent the data before compression to the number of bits
required to represent the data after compression. This ratio is called
the compression ratio. Suppose we store an image made up of a square
array of 256x256 8-bit pixels (exactly like a half-SDD): it requires 64
KBytes. If the compressed image requires only 16 KBytes, we say that the
compression ratio is 4.
Another way of reporting compression performance is to provide the
average number of bits required to represent a single sample. This is
generally referred to as the rate. For instance, for the same image de-
scribed above, the average number of bits per pixel in the compressed
representation is 2: thus the rate is 2 bits/pixel.
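Both figures of merit are straightforward to compute; a small sketch for
the half-SDD-sized image of the example:

    orig_bits = 256 * 256 * 8        # 64 KBytes uncompressed
    comp_bits = 16 * 1024 * 8        # 16 KBytes compressed
    compression_ratio = orig_bits / comp_bits   # -> 4.0
    rate = comp_bits / (256 * 256)               # -> 2.0 bits/pixel
    print(compression_ratio, rate)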
In lossy compression the reconstruction differs from the original data.
Therefore, in order to determine the efficiency of a compression algo-
rithm, we have to find some way to quantify the difference. The dif-
ference between the original data and the reconstructed data is often
called distortion, and is usually calculated as an absolute or percentage
difference between the data before and after compression.
2.3.4 Modelling and coding
The development of data compression algorithms for a variety of data
types can be divided into two phases. The first phase is usually referred to
as modelling. In this phase we try to extract information about any
redundancy that exists in the data and describe the redundancy in the
form of a model. The second phase is called coding. The description of
the model and a description of how the data differ from the model are
encoded, generally using a binary alphabet.
2.4 Lossless compression techniques
This section contains an explanation of the most widely used lossless
compression techniques. In particular the following items are covered:
– Huffman coding;
– run length encoding;
– differential encoding;
– dictionary techniques;
– selective readout.
Some of these algorithms have been chosen for direct application in the
1D compression algorithm implemented in the prototypes CARLOS v1
and v2.
2.4.1 Huffman coding
A Huffman-based compression algorithm [7] encodes data samples in the
following way: symbols that occur more frequently (i.e. symbols having a
higher probability of occurrence) are given shorter codewords than
symbols that occur less frequently. This leads to a variable-length
coding scheme, in which each symbol can be encoded with a different
number of bits. The choice of the code to assign to each symbol or, in
other words, the design of the Huffman look-up table, is carried out
with a standard procedure, which the following example illustrates.
Suppose we have five symbols a1, a2, a3, a4 and a5, each with a
probability of occurrence: P(a1) = 0.2, P(a2) = 0.4, P(a3) = 0.2,
P(a4) = 0.1, P(a5) = 0.1. At first, in order to write down the encoding
c(ai) of each symbol ai, it is necessary to order the symbols from the
most probable to the least probable, as shown in Tab. 2.1.
Data Probability Code
a2 0.4 c(a2)
a1 0.2 c(a1)
a3 0.2 c(a3)
a4 0.1 c(a4)
a5 0.1 c(a5)
Table 2.1: Sample data and probability of occurrence
The least probable symbols are a4 and a5; they are assigned the following
codes:
c(a4) = α1 ∗ 0 (2.2)
c(a5) = α1 ∗ 1 (2.3)
where α1 is a generic binary string and ∗ represents the concatenation
between two strings.
If a′4 is a new symbol whose probability is P (a′4) = P (a4) + P (a5) = 0.2,
then the entries of Tab. 2.1 can be reordered from the most probable to
the least probable, as shown in Tab. 2.2.
Data Probability Code
a2 0.4 c(a2)
a1 0.2 c(a1)
a3 0.2 c(a3)
a′4 0.2 α1
Table 2.2: Introduction of data a′4
In this table the lowest-probability entries are a3 and a′4, so they can
be encoded in the following way:
c(a3) = α2 ∗ 0 (2.4)
c(a′4) = α2 ∗ 1 (2.5)
Since c(a′4) = α1 (see Tab. 2.2), it follows from (2.5) that α1 = α2 ∗ 1,
and then (2.2) and (2.3) become:
c(a4) = α2 ∗ 10 (2.6)
c(a5) = α2 ∗ 11 (2.7)
Defining a′3 as the symbol for which P (a′3) = P (a3) + P (a′4) = 0.4,
the entries of Tab. 2.2 can be reordered from the most probable to the
least probable, as shown in Tab. 2.3.
Data Probability Code
a2 0.4 c(a2)
a′3 0.4 α2
a1 0.2 c(a1)
Table 2.3: Introduction of data a′3
In Tab. 2.3 the lowest-probability entries are a′3 and a1, so they can
be encoded in the following way:
c(a′3) = α3 ∗ 0 (2.8)
c(a1) = α3 ∗ 1 (2.9)
Since c(a′3) = α2 (see Tab. 2.3), it follows from (2.8) that α2 = α3 ∗ 0,
so (2.4), (2.6) and (2.7) become:
c(a3) = α3 ∗ 00 (2.10)
c(a4) = α3 ∗ 010 (2.11)
c(a5) = α3 ∗ 011 (2.12)
Finally, defining a′′3 as the symbol for which P (a′′3) = P (a′3) +
P (a1) = 0.6, the entries of Tab. 2.3 can be reordered from the most
probable to the least probable, as shown in Tab. 2.4.
Data Probability Code
a′′3 0.6 α3
a2 0.4 c(a2)
Table 2.4: Introduction of data a′′3
With only two entries left, the encoding is immediate:
c(a′′3) = 0 (2.13)
c(a2) = 1 (2.14)
Moreover, since c(a′′3) = α3, as shown in Tab. 2.4, it follows from (2.13)
that α3 = 0, i.e. (2.9), (2.10), (2.11) and (2.12) can be written as:
c(a1) = 01 (2.15)
c(a3) = 000 (2.16)
c(a4) = 0010 (2.17)
c(a5) = 0011 (2.18)
Tab. 2.5 contains a complete view of the Huffman table generated in this
way. The method used for building the Huffman table in this example can
be applied as it is to any data stream, whatever its statistical structure.
Huffman codes c(ai) generated in this way can be uniquely decoded: this
means that from a sequence of variable-length codes c(ai) created using
Huffman coding, only one data sequence ai can be reconstructed.
Moreover, as shown in the example in Tab. 2.5, none of the codes c(ai)
is contained as a prefix of any of the remaining codes; codes with this
property are named prefix codes. Prefix codes are always uniquely
decodable, while the converse does not always hold true.
Finally, a Huffman code is an optimum code since, among all the prefix
codes, it is the one that minimizes the average code length (2.2 bits
per symbol in this example).
Data Probability Code
a2 0.4 1
a1 0.2 01
a3 0.2 000
a4 0.1 0010
a5 0.1 0011
Table 2.5: Huffman table
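The table-building procedure illustrated above can be expressed compactly
in code. The following sketch is a generic Huffman construction (not the
CARLOS implementation); tie-breaking between equal probabilities is
arbitrary, so the individual code lengths may differ from Tab. 2.5, but
the average code length, 2.2 bits per symbol here, is the same for any
valid choice:

    import heapq
    from itertools import count

    def huffman(probs):
        """Build a Huffman code table from {symbol: probability}."""
        tie = count()  # unique tie-breaker so dicts are never compared
        heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)   # two least probable entries
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c0.items()}
            merged.update({s: "1" + c for s, c in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(tie), merged))
        return heap[0][2]

    probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
    codes = huffman(probs)
    avg = sum(p * len(codes[s]) for s, p in probs.items())
    print(codes, avg)   # average code length: 2.2 bits/symbol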
2.4.2 Run Length encoding
Very often a data stream happens to contain long sequences of the same
value: this may happen when a physical quantity holds the same value for
several sampling periods, in text files where a character can be repeated
several times, in digital images where areas of the same colour are
encoded as pixels with the same value, and so on. The compression
algorithm based on Run Length encoding [9] is well suited to such
repetitive data.
As shown in the example in Fig. 2.1, where the zero symbol has been
chosen as the repetitive datum in the sequence, each run of zeros in the
original sequence is encoded as a pair of words: the first contains the
code for the zero symbol, the second the number of consecutive zeros in
the run.

Original sequence:            17 8 54 0 0 0 97 5 16 0 45 23 0 0 0 0 43
Run Length encoded sequence:  17 8 54 0 3 97 5 16 0 1 45 23 0 4 43

Figure 2.1: Run length encoding

The performance of the algorithm, in terms of compression ratio, gets
better when the input data stream contains long runs of the same symbol
and few isolated occurrences of it, since an isolated zero, such as the
single zero in Fig. 2.1, is expanded into the two-word pair 0 1. Finally,
this compression algorithm can be implemented in different ways: it can
be applied to a single value of the original data sequence (as here, to
zeros only) or to any repeated element of the sequence.
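A minimal sketch of this zero-run encoding, mirroring Fig. 2.1:

    def rle_encode_zeros(seq):
        """Replace every run of zeros with the pair (0, run length)."""
        out, i = [], 0
        while i < len(seq):
            if seq[i] == 0:
                run = 0
                while i < len(seq) and seq[i] == 0:
                    run += 1
                    i += 1
                out += [0, run]
            else:
                out.append(seq[i])
                i += 1
        return out

    data = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 43]
    print(rle_encode_zeros(data))
    # [17, 8, 54, 0, 3, 97, 5, 16, 0, 1, 45, 23, 0, 4, 43]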
One of the most important applications of the Run Length encoding
system is the compression of facsimile, or fax. In facsimile transmission
a page is scanned and converted into a sequence of white and black
pixels: since very long runs of white or black pixels are highly
probable, coding the lengths of the runs instead of the individual pixels
leads to high compression ratios. Besides that, Run Length encoding is
often used in conjunction with other compression algorithms, after the
input data stream has been transformed into a more compressible form.
2.4.3 Differential encoding
Differential encoding [7] is obtained by taking the difference between
each sample and the previous one, except for the first sample, whose
value is left unchanged, as shown in Fig. 2.2.
Notice that each element of the original sequence can be reconstructed
by summing the corresponding element of the coded sequence with all the
previous ones: for instance, 89 = 79+17+2+5+0+0+(−3)+(−6)+(−5).
It is therefore very important to leave the first value in the coded
sequence unchanged, otherwise the reconstruction process cannot be
carried out correctly. The differential algorithm is well suited to all
data sequences with very small changes in value between consecutive
samples: for this kind of data stream the differential encoding produces
an encoded stream with a smaller dynamic range, i.e. the difference
between the maximum and minimum values in the encoded stream is smaller
than in the original sequence. The encoded sequence can therefore be
represented with a smaller number of bits than the original one.
Original sequence: 17 19 24 24 24 21 15 10 89 95 96 96 96 95 94 94 95
Sequence after differential encoding: 17 2 5 0 0 −3 −6 −5 79 6 1 0 0 −1 −1 0 1
Figure 2.2: Differential encoding
Besides, differential encoding can be used in conjunction with Run
Length encoding: if a sequence contains long runs of equal values, the
differential encoder converts them into runs of zeros, which are then
further compressed by the Run Length encoder.
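The following sketch shows differential encoding and its reconstruction
by running sum, checked against the sequence of Fig. 2.2:

def diff_encode(seq):
    """Keep the first sample; replace the others by consecutive differences."""
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]

def diff_decode(seq):
    out = [seq[0]]
    for d in seq[1:]:
        out.append(out[-1] + d)   # running sum reconstructs the original data
    return out

original = [17, 19, 24, 24, 24, 21, 15, 10, 89, 95, 96, 96, 96, 95, 94, 94, 95]
encoded = diff_encode(original)
assert encoded == [17, 2, 5, 0, 0, -3, -6, -5, 79, 6, 1, 0, 0, -1, -1, 0, 1]
assert diff_decode(encoded) == original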
2.4.4 Dictionary techniques
In many applications, the output of a source consists of recurring
patterns. A classical example is a text source in which certain patterns
or words recur frequently, while other patterns simply do not occur or,
if they do, occur very rarely. A very reasonable approach to encoding
such sources is to keep a list, or dictionary, of frequently occurring
patterns. When these patterns appear in the source, they are encoded
with a reference to the dictionary, i.e. the address of the
corresponding table location. If a pattern does not appear in the
dictionary, it is encoded using some other, less efficient, method. In
effect the input domain is split into two classes: frequently occurring
patterns and infrequently occurring patterns. For this technique to be
effective, the class of frequently occurring patterns, and hence the
size of the dictionary, must be much smaller than the number of all
possible patterns. Depending upon how much information is available to
build the dictionary, a static or a dynamic approach to its creation
can be used.
Choosing a static dictionary technique is most appropriate when
considerable prior knowledge about the source is available.
When no a priori information on the structure of the input source is
available, an adaptive technique is adopted: for example the UNIX
compress command makes use of this approach. It starts with a dictionary
of size 512, thus transmitting 9-bit codewords. Once the dictionary has
filled up, its size is doubled to 1024 entries, and 10-bit codewords are
transmitted. The dictionary size is progressively doubled in this way
until it contains $2^{16}$ entries, after which compress becomes a
static coding technique. At this point the algorithm monitors the
compression ratio: if it falls below a threshold, the dictionary is
flushed and the dictionary building process is restarted.
Dictionary techniques are also used in the image compression field in
the GIF (Graphics Interchange Format) standard, which works in a very
similar way to the compress command.
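The adaptive scheme used by compress and GIF is the LZW algorithm; the
following simplified Python sketch shows only the dictionary growth,
omitting the variable codeword widths, the $2^{16}$ entry limit and the
flushing logic described above.

def lzw_encode(data: bytes):
    dictionary = {bytes([i]): i for i in range(256)}   # initial single-byte entries
    word, out = b"", []
    for byte in data:
        candidate = word + bytes([byte])
        if candidate in dictionary:
            word = candidate               # keep extending the current pattern
        else:
            out.append(dictionary[word])   # emit the longest known pattern
            dictionary[candidate] = len(dictionary)   # learn the new pattern
            word = bytes([byte])
    if word:
        out.append(dictionary[word])
    return out

print(lzw_encode(b"abababababab"))   # 12 input bytes become 6 codes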
2.4.5 Selective readout
The selective readout technique [10] is a lossless data compression
technique often applied in high energy physics experiments. Since the
really interesting data are a small fraction of the total amount of
data actually produced, it proves useful to transmit and store only
those data. Selective readout may reduce the data size by identifying
regions in space containing a significant amount of energy. For example,
in the SDD case, the Central Trigger Processor (CTP) unit defines a
Region Of Interest (ROI) that, event by event, specifies which ladders
are to be read out and which ones can be discarded. Using the ROI
feature a very high compression ratio can be achieved.
2.5 Lossy compression techniques
This section contains an explanation of the most widely used lossy com-
pression techniques. In particular the following items will be covered:
– zero suppression;
– transform coding;
– sub-band coding with some remarks on wavelets.
The first of these algorithms has been chosen for direct application in
the 1D compression algorithm implemented in the prototypes CARLOS
v1 and v2.
2.5.1 Zero suppression
Zero suppression is the very simple technique of eliminating data
samples below a certain threshold by setting them to 0. Zero suppression
proves very useful for data containing large quantities of zeros with
the interesting values concentrated in small clusters: for instance,
since the mean occupancy of an SDD in the inner layer is 2.5 %, a
compression ratio of 40 can be obtained by using the zero suppression
technique alone.
A complication arises because SDD data and, in general, data collections
contain the sum of two different distributions: the real signal
corresponding to the interesting physical event, and white noise with a
Gaussian distribution around a mean value. Hence, if a lossy compression
algorithm obtains a good compression ratio just by eliminating the
noise, the distortion introduced is perfectly acceptable. The key task
for a fair implementation of the zero suppression technique is the
choice of the right value of the threshold parameter, in order to
eliminate noise while preserving the physical signal.
In the case of data coming out of the SDD detector and its related
front-end electronics, data values are shifted from the 0 level to a
baseline level greater than 0. This baseline level corresponds to the mean
value of the noise introduced by the preamplification electronics;
around this value there is a spread given by the RMS of the Gaussian
distribution of the noise.
The noise level introduced by the electronics may vary with time and
with the amount of radiation absorbed: hence a compression algorithm
making use of the zero suppression technique has to allow a tunable
threshold level, in order to accommodate fluctuations or drifts in the
baseline values. Following this indication, the threshold level used in
CARLOS v1 and v2 is completely presettable via software using the JTAG
port.
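A minimal sketch of zero suppression with a software-tunable threshold;
the baseline, RMS and margin used here are purely illustrative, not the
actual CARLOS settings.

import numpy as np

def zero_suppress(samples, baseline, noise_rms, n_sigma=3.0):
    threshold = baseline + n_sigma * noise_rms   # tunable, like the JTAG preset
    out = samples.copy()
    out[out <= threshold] = 0                    # drop samples compatible with noise
    return out

rng = np.random.default_rng(0)
data = rng.normal(20.0, 2.0, 256)                # baseline 20, RMS 2: pure noise
data[100:104] += [40.0, 120.0, 90.0, 30.0]       # a small cluster on top
print(np.count_nonzero(zero_suppress(data, 20.0, 2.0)))  # only the cluster survives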
2.5.2 Transform coding
Transform coding [7] takes a data sequence as input and transforms it
into a sequence in which most of the information is concentrated in a
few samples: the new sequence can then be further compressed using the
other compression algorithms described up to now. The key point of
transform coding is the choice of the transform: this depends on the
features and redundancies of the input data stream to compress. The
algorithm, working on N elements at a time, consists of three steps:
– transform: the input sequence $s_n$ is split into blocks of length N;
each block is then mapped, using a reversible transformation, into the
sequence $c_n$.
– quantization: the transformed sequence $c_n$ is quantized, i.e. a
number of bits is assigned to each sample depending on the dynamic
range of the sequence, the desired compression ratio and the acceptable
distortion.
– coding: the quantized sequence $c_n$ is encoded using a binary
encoding technique such as Run Length encoding or Huffman coding.
These concepts can be expressed in a mathematical way: given an input
sequence $s_n$, it is divided into blocks of length N and mapped, using
the reversible transform A, into the sequence $c_n$:
$$c = As \qquad (2.19)$$
or, in other terms:
$$c_n = \sum_{i=0}^{N-1} s_i\, a_{n,i}, \quad \text{with } [A]_{i,j} = a_{i,j} \qquad (2.20)$$
The quantization and encoding steps are then performed on the sequence
$c_n$, so as to optimize compression.
The decompression algorithm, by means of the inverse transform
$B = A^{-1}$, reconstructs the original sequence $s_n$ from the encoded
sequence $c_n$ in the following way:
$$s = Bc \qquad (2.21)$$
or:
$$s_n = \sum_{i=0}^{N-1} c_i\, b_{n,i}, \quad \text{with } [B]_{i,j} = b_{i,j} \qquad (2.22)$$
These concepts can easily be extended to bi-dimensional data, such as
images or 2-D charge distributions, as in the case of the SDD.
Let us take an N × N portion of a digital image S, with $S_{i,j}$ as its
(i, j)-th pixel; by performing a reversible bi-dimensional transform
working on N × N pixels at a time, with kernel elements $a_{i,j,k,l}$
and with $C_{k,l}$ the (k, l)-th pixel of the N × N block of the
compressed image C, the following holds true:
$$C_{k,l} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} S_{i,j}\, a_{i,j,k,l} \qquad (2.23)$$
A transform is defined separable if it is possible to apply the 2D
transform of an N × N block by applying first a 1D transform on the N
rows of the block and then a transform on the N columns of the block
just transformed; by choosing a separable transform, with
$a_{i,j,k,l} = a_{k,i}\, a_{l,j}$ and $a_{i,j}$ the (i, j)-th element of
the transform matrix A, (2.23) becomes:
$$C_{k,l} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} S_{i,j}\, a_{k,i}\, a_{l,j} \qquad (2.24)$$
or, expressed in matrix form:
$$C = ASA^T \qquad (2.25)$$
The inverse transform is:
$$S = BCB^T \qquad (2.26)$$
Frequently orthonormal transforms are used, so that $B = A^{-1} = A^T$,
and calculating the inverse transform reduces to:
$$S = A^T C A \qquad (2.27)$$
Even in the bi-dimensional case, in order to reach a high compression
ratio, a good transform has to be chosen. For instance the JPEG standard
adopted, until the year 2000, the Discrete Cosine Transform, known as
DCT.
If A is the matrix representing the DCT, the following relationship
holds:
$$[A]_{i,j} = w(i) \cos\left(\frac{(2j+1)\, i\, \pi}{2N}\right), \qquad i, j = 0, 1, \ldots, N-1 \qquad (2.28)$$
where:
$$w(i) = \begin{cases} \sqrt{\frac{1}{N}} & i = 0 \\ \sqrt{\frac{2}{N}} & i = 1, \ldots, N-1 \end{cases}$$
Fig. 2.3 gives a graphical interpretation of (2.28).
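A minimal numerical sketch of Eq. (2.28) and of the separable 2D
transform of Eqs. (2.25) and (2.27):

import numpy as np

def dct_matrix(N):
    """Build the N x N DCT matrix of Eq. (2.28)."""
    A = np.zeros((N, N))
    for i in range(N):
        w = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for j in range(N):
            A[i, j] = w * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return A

A = dct_matrix(8)
assert np.allclose(A @ A.T, np.eye(8))   # orthonormal: the inverse is the transpose

S = np.arange(64.0).reshape(8, 8)        # any 8 x 8 block
C = A @ S @ A.T                          # forward 2D DCT, Eq. (2.25)
assert np.allclose(A.T @ C @ A, S)       # inverse transform, Eq. (2.27)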
After choosing the transform, the next step consists in the quantization
of the transformed image.
Several approaches are possible: for example, zonal mapping foresees a
preliminary analysis of the statistics of the transformed coefficients
and a subsequent assignment of a fixed number of bits to each of them.
The name zonal mapping comes from the assignment of a fixed number of
bits depending on the zone in which each coefficient is placed in the
square N × N block under study; Tab. 2.6 reports a bit allocation table
for an 8 × 8 block.
Figure 2.3: Base coefficients for the bi-dimensional DCT in the case N = 8
8 7 5 3 1 1 0 0
7 5 3 2 1 0 0 0
4 3 2 1 1 0 0 0
3 3 2 1 1 0 0 0
2 1 1 1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Table 2.6: Bit allocation table for an 8 × 8 block
It is interesting to note that the quantization in Tab. 2.6 assigns zero
bits to the coefficients in the lower-right part of the table: this is
equivalent to ignoring those coefficients. This kind of quantization
makes sense because the lower-right coefficients come from a
transformation of the original image using high frequency cosines, i.e.
they carry the information corresponding to the high frequencies in the
original signal, see Fig. 2.3.
Since the human eye response strongly depends on frequency and, in
particular, is sensitive to variations at low frequencies and far less
sensitive at higher frequencies, the quantization in Tab. 2.6 tends to
discard information that the human eye would not appreciate at all.
After quantization, only non-null coefficients are transmitted. In
particular, for every non-null coefficient two words have to be
transmitted: the first with the quantized value of the coefficient
itself, the second containing the number of null samples occurring after
the last non-null coefficient. This allows the decompression algorithm
to exactly reconstruct the sequence as it was quantized and, from that,
the original image.
As an example, let us suppose we have the 8 × 8 block of 8-bit pixels
reported in Tab. 2.7.
124 125 122 120 122 119 117 118
121 121 120 119 119 120 120 118
126 124 123 122 121 121 120 120
124 124 125 125 126 125 124 124
127 127 128 129 130 128 127 125
143 142 143 142 140 139 139 139
150 148 152 152 152 152 150 151
156 159 158 155 158 158 157 156
Table 2.7: 8× 8 block of a digital image
Each value of the block is shifted by $2^{p-1}$ (128 in this case),
where p is the number of bits per pixel (p = 8); the DCT is then applied
to the block, obtaining the coefficients $c_{i,j}$ reported in Tab. 2.8.
39.88 6.56 -2.24 1.22 -0.37 -1.08 0.79 1.13
-102.43 4.56 2.26 1.12 0.35 -0.63 -1.05 -0.48
37.77 1.31 1.77 0.25 -1.50 -2.21 -0.10 0.23
-5.67 2.24 -1.32 -0.81 1.41 0.22 -0.13 0.17
-3.37 -0.74 -1.75 0.77 -0.62 -2.65 -1.30 0.76
5.98 -0.13 -0.45 -0.77 1.99 -0.26 1.46 0.00
3.97 5.52 2.39 -0.55 -0.051 -0.84 -0.52 -0.13
-3.43 0.51 -1.07 0.87 0.96 0.09 0.33 0.01
Table 2.8: DCT coefficients related to the block in Tab. 2.7.
As already stated, the high-frequency coefficients in the lower-right
corner tend to be quite close to 0, while most of the information is
concentrated in the upper-left corner.
The quantization of the coefficients is obtained using a reference table
such as Tab. 2.9; in particular, the quantized values $l_{i,j}$ are
obtained with the following formula:
$$l_{i,j} = \left\lfloor \frac{c_{i,j}}{Qt_{i,j}} + 0.5 \right\rfloor \qquad (2.29)$$
where $Qt_{i,j}$ is the (i, j)-th element of the quantization table and
$\lfloor x \rfloor$ denotes the greatest integer not exceeding x, so
that (2.29) rounds each ratio to the nearest integer.
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
Table 2.9: Quantization table
Tab. 2.10 contains the quantized coefficients resulting from applying
(2.29) with the quantization table Tab. 2.9.
Given the structure of matrices like Tab. 2.10, the order chosen for
sending the coefficients is the zig-zag one shown in Fig. 2.4. With this
choice it is highly probable that the final part of the sequence
contains a long run of zero coefficients, which can then be encoded
using the Run Length technique, as sketched below.
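A sketch of the quantization rule (2.29) followed by a zig-zag scan of
the block; the scan below follows the usual JPEG convention of Fig. 2.4,
and the flat quantization table is used only for the demonstration.

import numpy as np

def quantize(C, Qt):
    return np.floor(C / Qt + 0.5).astype(int)    # Eq. (2.29)

def zigzag(block):
    """Read an N x N block along antidiagonals, alternating direction."""
    N = block.shape[0]
    out = []
    for d in range(2 * N - 1):
        idx = [(i, d - i) for i in range(N) if 0 <= d - i < N]
        if d % 2 == 0:
            idx.reverse()                        # walk upwards on even antidiagonals
        out += [block[i, j] for i, j in idx]
    return out

C = np.zeros((8, 8))
C[0, 0], C[0, 1], C[1, 0], C[2, 0] = 39.88, 6.56, -102.43, 37.77
seq = zigzag(quantize(C, np.full((8, 8), 16.0)))
print(seq[:8])   # the few non-null values come first, then a long run of zeros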
2 1 0 0 0 0 0 0
-9 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Table 2.10: Quantized coefficients resulting from (2.29)
Figure 2.4: Zig-zag scanning pattern for an 8x8 transform
2.5.3 Subband coding
A signal can be decomposed into different frequency components (see
Fig. 2.5) using analog or digital filters; each resulting signal can then
be encoded and compressed using a specific algorithm. Digital filtering
[9] involves taking a weighted sum of the current and past inputs of the
filter and, in some cases, of its past outputs. The general form of the
input-output relationship of the filter is given by:
$$y_n = \sum_{i=0}^{N} a_i\, x_{n-i} + \sum_{i=1}^{M} b_i\, y_{n-i} \qquad (2.30)$$
where the sequence $x_n$ is the input to the filter, the sequence $y_n$
is the output of the filter, and the values $a_i$ and $b_i$ are called
the filter coefficients. If the input sequence is a single 1 followed by
all 0s, the output sequence is called the impulse response of the filter.
Figure 2.5: Decomposition of a signal in frequency components
The impulse response completely specifies the filter: once we know the
impulse response, we know the relationship between the input and the
output of the filter. Notice that if the $b_i$ are all zero, the impulse
response dies out after N samples. These filters are called finite
impulse response (FIR) filters. For FIR filters, Eq. (2.30) reduces to a
convolution between the input signal and the filter coefficients.
Filters with nonzero values for some of the $b_i$ are called infinite
impulse response (IIR) filters.
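A minimal sketch of Eq. (2.30) with all $b_i = 0$, i.e. an FIR filter;
the two 2-tap filters below form the simplest low-pass/high-pass
analysis pair.

def fir_filter(x, a):
    """FIR case of Eq. (2.30): weighted sum of the current and past inputs."""
    N = len(a) - 1
    return [sum(a[i] * x[n - i] for i in range(N + 1) if n - i >= 0)
            for n in range(len(x))]

x = [17, 19, 24, 24, 24, 21, 15, 10]
print(fir_filter(x, [0.5, 0.5]))    # low pass: the smooth trend of the signal
print(fir_filter(x, [0.5, -0.5]))   # high pass: the rapid variations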
Basic subband coding works as follows: the source is passed through a
bank of filters (a 3-level filter bank is shown in Fig. 2.6), called the
analysis filter bank, which covers the range of frequencies that make up
the source; the outputs of the filters are then subsampled as in
Fig. 2.7. The justification for subsampling is the Nyquist rule and its
generalization, which states that for perfect reconstruction we only
need twice as many samples per second as the range of frequencies. This
means that the number of samples at the output of a filter can be
reduced, since its output covers a smaller range of frequencies than its
input. The process of reducing the number of samples is called
decimation or downsampling. The amount of decimation depends on the
ratio of the bandwidth of the filter output
Figure 2.6: An 8-band 3-level filter bank
to the filter input. If the bandwidth at the output of the filter is 1/M
of the bandwidth at its input, the output is decimated by a factor of M,
keeping every M-th sample. Once the output of the filters has been
decimated, it is encoded using one of the encoding schemes described so
far.
Along with the selection of the compression scheme, the allocation of
bits between the subbands is an important design parameter, since
different subbands contain differing amounts of information. The bit
allocation procedure can have a significant impact on the quality of
the final reconstruction, especially when the information component of
different bands is very different.
The decompression phase, also called synthesis in subband coding, works
as follows: first the encoded samples of each subband are decoded at the
receiver; then the decoded values are upsampled by inserting an
appropriate number of zeros between the samples; finally the upsampled
signals are passed through a bank of reconstruction filters and added
together.
Figure 2.7: Subband coding technique: analysis filter bank, downsampling and encoding
Subband coding has applications in speech and audio coding (e.g. MPEG
audio), but can also be applied to image compression.
2.5.4 Wavelets
Another method of decomposing signals that has gained a great deal
of popularity in recent years is the use of wavelets [11, 12, 13, 14].
Decomposing a signal in terms of its frequency content using sinusoids
results in a very fine resolution in the frequency domain. However,
sinusoids are defined over the whole time domain, from −∞ to ∞, so
individual frequency components give no temporal resolution [15].
In a wavelet representation, a signal is represented in terms of functions
that are localized both in time and in frequency. For instance, the
following is known as the Haar wavelet:
$$\psi_{0,0}(x) = \begin{cases} 1 & 0 \le x < \frac{1}{2} \\ -1 & \frac{1}{2} \le x < 1 \end{cases} \qquad (2.31)$$
Figure 2.8: The Haar wavelet
From this “mother” function the following set of functions can be
obtained:
$$\psi_{j,k}(x) = \psi_{0,0}(2^j x - k) = \begin{cases} 1 & k\,2^{-j} \le x < (k+\frac{1}{2})\,2^{-j} \\ -1 & (k+\frac{1}{2})\,2^{-j} \le x < (k+1)\,2^{-j} \end{cases} \qquad (2.32)$$
A few of these functions are shown in Fig. 2.8. It is to be noticed that
as j increases, the functions become more and more localized in time.
This localization action is known as dilation and it allows local
changes to be represented accurately using very few coefficients. The
effect of k in Equation (2.32) is to move, or translate, the wavelet.
The components of a wavelet expansion are obtained from a mother wavelet
through the actions of dilation and translation. In Equation (2.32) any
real numbers a and b could be used instead of the dilation factor $2^j$
and the translation factor k, but this is the usual choice for the
discrete wavelet representation.
The wavelet representation may provide a better approximation of the
input with fewer coefficients; however its usefulness depends to a large
extent on the ease of implementation.
Figure 2.9: Example of multiresolution analysis
In 1989, Stéphane Mallat [16] developed the multiresolution approach,
which moved the wavelet representation into the domain of subband
coding. These concepts can be better understood with the help of an
example. Let us suppose we have to approximate the function f(t) drawn
in Fig. 2.9a using translated versions of some time-limited function
φ(t). The indicator function is a simple approximating function:
$$\phi(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.33)$$
Let us now define the translated versions of φ(t):
$$\phi_{0,k}(t) = \phi(t - k) \qquad (2.34)$$
Then it is possible to approximate the waveform f(t) by a linear
combination of φ(t) and its translates, as shown in Fig. 2.9b:
$$\phi_f^0(t) = \sum_{k=0}^{N-1} c_{0,k}\, \phi_{0,k}(t) \qquad (2.35)$$
where
$$\phi_{0,0}(t) = \phi(t) \qquad (2.36)$$
and the $c_{0,k}$ are the average values of the function in the
intervals [k, k+1). In other words:
$$c_{0,k} = \int_{k}^{k+1} f(t)\, \phi_{0,k}(t)\, dt \qquad (2.37)$$
It is possible to scale φ(t) to obtain:
$$\phi_{1,0}(t) = \phi_{0,0}(2t) = \begin{cases} 1 & 0 \le t < \frac{1}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (2.38)$$
Its translates are given by:
$$\phi_{1,k}(t) = \phi_{0,0}(2t - k) \qquad (2.39\text{--}2.40)$$
$$= \begin{cases} 1 & 0 \le 2t - k < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.41)$$
$$= \begin{cases} 1 & \frac{k}{2} \le t < \frac{k+1}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (2.42)$$
Approximating the function f(t) using the translates $\phi_{1,k}(t)$,
the approximation $\phi_f^1(t)$ is obtained as shown in Fig. 2.9c:
$$\phi_f^1(t) = \sum_{k=0}^{2N-1} c_{1,k}\, \phi_{1,k}(t) \qquad (2.43)$$
where:
$$c_{1,k} = 2 \int_{k/2}^{(k+1)/2} f(t)\, \phi_{1,k}(t)\, dt \qquad (2.44)$$
In this case we need twice as many coefficients as in the previous case.
The two sets of coefficients are related by:
$$c_{0,k} = \frac{1}{2}\,(c_{1,2k} + c_{1,2k+1}) \qquad (2.45)$$
If we wanted to get a closer approximation we could use a further scaled
version of φ(t), and so on, until an accurate representation of f(t) is
obtained (see Fig. 2.9d). Now let us assume that the function f(t)
is accurately represented by $\phi_f^1(t)$. $\phi_f^1(t)$ can be
decomposed into a lower resolution version of itself, namely
$\phi_f^0(t)$, plus the difference $\phi_f^1(t) - \phi_f^0(t)$. Let us
examine this difference over an arbitrary interval [k, k+1):
$$\phi_f^1(t) - \phi_f^0(t) = \begin{cases} c_{1,2k} - c_{0,k} & k \le t < k + \frac{1}{2} \\ c_{1,2k+1} - c_{0,k} & k + \frac{1}{2} \le t < k + 1 \end{cases} \qquad (2.46)$$
Substituting for $c_{0,k}$ from (2.45) we obtain:
$$\phi_f^1(t) - \phi_f^0(t) = \begin{cases} \frac{1}{2} c_{1,2k} - \frac{1}{2} c_{1,2k+1} & k \le t < k + \frac{1}{2} \\ -\frac{1}{2} c_{1,2k} + \frac{1}{2} c_{1,2k+1} & k + \frac{1}{2} \le t < k + 1 \end{cases} \qquad (2.47)$$
Defining:
$$b_{0,k} = \frac{1}{2}\,(c_{1,2k} - c_{1,2k+1}) \qquad (2.48)$$
over the interval [k, k+1), $\phi_f^1(t) - \phi_f^0(t)$ can be written
as:
$$\phi_f^1(t) - \phi_f^0(t) = b_{0,k}\, \psi_{0,k}(t) \qquad (2.49)$$
where $\psi_{0,k}(t)$ is the k-th translate of $\psi_{0,0}(t)$ and:
$$\psi_{0,0}(t) = \begin{cases} 1 & 0 \le t < \frac{1}{2} \\ -1 & \frac{1}{2} \le t < 1 \end{cases} \qquad (2.50)$$
which is the Haar wavelet. The 2N-point sequence $c_{1,k}$ can thus be
decomposed into two N-point sequences, $c_{0,k}$ and $b_{0,k}$. The
first component is obtained using the scaling function and its
translates, and gives the approximation coefficients; the second
component is obtained in terms of the wavelet and its translates, and
gives the detail coefficients. A sketch of this one-level decomposition
follows.
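A minimal sketch of the one-level Haar decomposition and of its exact
inverse, taken directly from Eqs. (2.45) and (2.48):

def haar_analysis(c1):
    """Split a 2N-point sequence into N approximation and N detail coefficients."""
    c0 = [(c1[2 * k] + c1[2 * k + 1]) / 2 for k in range(len(c1) // 2)]  # Eq. (2.45)
    b0 = [(c1[2 * k] - c1[2 * k + 1]) / 2 for k in range(len(c1) // 2)]  # Eq. (2.48)
    return c0, b0

def haar_synthesis(c0, b0):
    c1 = []
    for a, d in zip(c0, b0):
        c1 += [a + d, a - d]   # inverts Eqs. (2.45) and (2.48) exactly
    return c1

c1 = [17.0, 19.0, 24.0, 24.0, 24.0, 21.0, 15.0, 10.0]
c0, b0 = haar_analysis(c1)
assert haar_synthesis(c0, b0) == c1   # perfect reconstruction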
This example can be generalized as follows. Let $\phi_{j,k}(t)$ be a set
of functions with the following properties:
1. $$\phi_{j,0}(t) = \phi_{0,0}(2^j t) \qquad (2.51)$$
2. If a function can be expressed exactly by a linear combination of the
set $\phi_{j,k}(t)$, then it can also be expressed exactly as a function
of the set $\phi_{l,k}(t)$ for all $l \ge j$.
3. The complete set $\{\phi_{j,k}(t)\}_{j,k=-\infty}^{\infty}$ can
exactly represent all functions with the property that:
$$\int_{-\infty}^{\infty} |f(t)|^2\, dt < \infty \qquad (2.52)$$
4. If a function f(t) can be exactly represented by the set
$\phi_{0,k}(t)$, then any integer translate f(t − k) can also be
represented exactly by the same set.
5. $$\int \phi_{0,l}(t)\, \phi_{0,k}(t)\, dt = \begin{cases} 0 & l \ne k \\ 1 & l = k \end{cases} \qquad (2.53)$$
Such a set forms a multiresolution analysis [16]. Hence, at any
resolution $2^{-j}$, every function f(t) can be decomposed into two
components: one that can be expressed as a function of the set
$\phi_{j,k}(t)$ and one that can be expressed as a linear combination of
the wavelets $\psi_{j,k}(t)$.
The mother wavelet $\psi_{0,0}(t)$ and the scaling function
$\phi_{0,0}(t)$ are related in the following manner: from Property 2,
$\phi_{0,0}$ can be written in terms of the $\phi_{1,k}$. If the
relationship is given by:
$$\phi_{0,0}(t) = \sum_n h_n\, \phi_{1,n}(t) \qquad (2.54)$$
then the wavelet $\psi_{0,0}(t)$ is given by:
$$\psi_{0,0}(t) = \sum_n (-1)^n\, h_n\, \phi_{1,n}(t) \qquad (2.55)$$
From this relationship we can assume that the wavelet decomposition
can be implemented in terms of filters with impulse responses given
by (2.54) and (2.55) and that the filters are quadrature mirror filters.
Most of the orthonormal wavelets are nonzero over an infinite inter-
val. Therefore the corresponding filters are IIR filters. Well known
exceptions are the Daubechies wavelets, which correspond to FIR filters.
Once the coefficients of the FIR filters have been obtained, the
procedure for compression using wavelets is identical to the one
described for subband coding. From now on the terms multiresolution
analysis and wavelet-based analysis will be regarded as synonymous. Some
of the most widely used wavelet families are shown in Fig. 2.10,
Fig. 2.11 and Fig. 2.12.
2.6 Implementation of compression algorithms
Compression algorithms can be implemented in hardware or in software,
depending on the required speed. When speed is the most important
constraint, a hardware implementation becomes necessary.
Commercial devices implementing data compression in hardware exist: for
example the ALDC1-40S-M from IBM, featuring adaptive lossless data
compression, works at a rate of 40 MBytes/s, while the AHA32321 chip
from Aha can compress and decompress data at 10 MBytes/s with a clock
frequency of 40 MHz. These rates are far smaller than the one required
for the SDD readout: the compression chip we need has to sustain an
input data rate of 320 MBytes/s. Since no commercial chip with such
features exists, we had to design an Application Specific Integrated
Circuit (ASIC) targeted to our requirements.
[Plots of the scaling function phi, the wavelet function psi and the
decomposition/reconstruction low-pass and high-pass filters for the Haar
wavelet and the Daubechies wavelets db1, db2, db3 and db10.]
Figure 2.10: Some functions belonging to different wavelet families: note
that db1 is equivalent to the Haar
[Plots of the scaling function phi, the wavelet function psi and the
decomposition/reconstruction low-pass and high-pass filters for the
Symlets sym2, sym3, sym4, sym8 and the Coiflets coif1, coif2, coif3,
coif5.]
Figure 2.11: Some functions belonging to different wavelet families
[Plots of the decomposition/reconstruction scaling functions, wavelet
functions and low-pass/high-pass filters for the Biorthogonal wavelets
bior1.1, bior1.3, bior1.5, bior6.8 and the Reverse Biorthogonal wavelets
rbio1.1, rbio1.3, rbio1.5, rbio6.8.]
Figure 2.12: Some functions belonging to different wavelet families: note
that bior1.1 and rbio1.1 are equivalent to the Haar
Chapter 3

1D compression algorithm and implementations
3.1 Compression algorithms for SDD
The choice of the algorithm for SDD data compression is strictly related
to the input data stream features:
– low detector occupancy (max 3 %)
– small amplitude samples are much more probable than large ones
The first feature suggests the use of a zero suppression algorithm: all
samples below a certain value (depending on the noise distribution) are
discarded. The second feature suggests adopting an entropy coder, such
as the Huffman one. Besides, it is important for the algorithm to
contain software-tunable parameters, so that its performance can be
re-optimized in case of changes in the statistics of the input
distribution. For instance, the threshold level has to be changeable via
software in order to take into account changes in the signal-to-noise
ratio over the years, and the Huffman tables have to be reconfigurable
too. The other important requirements for the compression algorithms
are:
– they have to be fast
– they have to be simple to implement in hardware
– they have to allow lossless data transmission
For the development of the compression algorithms, studies have been
performed on the statistical distribution of the sample data coming from
the single-particle events of three beam tests, so that noise could be
properly taken into account. The compression results have been evaluated
in order to verify the algorithm efficiency and to determine the best
parameter values.
3.2 1D compression algorithm
Following these requirements, the INFN Section of Torino has chosen a
sequential compression algorithm [17] which scans the data coming from
each anode row as a uni-dimensional data stream. As shown in the example
of Fig. 3.1, the data samples coming from anode 76 are processed, then
those from anode 77 and so on. The ultimate goal of the algorithm is to
save the data belonging to a cluster, while rejecting all the other
samples, regarded as noise. To have a data reduction system that is
applicable in all situations, the algorithm is provided with different
tuning parameters (Fig. 3.2 provides a graphical explanation of them):
– threshold: the threshold parameter is applied to the incoming samples,
forcing them to zero if they are smaller than this value. This parameter
has the goal of eliminating the noise and pedestals affecting the data.
– tolerance: the tolerance parameter is applied to the differences
calculated between consecutive samples, forcing them to zero if they are
smaller than this value; in this way samples that differ only slightly
are considered equal, and non-significant fluctuations of the input
values are eliminated.
– disable: the disable parameter is applied to the input data, switching
off all the previous mechanisms for samples greater than disable, in
order to keep full information on the clusters and to maintain good
double-peak resolution. This means that the important information is not
affected by the lossy compression algorithm.
Figure 3.1: Cluster in two dimensions and its slices along the anode
direction
The 1D algorithm actually consists of 5 processing steps applied
sequentially (see Fig. 3.3; a behavioral sketch is given after the
list):
– first, the input data values below the threshold parameter are set
to 0;
– then, the difference between each sample and the previous one (along
the time direction) is calculated;
– if the difference is smaller than the tolerance parameter and the
input sample is smaller than the disable parameter, the difference is
set to 0, otherwise its value is left unchanged;
– these values are then encoded using the Huffman table;
– the obtained values are finally encoded using the Run Length encoding
method.
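The following Python sketch, a behavioral model rather than the VHDL
implementation, reproduces the first three steps on one anode stream;
Huffman and Run Length coding then follow as in Sec. 2.4. The sample
values are invented for illustration.

def one_d_reduce(samples, threshold, tolerance, disable):
    # Step 1: zero suppression below the threshold
    zs = [0 if s < threshold else s for s in samples]
    # Step 2: differences along the time direction (first sample kept)
    diffs = [zs[0]] + [b - a for a, b in zip(zs, zs[1:])]
    # Step 3: small differences are forced to zero, unless the sample is
    # above the disable level (cluster cores pass through unmodified)
    return [0 if abs(d) < tolerance and s < disable else d
            for s, d in zip(zs, diffs)]

stream = [3, 4, 21, 60, 85, 62, 24, 5, 3, 4]   # a cluster on top of noise
print(one_d_reduce(stream, threshold=10, tolerance=2, disable=50))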
[Plot of the anodic signal versus time, showing the threshold and
disable levels and the ±tolerance band around the signal.]
Figure 3.2: Threshold, tolerance and disable parameters
The high probability of finding long zero sequences in the SDD charge
distribution makes the use of Run Length encoding very effective,
especially in combination with the threshold, tolerance and disable
mechanisms.
3.3 1D algorithm performances
As explained in Chapter 1, in order to comply with the target figures of
DAQ speed and magnetic tape usage, the size of the SDD event has to be
reduced from 32.5 MBytes to about 1.5 MBytes, which corresponds to a
target compression coefficient of 22. Several standard compression
algorithms have been evaluated on SDD test beam events in order to
estimate the achievable compression performance: the best compression
coefficient was obtained with the gzip utility of the Unix operating
system, hence gzip was chosen for comparison with our 1D algorithm. For
a fair comparison, the data was submitted to the gzip program in binary
format.
[Block diagram: the input stream passes through simple threshold zero
suppression, differential encoding, the tolerance mechanism, Huffman
encoding and run length encoding, producing the compressed data;
threshold, tolerance and the Huffman tables are software tunable
parameters.]
Figure 3.3: 1D compression algorithm
3.3.1 Compression coefficient
For the comparison task, data coming from the August 1998 test beam was
chosen. The gzip compression algorithm achieves a compression ratio
around 2: this value is too far from our target value of 22.
The 1D compression algorithm has been applied using a threshold value of
20 (equal to noise mean + 1.35 × noise RMS) and tolerance = 0: the
compression coefficient obtained is around 12.5. This is still an
unacceptable value for our purposes. The target compression value of 22
can only be reached by increasing the threshold parameter, which implies
a larger information loss. For instance, by applying the algorithm to
the same test beam data it is possible to obtain a compression
coefficient of about 33, with threshold = 40 (equal to noise mean +
2.68 × noise RMS) and tolerance = 0. Fig. 3.4 shows the variation of the
compression coefficient of the 1D algorithm as a function of the
threshold level between 20 and 40, for two values of tolerance.
Figure 3.4: 1D compression ratio as a function of threshold and tolerance
An important feature of this compression algorithm is that it can be
reverted to a lossless algorithm simply by setting the values of
threshold and tolerance to 0. Sending data without losing any
information will be very useful for the first event acquisitions, since
the raw data will be analyzed to determine statistics, noise and so on.
These raw data will also be used for determining the best Huffman
tables, i.e. the ones giving the best compression coefficient. When the
algorithm is used in lossless mode, meaning that only differential
encoding, Huffman coding and run length encoding are applied, the
compression coefficient obtained is 2.3, which is even better than what
we obtain with gzip.
3.3.2 Reconstruction error
It then had to be checked whether the information loss introduced by a
threshold level of 40 is acceptable or not. In particular, it was
decided to study how much data compression and decompression affected
the cluster geometry, in terms of centroid position and charge.
A cluster finding routine was developed with the following two-step
procedure (a sketch is given after the list):
Figure 3.5: Spreads introduced by data compression on measurement of
coordinates of the SDD clusters and of the cluster charge (bottom right)
– data streams are analyzed one anode row after the other: when
a sample value is higher than a certain threshold level for two
consecutive time bins, it is considered to be a hit until it goes
below the same threshold for two consecutive time bins;
– then, if two 1-D hits from adjacent anodes overlap in time, they are
considered part of the same two-dimensional cluster.
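A minimal sketch of the first step, the search for 1-D hits on one anode
row with the two-consecutive-time-bins rule; names, the interval
convention and the example row are illustrative.

def find_hits_1d(row, threshold):
    """Return (start, end) time-bin intervals of the hits in one anode row."""
    hits, start = [], None
    for t in range(len(row) - 1):
        if start is None:
            if row[t] > threshold and row[t + 1] > threshold:
                start = t                     # hit opens: two bins above threshold
        elif row[t] <= threshold and row[t + 1] <= threshold:
            hits.append((start, t - 1))       # hit closes: two bins below threshold
            start = None
    if start is not None:
        hits.append((start, len(row) - 1))
    return hits

row = [1, 2, 9, 12, 15, 11, 8, 2, 1, 1]
print(find_hits_1d(row, threshold=5))   # one hit spanning time bins 2..6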
After finding the samples belonging to clusters, they are fitted with a
two-dimensional Gaussian function, with the following features:
– the mean value corresponds to the cluster centroid;
– the sigma value corresponds to the centroid resolution;
– the volume under the Gaussian function corresponds to the charge
released on the detector by the ionizing particle.
The 1D compression and decompression algorithms were then applied to
test beam data, and cluster finding and analysis were performed on both
the original and the reconstructed data: the results are shown in
Fig. 3.5. The picture on the upper left shows the distribution of the
differences in the centroid coordinates, before and after compression,
in the plane of the anode and drift time directions. The picture on the
upper right shows the same distribution along the drift time direction
only, while the picture on the bottom left shows the distribution along
the anode direction. These plots show that the compression algorithm
with a threshold of 40 does not introduce biases in the centroid
coordinate measurements, but worsens their accuracy by about 9 µm (+4%)
along the anode direction and by about 16 µm (+8%) along the drift time
axis. The bottom right picture shows the percentage difference of the
charge before and after compression: the 1D algorithm also introduces an
underestimation of the cluster charge of about 4 %.
3.4 CARLOS v1
During 1999 I collaborated with the INFN group in Torino on the design
and test of a first hardware implementation of the 1D algorithm: CARLOS
v1. This device is physically implemented as a PCB (Printed Circuit
Board) containing two FPGA (Field Programmable Gate Array) devices and
some connectors for use in a test beam data acquisition system, as shown
in Fig. 3.6. The device processes data coming from one macrochannel
only, that is data coming from one half-detector, and directly
interfaces to the SIU board, the first stage of the DAQ system.
3.4.1 Board description
The two main processing blocks mounted on the board are the two Xilinx
FPGA devices. An FPGA is a completely programmable device, widely used
for fast prototyping before the final implementation of the design on an
ASIC circuit, which requires far more resources in terms of
time, money and design effort. An FPGA contains a matrix of CLBs
(Configurable Logic Blocks) that can be individually programmed and
connected together in order to implement the desired input/output logic
function. Each CLB contains an SRAM (Static RAM) that implements a logic
function by receiving the input values on its address bus: the CLBs are
used as look-up tables.
Figure 3.6: CARLOS prototype v1 picture
Another piece of silicon area on the FPGA die contains the configuration
RAM: depending on the contents of this block the device accomplishes
different logic functions. The configuration RAM is written at power-on
from an external EPROM: CARLOS v1 hosts two EPROM devices for the
configuration of the two FPGAs. The configuration process takes around
20 ms, after which the devices are completely operational. A 10 MHz
clock generator is hosted between the EPROM chips: we could not achieve
a higher working frequency with our choice of FPGA device. In fact the
final operating frequency
Features Values
Logic cells 2432
Max logic gates (no RAM) 25k
Max RAM bits (no logic) 32768
Typical gate range (logic and RAM) 15k - 45k
CLB matrix 32x32
Total CLBs 1024
Number of flip-flops 2560
Number of user I/O 256
Table 3.1: XC4025 Xilinx FPGA main features
is a function of how many internal resources are being used: the more
resources are used, the lower the final working frequency becomes. With
the final 10 MHz frequency we reached a good trade-off between logic
complexity and speed; furthermore this frequency was sufficient for
operation in a test-beam environment. Tab. 3.1 reports the main features
of the chosen FPGA device, the XC4025E-4 HQ240C.
The board also contains three connectors, from left to right:
– the first is used for data injection into the first FPGA device using
a Hewlett Packard (HP) pattern generator;
– the second one is used for analyzing the data coming out of the first
device by means of a logic analyzer probe;
– the third connector is used for the communication between CARLOS v1
and the SIU board. Fig. 3.7 shows a picture of the final SIU board. We
used a simplified SIU version called SIMU (SIU simulator), distributed
at CERN to help front-end designers realize DAQ-compatible devices. The
SIMU board can be directly plugged onto this connector.
Figure 3.7: Picture of the SIU board
3.4.2 CARLOS v1 design flow
I carried out the design of the second FPGA device following the digital
design flow shown in Fig. 3.8. In particular, the design flow is
composed of the following steps:
– the block specifications have been coded in the VHDL language using a
hierarchical structure, starting from the bottom layer up to the top
level;
– each VHDL model has been simulated in order to debug the code, using
the Synopsys simulator software;
– each VHDL model has been synthesized, i.e. translated into a netlist,
using the Synopsys synthesis tool; the netlist contains the usual
standard cells such as AND, OR or flip-flops, but the FPGA device does
not contain these elements, it contains only RAM blocks. The netlist is
only a logic representation of the circuit itself, it has no physical
meaning.
– the netlist is simulated using the Synopsys simulator, taking into
account cell timing delays and constraints.
– the netlist is automatically converted into a physical layout using
the place and route software Alliance from Xilinx;
– the layout information is put into a binary file, ready to be
downloaded into the EPROM chip using the Alliance software, together
with an EPROM programmer.
Figure 3.8: Digital design flow for CARLOS v1
This is a very straightforward and automated process; besides, the time
needed between a slight modification in the VHDL code and its actual
implementation in the FPGA device is very short. This is the main reason
why FPGAs are so widely used for prototyping. Another very important
reason is the following: running millions of test vectors as a software
simulation of a VHDL model is a very long process even for fast
machines, while the same set of test vectors can be run in a few seconds
on the hardware prototype. An FPGA implementation thus easily allows
algorithm verification on a huge amount of data.
3.4.3 Functions performed by CARLOS v1
The FPGA on the left in Fig. 3.6 contains the 1D compression algorithm
explained in the previous sections, composed of 5 processing blocks
sequentially applied to the input data. The blocks form a 5-stage
pipeline, each stage requiring one clock cycle. The variable-length
compressed codes are produced as 32-bit words.
The FPGA on the right contains the following blocks:
– firstcheck: this block processes the 32-bit input words coming from
the compressor FPGA: if the MSB is high the incoming word is rejected,
otherwise it is accepted and split into two different fields, one 26
bits wide containing the variable-length code and one 5 bits wide
containing the information on how many bits have to be stored.
– barrel: this block packs the variable-length codes, 2 to 26 bits long,
into fixed-size 32-bit words. The information on how many bits (from 2
to 26) have to be stored is contained in the 5-bit length bus coming
from the firstcheck block. The variable-length Huffman codes packed into
32-bit words can be uniquely unpacked by using the Huffman table,
reading from MSB to LSB. When a word is complete, an output-push signal
is asserted.
– fifo: it contains a 64×32-bit RAM memory for storing the data coming
out of the barrel shifter. When the FIFO contains at least 16 data words
it asserts a query signal, in order to ask the feesiu block to begin
data popping.
– feesiu: this is the most complex block of the prototype, containing
the interface between CARLOS and the SIU board. The main behavior is
quite simple: CARLOS waits for a “Ready to Receive” (RDYRX) command from
the SIU on a bidirectional data bus; after receiving it, CARLOS takes
possession of the bidirectional bus and begins sending data towards the
SIU as packets of 17 32-bit words. Each packet is built as a header word
containing exter-
nally hardwired information followed by 16 data words coming out of the
FIFO. When the FIFO is empty or does not contain 16 data words, no valid
data is sent to the SIU. If, instead, the FIFO begins to acquire large
quantities of data while the connection to the SIU is not yet open (a
RDYRX command has not been received yet), a data-stop signal is asserted
to stop the data stream coming into CARLOS from AMBRA.
3.4.4 Tests performed on CARLOS v1
The test of the CARLOS prototype has been carried out using the HP16700A
pattern generator and logic analyzer at the INFN Section in Torino. Data
were injected on the first connector and analyzed on the second
connector, while the third one was connected to a SIU extender board,
which connects directly to the SIMU board. The SIU extender is very
useful for debugging purposes, since it provides 5 logic analyzer
compatible connectors for analyzing the signals exchanged across the
CARLOS-SIU interface. Here follows a list of the tests performed on
CARLOS:
1. functional test and compression algorithm verification;
2. opening of a transaction by manually pushing buttons on the SIMU
board;
3. event data transmission from CARLOS to the SIMU. The SIMU does not
store data, so the only way to check whether the data are correct or not
is by using the logic analyzer.
The prototype test was especially useful for designing a perfectly
compatible interface towards the SIU. The main difficulty in testing the
interface towards the SIU without a SIU board is due to the presence of
bidirectional pads: it is quite a difficult job to work with such pads
using a pattern generator.
Many corrections had to be applied to the original version in order to
have a 100% compatible interface. The final VHDL version was then frozen
and used for the ASIC implementation of CARLOS v2. The VHDL model, in
fact, does not depend on the technology chosen for the implementation
and is completely re-usable.
3.5 CARLOS v2
The first CARLOS prototype has been very useful for testing the
compression algorithm on a huge amount of data and for correctly
designing complex blocks such as the interface towards the SIU, but it
has many limitations compared to the final version we need to design.
Therefore we decided to move to a second prototype of CARLOS with the
following features:
– 40 MHz clock frequency;
– parallel processing of 8 macro-channels;
– small size, for easier use in a test-beam environment;
– a JTAG port for downloading the Huffman look-up tables and the
threshold and tolerance values.
The CARLOS chip design has been logically divided into two main parts,
the first one designed in Torino and the second one in Bologna:
– a data compressor working on the 8 incoming streams, using the 1D
compression algorithm. The compressor accepts 8-bit input data and
outputs 32-bit words containing the variable-length codes;
– a data packing and formatting block, a multiplexer selecting which one
of the 8 incoming streams has to be sent in output, and an interface
block towards the SIU.
As shown in Fig. 3.9, there are six main sub-blocks: firstcheck, barrel,
fifo, event-counter, outmux and feesiu.
Figure 3.9: CARLOS v2 schematic blocks
3.5.1 The firstcheck block
The I/O signals are:
– inputdata: input 32-bit bus;
– ck : input signal;
– reset : input signal;
– load : output signal;
– addressvalid : output 5-bit bus;
– datavalid : output 26-bit bus.
The firstcheck block takes as input the compressed codes coming from the
compression block and selects the useful bits while rejecting the dummy
ones. The 32-bit input word has the following structure:
– bit 31: under-run bit: when set to 1 it means that the incoming data
are dummy and have to be discarded; this may happen, for example, when
the run length encoder is packing long zero sequences, thus temporarily
interrupting the data flow towards the SIU;
– bits 30 to 26: this 5-bit field contains the actual number of bits
that have to be selected by the following logic block, the barrel
shifter;
– bits 25 to 0: this 26-bit field contains the compressed code.
The truly interesting bits are usually far fewer than 26, which yields a
reduction in the data stream volume.
The firstcheck behavior is quite simple: when the reset signal is active
(active high) all outputs are set to 0; when reset is inactive, the
firstcheck block samples the under-run bit: when it is 1 all outputs are
set to 0; when it is 0, load is set to 1, addressvalid is assigned
inputdata(30 downto 26) and datavalid is assigned inputdata(25 downto
0). A behavioral sketch follows.
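A behavioral sketch of this bit-field selection, written in Python
rather than VHDL:

def firstcheck(inputdata: int):
    """Return (load, addressvalid, datavalid) for one 32-bit input word."""
    if (inputdata >> 31) & 0x1:              # under-run bit set: dummy word
        return 0, 0, 0
    addressvalid = (inputdata >> 26) & 0x1F  # number of useful bits (5 bits)
    datavalid = inputdata & 0x3FFFFFF        # variable-length code (26 bits)
    return 1, addressvalid, datavalid

word = (0 << 31) | (4 << 26) | 0b1011        # a 4-bit code "1011"
print(firstcheck(word))                      # (1, 4, 11)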
3.5.2 The barrel shifter block
The I/O signals are:
– input : input 26-bit bus;
– sel : input 5-bit bus;
– load : input signal;
– ck : input signal;
– reset : input signal;
– end-trace: input signal;
– output-push: output signal;
– output : output 32-bit bus.
The barrel shifter has to pack all the valid data coming out from the
firstcheck block into a fixed-length 32-bit register word to be put in out-
put: in this way all dummy data are rejected and we have no more any
distinction between data-length and data itself. All data are packed in
the same word and can be easily reconstructed by using the Huffman
tree decoding scheme. If an input data cannot be completely stored
into a 32-bit word, it is broken into 2 pieces: the first as the MSBs of
the current output so to completely fill it, the second as the LSBs of
the following valid output word.
When the reset is active all internal registers and outputs are set to
0, when the reset is inactive the barrel shifter begins to wait for valid
data coming from the firstcheck block, that is data with the load signal
set to 1. When it happens the barrel shifter selects the valid bits from
input and packs them together in a 64-bit circular register word. When
32 bits are written on the register, the block asserts a signal output-
push high to communicate to the following block (the FIFO) that the
output is valid and has to be stored.
Two situations are critical for the barrel shifter to work properly: when the load signal changes from 1 to 0 the barrel stops packing data, and when load turns to 1 again the barrel resumes packing data as if no pause had occurred.
The end-trace signal is asserted for one clock period in coincidence with the last valid datum: this datum has to be packed together with the others, then the 32-bit word has to be pushed in output (by setting output-push to 1) even if it is not complete. After the end-trace, once the last valid word has been sent to output, the barrel shifter puts n zero words as valid outputs: that number depends on how many words have been sent to output since the beginning of the current event. In fact the total number of valid words per event has to be an integer multiple of 16. Thus, if (16k + 7) words have been sent in output when end-trace becomes active, n = 9 zero words follow with output-push set to 1. This condition is strictly related to the data transmission policy and to the multiplexing of the 8 incoming data streams onto a single 32-bit output, as explained in the next paragraph.
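The packing policy just described can be modelled compactly. The sketch below is an illustrative C++ model assuming MSB-first filling of the output words, as the description above suggests; the class and member names are ours:

```cpp
#include <cstdint>
#include <vector>

// Behavioural sketch of the CARLOS v2 barrel shifter: variable-length
// codes are packed MSB-first into fixed 32-bit output words; at end-trace
// the stream is padded with zero words up to a multiple of 16 words.
class BarrelShifter {
    uint64_t buf_  = 0;   // 64-bit packing register
    unsigned bits_ = 0;   // number of bits currently held in buf_
public:
    std::vector<uint32_t> out; // words flagged with output-push in hardware

    // 'load' phase: append 'len' (1..26) valid bits of 'code'.
    void push(uint32_t code, unsigned len) {
        buf_  |= (uint64_t(code) & ((1ull << len) - 1)) << (64 - bits_ - len);
        bits_ += len;
        if (bits_ >= 32) {              // a full 32-bit word is ready
            out.push_back(uint32_t(buf_ >> 32));
            buf_ <<= 32;
            bits_ -= 32;
        }
    }

    // end-trace: flush the partial word, then pad to a multiple of 16 words.
    void end_trace() {
        if (bits_ > 0) { out.push_back(uint32_t(buf_ >> 32)); buf_ = 0; bits_ = 0; }
        while (out.size() % 16 != 0) out.push_back(0);
    }
};
```

The final while loop reproduces the rule above: with (16k + 7) words already emitted, exactly 9 zero words are appended.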
3.5.3 The fifo block
The I/O signals are:
– datain: input 32-bit bus;
– ck : input signal;
– push: input signal;
– pop: input signal;
– reset : input signal;
– empty : output signal;
– full : output signal;
– query : output signal;
– dataout : output 32-bit bus.
The fifo block contains a double-port RAM block with 64 32-bit words plus some control logic. Its purpose is to buffer the input data stream and derandomize the queues that are waiting to be served by the outmux block. The buffer memory has to be large enough to allow data storing while the other queues are being served, since blocking conditions have to be avoided. On the other hand, it cannot be too large, since CARLOS hosts 8 fifo blocks and the chip area is a strong design constraint.
The fifo allows 3 main storage operations:
– write only;
– read only;
– read/write at the same time but at different cell locations.
The FIFO allows data coming from the barrel shifter to be written, and to be read when the queue has to be served by the outmux block. The most important feature is that read and write operations can be executed in parallel. To accomplish this, the control logic provides two pointers named address-write and address-read. They run from 0 to 63 and then back to 0 in a circular way: obviously address-read always has to follow address-write, otherwise invalid data would be extracted from the memory. A datum is written into the fifo and the address-write pointer is incremented by one when the push input is set to 1: the push input of the fifo is the same signal as the output-push of the barrel. In this way, when the barrel shifter has a valid output, it is written into a free location of the fifo at the next clock cycle.
The RAM read phase is activated by the pop input signal: for every clock cycle in which pop is 1, the data value corresponding to address-read is presented on the dataout output and the address-read pointer is incremented by 1. When both push and pop are set to 1, the fifo is read and written at the same time and the distance between the two pointers remains constant. Three important signals are:
– query signal: the query signal is set to 1 when the memory contains at least 16 valid words, that is when the distance between the two pointers is greater than or equal to 16. The query signal is used by the outmux block, where a priority-encoding-based arbiter decides which of the 8 queues has to be served in output. When a fifo block is served by the outmux, the total number of valid words decreases and the query signal goes back to 0. The query signal may remain at 1 if more than 32 valid words were stored in the fifo: in this case the fifo may be read again. It all depends on how many queues are asking the scheduler to be emptied.
– empty signal: the empty signal is set to 1 when the fifo does not contain any valid data, that is when address-write and address-read have the same value and point to the same memory location. This signal is used by the feesiu block to decide when all the 8 queues have been completely emptied and a new data set can enter CARLOS.
– full signal: the full signal is very important since it is back-propagated to the compressor block to signal that the FIFO is getting full and the input stream has to be stopped. The compressor block back-propagates this full signal to the AMBRA chip, which stops sending data to CARLOS. Obviously the full signal has to be asserted before the FIFO is really full, otherwise some input data would be lost. For this reason the fifo full signal works between 2 thresholds, 32 and 48: the full signal goes high when the fifo contains more than 48 valid words, then it goes back to 0 only when the fifo has been served by the outmux block, that is when the fifo contains fewer than 32 valid words. With this trick the risk for the fifo to get completely full is reduced, at least if the queue arbiter is fair enough with every input stream.
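A behavioural C++ model of this control logic, including the query flag and the 32/48 hysteresis of the full flag, could look as follows (an illustrative sketch, not the actual netlist; names are ours):

```cpp
#include <array>
#include <cstdint>

// Behavioural sketch of the CARLOS v2 fifo: a 64-word circular buffer
// with the query (>=16 words), empty, and hysteresis-based full flags.
class FifoModel {
    std::array<uint32_t, 64> ram_{};
    unsigned wr_ = 0, rd_ = 0, count_ = 0;
    bool full_flag_ = false;
public:
    bool query() const { return count_ >= 16; }
    bool empty() const { return count_ == 0; }
    bool full()  const { return full_flag_; }

    // One clock cycle: push and/or pop may be active simultaneously.
    void clock(bool push, uint32_t datain, bool pop, uint32_t& dataout) {
        if (pop && count_ > 0) {            // serve address-read, then advance
            dataout = ram_[rd_];
            rd_ = (rd_ + 1) % 64;
            --count_;
        }
        if (push && count_ < 64) {          // store at address-write, advance
            ram_[wr_] = datain;
            wr_ = (wr_ + 1) % 64;
            ++count_;
        }
        // Hysteresis: assert above 48 stored words, release only below 32.
        if (count_ > 48) full_flag_ = true;
        else if (count_ < 32) full_flag_ = false;
    }
};
```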
3.5.4 The event-counter block
The I/O signals are:
– end-trace: input signal;
– ck : input signal;
– reset : input signal;
– event-id : output 3-bit bus.
The event-counter block is a very simple 3-bit binary counter used to assign a number to every physical event, so that consecutive events can easily be distinguished. When the reset is active, internal registers and outputs are set to 0; when the reset is inactive, the event-counter block increments its output signal event-id by one every time it samples the end-trace signal at logic level 1. The end-trace feeding the event-counter block is a signal coming from the feesiu block called all-fifos-empty. This signal is asserted for two clock periods when all the 8 end-trace signals have been set to 1 and all the 8 queues have been completely emptied. For this purpose CARLOS contains a global end-trace signal which is activated when all the 8 local end-traces have been high for at least one clock period; it is not strictly necessary that a temporal overlap exist between the 8 signals. Note, however, that the global end-trace will never be set to 1 if some of the local end-traces are not used and remain stuck at 0.
After the global end-trace is activated, the feesiu block waits for the 8 FIFOs to be emptied: as soon as this happens, the all-fifos-empty signal is activated and the event-id signal is incremented by one. The all-fifos-empty signal stays at logic level 1 for two consecutive clock periods; nevertheless the event-id counter is incremented only once. The value of event-id is used in the outmux block and is sent to the SIU as part of the header word. Three bits were deemed sufficient to discriminate the events and to put them in the right order during the data decompression and reconstruction stages.
3.5.5 The outmux block
The I/O signals are:
– indat7 : input 32-bit bus;
– indat6 : input 32-bit bus;
– indat5 : input 32-bit bus;
– indat4 : input 32-bit bus;
– indat3 : input 32-bit bus;
– indat2 : input 32-bit bus;
– indat1 : input 32-bit bus;
– indat0 : input 32-bit bus;
– reset : input signal;
– ck : input signal;
– query : input 8-bit bus;
– event-id : input 3-bit bus;
– enable-read : input signal;
– half-ladder-id : input 7-bit bus;
– good-data: output signal;
– read : output 8-bit bus;
– output : output 32-bit bus.
The outmux block has two distinct functions in the overall logic:
– multiplexing the 8 compressed and packed streams onto a single
32-bit output (femux sub-block);
– deciding which queue has to be served using a priority encoding
based arbiter (ppe sub-block).
The femux and ppe blocks implement the following 17-word data packet
transmission protocol (see Fig. 3.10):
– a 32-bit header;
– 16 32-bit data words, all coming from one macrochannel and from
one event.
Figure 3.10: 17-word data transmission protocol
The header contains the following information from MSB to LSB:
– half ladder id (7 bits): this number is hardwired externally to each
CARLOS chip, depending on the ladder it will be connected to;
– packet sequence number (10 bits): this is a 10-bit wide counter
incremented once a packet is transmitted, i.e. every 17 data words;
– cyclic event number (3 bits): this is the event number coming from
the event-counter block;
– available bits (9 bits): these will be used in a future expansion of
CARLOS;
– half detector id (3 bits): every half ladder contains 8 half detectors.
They are numbered from 0 to 7 and this number is provided by
the macro-channel being served.
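Packed into a single 32-bit word, the header layout can be sketched in C++ as follows (the bit offsets follow the MSB-to-LSB listing above; the helper function and field names are ours):

```cpp
#include <cstdint>

// Sketch of the 32-bit header layout (MSB to LSB: 7-bit half ladder id,
// 10-bit packet sequence number, 3-bit event number, 9 spare bits,
// 3-bit half detector id). Illustrative, not the chip's VHDL.
uint32_t make_header(uint8_t half_ladder_id, uint16_t packet_seq,
                     uint8_t event_id, uint8_t half_detector_id) {
    return (uint32_t(half_ladder_id   & 0x7Fu)  << 25) |
           (uint32_t(packet_seq       & 0x3FFu) << 15) |
           (uint32_t(event_id         & 0x7u)   << 12) |
           /* bits 11..3 reserved for a future expansion of CARLOS */
           (uint32_t(half_detector_id & 0x7u));
}
```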
Let’s take a look at the 2 sub-blocks of the outmux :
– femux is a multiplexer with nine 32-bit inputs and a 9-bit selection bus. The 9 data inputs are the header and the 8 input channels coming from the FIFOs. The selection bus value is provided by the queue scheduler: this bus contains all zeros except for a single one.
– ppe stands for programmable priority encoder. It is a completely combinatorial block with two inputs and one output: request (8 bits) contains the query signals coming from the 8 macro-channels; priority (8 bits) is a bus containing exactly one bit at 1 and all the others at 0; served (8 bits), the output, likewise contains exactly one bit at logic level 1, and this bit indicates which of the 8 macro-channels has to be served by the femux.
The programmable priority encoder works in a very simple way: it scans the request bus starting from the bit set to 1 in the priority bus until it finds a 1. Its bit position, from 0 to 7, corresponds to the channel chosen by the arbiter. For the next choice the arbiter has to make, the priority bus value is updated in the following way: the served bus value is rotated right by one position, as if it were a circular register, and the result is assigned to the priority bus. In this way we avoid the risk of a queue being served many times consecutively while other queues are making requests. An example easily clarifies this: request = 10100010, priority = 00010000, served = 00000010. At the next clock cycle, the value "00000001" will be assigned to the priority bus. There are several possible implementations of a scheduling algorithm based on a programmable priority encoder: they differ in area and timing requirements. We chose the implementation used in the Stanford University Tiny Tera prototype, as described in [18].
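A minimal C++ sketch of this round-robin scheme, reproducing the worked example above, might read as follows (an illustration of the technique, not the Tiny Tera implementation of [18]):

```cpp
#include <cstdint>

struct PpeResult { uint8_t served; uint8_t next_priority; };

// Round-robin programmable priority encoder: scan the request bus
// starting from the one-hot priority bit, descending with wrap-around
// (this direction reproduces the worked example in the text); the next
// priority is the served value rotated right by one position.
PpeResult ppe(uint8_t request, uint8_t priority /* one-hot */) {
    if (priority == 0) priority = 0x01;          // guard: bus must be one-hot
    unsigned start = 0;
    while (!((priority >> start) & 1u)) ++start; // position of the priority bit
    for (unsigned i = 0; i < 8; ++i) {
        unsigned pos = (start + 8 - i) % 8;      // descending scan with wrap
        if ((request >> pos) & 1u) {
            uint8_t served = uint8_t(1u << pos);
            uint8_t next   = uint8_t((served >> 1) | (served << 7)); // rot right
            return { served, next };
        }
    }
    return { 0, priority };                      // no pending request
}
```

With request = 10100010 and priority = 00010000 the sketch returns served = 00000010 and next priority = 00000001, as in the example above.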
The outmux block works as follows: it is stopped and initialized while the reset signal is active. When the reset is inactive, the outmux block waits for the enable-read signal to become active. This is a signal coming from the feesiu block: when low, it states that the link between the SIU and CARLOS has not been initialized yet, or that the SIU temporarily cannot accept data. When enable-read is high, the SIU is able to receive data from CARLOS, so the outmux block starts evaluating the value of the query bus. When its value is all zeros, no macro-channel has yet requested to be served; otherwise the ppe block decides which queue to send in output. The first word served as output is the header word containing the information on the macro-channel being served and the other information listed above. In order to get the 16 data words to send as output, the outmux block has to provide the right pop signal to one of the 8 FIFOs. The 8 pop signals to the FIFOs are grouped in the 8-bit read bus; of course only one bit at a time is asserted. Signal read(7) is sent to fifonew7, read(6) to fifonew6 and so on, so as to extract 16 valid words from the FIFO. Since we want to send data to the SIU at a 20 MHz clock (half the system clock frequency), the pop signal cannot be stuck at 1 for 16 clock periods; instead it alternates between 0 and 1 in order to get a data word out of the FIFO one clock period out of every two. While the outmux block is putting in output the 17 words of the packet, the output signal good-data is set to 1 in order to guarantee to the feesiu block that it is receiving significant data. While sending the last data word of a packet, the outmux block updates the priority bus value as stated above, examines the query bus value and computes the new served value. If served is not 0, that is if any request has occurred, the outmux block starts sending another packet in output without any interruption (no clock periods are wasted); otherwise the block stops, waiting for a new request to be asserted. If enable-read turns from 1 to 0 while transmitting data, the outmux block sends only one more valid word in output, then stops and waits for the enable-read signal to be restored to its active value: it then continues sending data to the feesiu block as if no pause had occurred. The outmux block itself increments the 10-bit packet sequence number after every packet has been completely transmitted.
The reason why a 20 MHz clock has been chosen is related to the total optical fibre bandwidth available to CARLOS: 800 Mbits/s. If CARLOS put out 32-bit data at 40 MHz, the total bandwidth required would be 32 bits × 40 MHz = 1.28 Gbits/s, while at 20 MHz it is only 640 Mbits/s. For this reason the half-frequency data rate has been chosen as the final one.
3.5.6 The feesiu (toplevel) block
The I/O signals are:
– huffman7 : input 32-bit bus;
– huffman6 : input 32-bit bus;
– huffman5 : input 32-bit bus;
– huffman4 : input 32-bit bus;
– huffman3 : input 32-bit bus;
– huffman2 : input 32-bit bus;
– huffman1 : input 32-bit bus;
– huffman0 : input 32-bit bus;
– ck : input signal;
– reset : input signal;
– end-trace7 : input signal;
– end-trace6 : input signal;
– end-trace5 : input signal;
– end-trace4 : input signal;
– end-trace3 : input signal;
– end-trace2 : input signal;
– end-trace1 : input signal;
– end-trace0 : input signal;
– fidir : input signal;
– fiben-n: input signal;
– filf-n: input signal;
– half-ladder-id : input 7-bit bus;
– wait-request7 : output signal;
– wait-request6 : output signal;
– wait-request5 : output signal;
– wait-request4 : output signal;
– wait-request3 : output signal;
– wait-request2 : output signal;
– wait-request1 : output signal;
– wait-request0 : output signal;
– foclk : output signal;
– fbten-n: bidirectional signal;
– fbctrl-n: bidirectional signal;
– fobsy-n: output signal;
– fbd : bidirectional 32-bit bus.
The VHDL feesiu block contains all the other block instances (see Fig. 3.11) and the logic acting as interface to the SIU board. Thus the feesiu block contains 8 instances of firstcheck, 8 instances of barrel, 8 instances of fifo, 1 instance of event-counter and 1 instance of outmux. However, we can also picture the feesiu block simply as the block taking data from the outmux block and directly interfacing the SIU board, as if it were at the same hierarchical level as the other blocks. In Fig. 3.9 the feesiu block is represented exactly in this fashion.
3.5.7 CARLOS-SIU interface
Let's now take a look at the interface signals between CARLOS and the SIU and at how the communication protocol has been implemented:
Figure 3.11: Design hierarchy of CARLOS v2
– fidir : it is an input to CARLOS. It indicates the direction of the data flow between CARLOS and the SIU: when low, data flows from the SIU to CARLOS; otherwise, data flows from CARLOS to the SIU.
– fiben-n: it is an input to CARLOS, active low. It enables communication on the bidirectional buses between CARLOS and the SIU: when low, communication is enabled; otherwise it is disabled.
– filf-n: it is an input to CARLOS, active low; "lf" stands for link full. When the SIU is no longer able to accept data coming from CARLOS, it activates this signal. When this happens, CARLOS sends one more valid data word, then stops transmitting, waiting for the filf-n signal to be released. This is the signal used by the SIU to implement back-pressure on the data flow running from the front-end to the data acquisition system.
– foclk : it is a free-running clock generated on CARLOS and driving the CARLOS-SIU interface. It is a 20 MHz clock obtained by dividing the system clock frequency by 2. Interface signals coming from the SIU are triggered on the falling edge of foclk.
– fbten-n: it is a bidirectional signal, active low; it can be driven by CARLOS or by the SIU; "ten" stands for transfer enable. When CARLOS is assigned to drive the bidirectional buses (fidir high and fiben-n at 0), the fbten-n value is driven by CARLOS: it goes to its active state when CARLOS is transmitting valid data to the SIU, otherwise it is inactive. When the SIU is assigned to drive the bidirectional buses (fidir at 0 and fiben-n at 0), the fbten-n value is driven by the SIU: it goes to its active state when the SIU is transmitting valid commands to CARLOS, otherwise it is inactive.
– fbctrl-n: it is a bidirectional signal, active low; it can be driven by CARLOS or by the SIU; "ctrl" stands for control. When CARLOS is assigned to drive the bidirectional buses (fidir at 1 and fiben-n at 0), the fbctrl-n value is driven by CARLOS: it goes to its active state when CARLOS is transmitting a Front End Status Word to the SIU; otherwise, when in the inactive state, CARLOS is sending normal data to the SIU. When the SIU is assigned to drive the bidirectional buses (fidir at 0 and fiben-n at 0), the fbctrl-n value is driven by the SIU: it goes to its active state when sending command words to CARLOS, and to its inactive state when sending data words. The second option has not been implemented on CARLOS, since we decided that CARLOS needs only commands, not data, from the SIU. Other detectors use this option in order to download data to the detector itself: this is the case, for example, of the Silicon Pixel Detector.
– fobsy-n: it is an input signal to the SIU, active low; "bsy" stands for busy. CARLOS should activate this signal when it is not able to accept data coming from the SIU. Since CARLOS does not have to receive data from the SIU, this signal has been stuck at 1, meaning that CARLOS will never be in a busy state. In fact it always has to accept command words coming from the SIU.
– fbd : it is a bidirectional 32-bit bus on which data or command
words are exchanged between CARLOS and the SIU.
This is the way the communication protocol works: the SIU acts as the master and CARLOS acts as the slave, i.e. the SIU sends commands to CARLOS and CARLOS sends data and front end status words to the SIU. At first the CARLOS-SIU link has to be initialized and the SIU acts as the master of the bidirectional buses. So CARLOS waits for the bidirectional buses to be driven by the SIU (fidir at 0 and fiben-n at 0) and waits for a valid (fbten-n = 0) command (fbctrl-n = 0) named Ready to Receive (RDYRX). This command is always used to begin a new event transaction. The RDYRX command contains a transaction identifier (bits 11 to 8) and the string "00010100" as the least significant bits.
Once the command is accepted and recognized, CARLOS waits for the fidir signal to change value in order to take possession of the bidirectional buses; then, if filf-n is not active, it can send valid data on the fbd bus whenever the good-data signal is active. In this state, CARLOS sends valid data of an event to the SIU only when some queues are requesting to be served in output; otherwise the feesiu stops sending data by setting the fbten-n signal to 1. When an end-trace signal has arrived on each macrochannel and every queue has been completely emptied (no data of the current event remain stored in CARLOS), CARLOS puts in output the Front End Status Word (FESTW), a word confirming that no errors occurred and that the whole event has been successfully transferred to the SIU. The FESTW contains the transaction identifier received upon the opening of the transaction (bits 11 to 8) and the 8-bit FESTW code "01100100".
After this, CARLOS waits for some action to be taken by the SIU: the SIU can decide to take back control of the bidirectional buses and close the data link towards the data acquisition system, or it can leave control of the bidirectional buses to CARLOS so that another event can be sent. CARLOS therefore waits 16 foclk periods: if nothing happens, CARLOS can start sending data again without needing any further command from the SIU; if the SIU takes back possession of the bidirectional buses, CARLOS closes the link towards the SIU and waits for another RDYRX command from the SIU itself.
The feesiu block implements this communication protocol with the SIU using a simple state machine: state 0 is the state in which CARLOS waits for an initialization command from the SIU, state 1 the one in which CARLOS sends data to the SIU, state 2 the one in which CARLOS sends the front end status word to the SIU, and state 3 the one in which CARLOS waits 16 foclk periods for some action from the SIU.
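The four states can be summarized in a minimal C++ sketch (the state names and the simplified transition conditions are ours, distilled from the description above):

```cpp
// Minimal sketch of the feesiu protocol state machine.
enum class FeesiuState { WaitRdyRx, SendData, SendFestw, WaitSiu };

FeesiuState next_state(FeesiuState s, bool rdyrx_received,
                       bool event_done, bool timeout_16_foclk,
                       bool siu_took_bus) {
    switch (s) {
        case FeesiuState::WaitRdyRx: // state 0: wait for initialization
            return rdyrx_received ? FeesiuState::SendData : s;
        case FeesiuState::SendData:  // state 1: send event data to the SIU
            return event_done ? FeesiuState::SendFestw : s;
        case FeesiuState::SendFestw: // state 2: send front end status word
            return FeesiuState::WaitSiu;
        case FeesiuState::WaitSiu:   // state 3: wait 16 foclk periods
            if (siu_took_bus)     return FeesiuState::WaitRdyRx;
            if (timeout_16_foclk) return FeesiuState::SendData;
            return s;
    }
    return s;
}
```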
An important feature of CARLOS, realized in the feesiu block, is the following: CARLOS cannot accept a new event before the previous one has been completely sent in output, otherwise we would run the risk of mixing data belonging to different events. The only way CARLOS has to apply back-pressure on the AMBRA chips is the wait-request signals. Thus the wait-request signal has to prevent CARLOS from fetching new input data while the FIFOs are being emptied. For this reason a new signal, dont-send-data, has been introduced for every macro-channel: it goes to 1 when the end-trace is activated and back to 0 when all the FIFOs are completely empty. So the wait-request of every macro-channel is obtained by ORing the full and dont-send-data signals. The feesiu learns that all the FIFOs have been emptied from the empty signal of every FIFO block. When all the 8 signals go to 1, the feesiu block raises the all-fifos-empty signal, which stays at logic level 1 for at least two clock periods in order to be sensed by the foclk clock. The all-fifos-empty signal is also used to trigger the event-counter block: in fact the total number of events is exactly the total number of occurrences of the all-fifos-empty signal. Another signal, end-trace-global, is set to 1 only if all the local end-trace signals have been set to 1 for at least one clock period in the current event.
Figure 3.12: Digital design flow for CARLOS v2
From the moment end-trace-global is asserted until all-fifos-empty is activated, no new input data set can enter CARLOS.
3.6 CARLOS v2 design flow
Fig. 3.12 illustrates the digital design flow for CARLOS v2. The front-end steps are exactly the same as the ones followed in the design of CARLOS v1. The only difference is the library used, in this case the Alcatel Mietec 0.35 µm digital library provided via Europractice. This is a very rich library, since it contains more than 200 different standard cells and RAM blocks of several dimensions.
Figure 3.13: Layout of the ASIC CARLOS v2
A RAM generator software tool allows the designer to obtain a macrocell with the exact number of words and bits per word requested: in our case a 64-word by 32-bit macrocell, instantiated 8 times, one per macrochannel.
The back-end steps were carried out at IMEC using the Avant! software Acquarius. We could not obtain a license for this software due to its high cost (more than 100 k$ per license), while no other available software, such as Cadence, was able to work with the design kit provided. The final physical layout is depicted in Fig. 3.13. The chip has a total area of 30 mm2 and contains 300 k standard cells, 180 I/O pads and 24 RAM blocks.
After the layout design, IMEC sent us the post-layout netlist and an SDF (Standard Delay Format) file containing the information on each net and cell delay, for post-layout simulation with the same test-benches already used for pre-layout simulation. This is usually an iterative process since, if simulation problems arise, the layout has to be re-designed. Luckily, thanks to the relatively low working frequency (40 MHz; the technology adopted can easily work up to 200 MHz), the post-layout simulation gave no problems and the design was sent to the foundry.
3.7 Tests performed on CARLOS v2
After receiving 20 samples of naked chips (without any package) from the Alcatel Mietec foundry, we bonded them directly onto the test PCB at the INFN of Torino, one sample per PCB. The test PCB, shown in Fig. 3.14, especially designed for testing CARLOS v2 and for its use in test-beam data taking, contains the following:
– five 2x10-pin DIL connectors, pin-compatible with the HP16600/16700A pattern generator and logic analyzer pods;
– two Mictor 38 connectors;
– a DIP switch to set up the hardwired parameters, such as the half ladder ID;
– filter capacitors for a total capacitance greater than 100 nF;
– buffers protecting the integrity of the CARLOS input pads.
After testing the JTAG control unit on CARLOS, the connection towards the SIMU was successfully tested: after the SIMU opens a transaction, CARLOS takes possession of the bidirectional buses and starts sending data. After these tests, the SIMU was replaced by the SIU board and the complete data acquisition chain, i.e. the DIU (Destination Interface Unit) and the PCI RORC (Read Out Receiver Card) directly connected to a PC. In this way, testing CARLOS with large amounts of data becomes much easier than with the Logic State Analyzer alone, and the complete data acquisition system can be used to acquire data in test beams.
Figure 3.14: CARLOS v2 test board
Chapter 4
2D compression algorithm
and implementation
This chapter contains a brief description of the 2D algorithm [19] conceived at the INFN Section of Torino and a first ASIC implementation attempt with the third prototype of CARLOS.
4.1 2D compression algorithm
The 2D algorithm performs a data reduction based on a two-threshold discrimination and a two-dimensional analysis along both the drift-time axis and the SDD anode axis. The proposed scheme allows for a better understanding of the neighbourhoods of the SDD signal clusters, thus improving their reconstructability, and also provides a statistical monitoring of the background features for each SDD anode.
4.1.1 Introduction
As shown in Chapter 3, due to the presence of noise, a simple single-threshold one-dimensional zero suppression does not allow a good cluster reconstruction in all circumstances. Indeed, in order to obtain a good compression factor with the 1D algorithm, a threshold of about three times the RMS of the noise has to be used. Such a threshold often produces a rather sharp cut of the tails of the anode signals containing high samples and, more importantly, it can completely suppress the small-amplitude anodic signals on the sides of the cluster. Both these sharp cuts, particularly the latter, can significantly affect the spatial resolution. Though samples below a 3 RMS threshold have a small information content, it is conceivable that, in the more accurate off-line analysis, they can help to improve the pattern recognition and the fitting of the cluster features. In order to read out small-amplitude samples without collecting too much noise, a two-threshold algorithm can be used, so that small samples passing a low threshold are collected only when, along the drift direction, they are near samples passing a high threshold. Since the charge cloud diffuses in two orthogonal directions for symmetry reasons, and owing to the previous considerations, the two-threshold method should be applied along the anode axis too. We require that such a two-threshold two-dimensional data compression and zero suppression algorithm satisfy the following criteria:
– the values of the samples, in the neighbourhood of a cluster, be
available both for an accurate measurement of the characteristics
of the clusters and for a good monitoring and understanding of
the characteristics of the background;
– the statistical nature of the suppressed samples be available to
monitor the noise level of the anodes and to obtain their baseline
values, which have to be subtracted from the cluster samples in
order to obtain a correct measurement of the related charge.
A description of the studied algorithm follows. The data reduction algorithm is applied to a matrix of 256 rows by 256 columns like the one shown in the upper part of Fig. 4.1. Each matrix element is an 8-bit quantized amplitude. A row represents a time sequence of the samples from a single SDD anode and a column represents a spatial snapshot of the simultaneous anode outputs at an instant of time.
Figure 4.1: Example of the digitized data produced by a half SDD
For each charge cloud we expect several high values in
one or more columns and rows. This extension in both time and space
thus requires that correlations in both dimensions be preserved for fu-
ture analysis. We refer to correlations within a column as space-like
and correlations within a row as time-like. Therefore, in the proposed
two-threshold two-dimensional algorithm, the high threshold TH must
be satisfied by a pixel value in order that it be part of a cluster, and the
Figure 4.2: Neighbourhood of the pixel C
low threshold TL leads to the registering of a pixel whose value satisfies it, if it is adjacent to another pixel satisfying TH. In this way the lower-value pixels on the border of a cluster are encoded, thus ensuring that the tails of the charge distribution are retrieved.
Within this framework, a cluster is redefined operationally as a set of adjacent pixels whose values tend to stand out above the background. In the described algorithm there is a trade-off in the definition of such a cluster, which lies in the definition of adjacency. We have considered as adjacent (or neighbour) to the (i, j) element the pixels for which only one of the two indexes changes by 1: thus the neighbour pixels are (i − 1, j), (i + 1, j), (i, j − 1) and (i, j + 1). Thus a correlation involves a quintuple composed of a central (C) pixel and its north (N), south (S), east (E) and west (W) neighbours only (see Fig. 4.2). In order to
monitor the statistical nature of the suppressed samples, the number of zero quantized values (due either to negative analog noise values or to baseline equalization) and the numbers of samples satisfying TH and TL are recorded. The background average and standard deviation are obtained by applying a minimization procedure to these three counts. This reduction algorithm preserves information about the background both near and far from the clusters. When the thresholds are properly chosen, statistically, pairs and a few triplets of background pixels not associated with a particle-produced cluster will satisfy the described discrimination criteria and
Figure 4.3: Cluster in two dimensions and its slices along the anode direction
provide consistency information on the background statistics, assumed to be Gaussian white noise. At the same time, single high background peaks are suppressed as zeros (if they do not have at least one neighbour satisfying at least the low threshold) so as not to overload the data acquisition and to allow an efficient zero suppression. The only parameters needed as input to the 2D compression algorithm are the two thresholds, TH and TL, and the baseline equalization values.
4.1.2 How the 2D algorithm works
The 2D algorithm makes use of two threshold values:
– a high threshold TH for cluster selection;
– a low threshold TL to collect information around the selected cluster.
The algorithm retains data belonging to a cluster and around a cluster in the following way (graphically exemplified in Fig. 4.3):
– the pixel matrix is scanned searching for values higher than the
TH value (70 in Fig. 4.3);
– the pixels positioned around the previously selected ones are ac-
cepted if higher than the low threshold value TL (40 in Fig. 4.3),
otherwise they are rejected;
– thus a cluster is defined and cluster values are saved exactly as they are; other pixels, not belonging to clusters, are discarded;
– if a pixel value higher than TH is found but none of its neighbouring pixels is higher than TL, its value is rejected. This is the case of the value 78 in the bottom-left corner of Fig. 4.3, which is discarded even though it is greater than the high threshold;
– pixel values belonging to a cluster are encoded using a simple look-up table method, assigning long codes to infrequent values and short codes to frequent ones.
Thus in Fig. 4.3, after applying the 2D compression algorithm, only the shadowed values are stored, while the other values are erased. The 2D algorithm is conceptually very simple to understand, but it is considerably more complex than the 1D one as far as the hardware implementation is concerned. In fact, having to perform a two-dimensional analysis of the pixel array implies storing all the information in a digital buffer on CARLOS, thus requiring a larger silicon area and a higher cost.
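The selection logic itself, however, is easy to express in software. The following C++ sketch is an illustrative model of the two-threshold selection described above (a "higher than" comparison is used for both thresholds, as in the text; all names are ours):

```cpp
#include <array>
#include <cstdint>

template <std::size_t N>
using Matrix = std::array<std::array<uint8_t, N>, N>;

// Two-threshold 2D selection: keep a pixel above TH only if one of its
// N/S/E/W neighbours is above TL, and keep a pixel above TL only if one
// of its neighbours is above TH; everything else is suppressed to 0.
template <std::size_t N>
Matrix<N> select2d(const Matrix<N>& a, uint8_t th, uint8_t tl) {
    Matrix<N> out{};   // suppressed pixels stay at 0
    auto neighbour_above = [&](std::size_t i, std::size_t j, uint8_t t) {
        return (i > 0     && a[i - 1][j] > t) || (i + 1 < N && a[i + 1][j] > t)
            || (j > 0     && a[i][j - 1] > t) || (j + 1 < N && a[i][j + 1] > t);
    };
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            bool keep = (a[i][j] > th && neighbour_above(i, j, tl))
                     || (a[i][j] > tl && neighbour_above(i, j, th));
            if (keep) out[i][j] = a[i][j];  // cluster values kept as they are
        }
    return out;
}
```

Called as select2d<256>(matrix, 70, 40), it reproduces the example of Fig. 4.3, where TH = 70 and TL = 40.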
4.1.3 Compression coefficient
Fig. 4.4 shows the 2D compression coefficient as a function of the high threshold value, calculated using data from the test beam of September 1998. The 2D compression algorithm reaches a compression ratio of 22 with a TH of 1.5 times the noise RMS and a TL of 1.2 times the noise RMS. It should be remembered that the 1D compression algorithm had to use a threshold level of 3 times the noise RMS in order to reach the target compression ratio. Thus the 2D algorithm performs better than the 1D one, since it reaches the target compression ratio while losing a smaller amount of physical information. This is the main reason why the 2D algorithm has been chosen as the one to be implemented in the final version of CARLOS.
Figure 4.4: 2D compression coefficient as a function of the high threshold
4.1.4 Reconstruction error
The 2D algorithm also performs better than the 1D one as far as the reconstruction error is concerned. The differences between the cluster centroid positions before and after compression are fitted by a Gaussian distribution centered around 0, with a σ of 10 µm along the drift-time direction and 10 µm along the anode direction, choosing 1.5 noise RMS for TH and 1.2 noise RMS for TL. Thus the 2D algorithm achieves a better cluster-centre resolution than the 1D one by keeping track of more pixel values around the cluster centre. Moreover, the 2D algorithm introduces a smaller bias on the reconstructed charge than the 1D one, with a value of around 3 %, meaning that the reconstructed cluster charge is 3 % lower than before the compression-decompression steps.
Besides, the 2D algorithm is very useful for studying the noise distribution: monitoring the pairs of noise samples passing the double-threshold filter allows one to recover information on the average and the standard deviation of the Gaussian noise distribution. This is quite important for checking how the signal-to-background ratio changes with time.
If used in lossless mode, the 2D algorithm yields a compression ratio of 1.3, versus the 2.3 obtained with the lossless version of the 1D algorithm: this requires a more complex second-level compressor in the counting room, in order to reach the target compression ratio of 22, in case the 2D compression algorithm cannot be applied to the data. In fact there are some cases in which the use of the 2D compression algorithm might no longer be desirable: for example, when the baseline value is not constant over the 256 samples of an anode row. This is the case for the present version of the PASCAL chip, which introduces a slope in each anode-row baseline and, what is worse, a slope that varies from row to row. Obviously a fixed double-threshold compressor, such as the one described in this chapter, cannot deal with this problem. Thus the foreseen solution is to eliminate the baseline slope in the final version of PASCAL. If this proves not to be possible, or if a sloping baseline emerges after some running time, the 2D algorithm can no longer be used. In that case data compression on CARLOS has to be switched off, and a second-level compression algorithm implemented directly in the counting room will do the job.
4.2 CARLOS v3 vs. the previous prototypes
There are several differences between CARLOS v3 and the previous
versions. This is a brief list containing the most important ones:
1. CARLOS v1 and v2 were meant to work in a radiation-free environment, since, when they were designed, the problem of radiation had not been faced yet. Thus commercial technologies such as Xilinx FPGAs or the Alcatel Mietec design kit were chosen for the prototype implementations. The necessity for CARLOS to work in a radiation environment emerged some time after CARLOS v2 was sent to the foundry. The radiation level CARLOS has to withstand is in the range from 5 to 15 krad. This led us to search for a radiation-safe technology.
One of the possible solutions is given by SOI (Silicon On Insulator) technology, which provides complete radiation resistance. This is the case, for instance, of the 0.8 µm DMILL technology, which is widely used even in satellite applications at ESA (European Space Agency). The problem related to this technology is mainly one: its cost is too high for our budget. Thus we decided to choose a commercial technology, IBM 0.25 µm, with a library of standard cells designed to be radiation tolerant up to some Mrad. The library has been designed by the EP-MIC group at CERN.
2. Mechanical constraints emerged that do not allow the use of the SIU in the end-ladder zone, since it is far too big for the available space. Another problem concerning the SIU is that this device cannot safely work in a radiation environment, since it contains commercial devices such as ALTERA PLDs. Finally, the laser driver hosted on the SIU board has a mean life of a few years, while we are looking for something lasting until the end of the experiment's data taking.
These considerations led us to change the whole readout architecture from CARLOS to the DAQ. Instead of directly interfacing the SIU, CARLOS v3 interfaces the radiation-tolerant serializer GOL chip (Gigabit Optical Link) [20]. Serial data is then sent to the counting room over a 200 m long optical fibre, deserialized using a commercial deserializer device and then sent to the SIU board through an FPGA device named CARLOS-rx that is still to be designed. This final readout architecture is shown in detail in Fig. 4.5.
3. CARLOS v3 contains only 2 data processing channels, versus the 8 hosted in the two previous prototypes. This choice was due to the need to reduce the ASIC complexity and to greatly reduce the possibility of losing data in case of chip failure. In fact, if a CARLOS v2 chip breaks down for some reason, data coming from a half-ladder, i.e. from 4 detectors, is completely lost until the chip is replaced with a working one. On the other hand, if a CARLOS v3 chip breaks down, only the data coming from one SDD detector are lost. Thus a 2-channel version of CARLOS provides greater failure resistance and is far less complex.
4. CARLOS v3 contains a preliminary interface with the TTCrx chip
that distributes trigger signals and the clock to the end-ladder
board.
5. CARLOS v3 also contains a BIST structure (Built In Self Test)
for a quick test of the chip itself issued via the JTAG port.
Figure 4.5: The final readout chain
4.3 The final readout architecture
The chosen architecture for the final readout system introduces new tasks to carry out and new problems to solve.
For instance, splitting CARLOS into 4 chips makes every chip much simpler to design, test and control (CARLOS v2 is a very complex and difficult-to-debug chip), but moving the SIU board to the counting room implies the design of the CARLOS-rx device, which takes data from 4 deserializer chips and feeds them to the SIU.
Besides, putting a 200 m distance between CARLOS and the SIU implies that no back-pressure can be used: if the SIU asserts the filf-n signal, meaning that it cannot accept further data starting from the following foclk cycle, CARLOS receives this information only after about 2 µs (the round-trip propagation delay over 200 m of fibre, at roughly 5 ns/m), i.e. after 40 foclk cycles. Thus the CARLOS-rx chip has to contain a suitably sized FIFO buffer to store data when the SIU is not able to accept them.
The role of the JTAG link is shown in Fig. 4.6. In the new architecture a transaction can be opened and closed via the JTAG link, instead of using the 32-bit fbd bus. The JTAG link is obtained by serializing the 5-bit JTAG port coming from the SIU for transmission to the front-end zone through an optical fibre; the HAL (Hardware Abstraction Layer) chip then performs the serial-to-parallel conversion and distributes the JTAG signals to the PASCAL, AMBRA and CARLOS chips. A rad-hard version of the HAL chip has yet to be implemented.
Currently we plan to use a pair of commercial serializer-deserializer chips from Agilent Technologies: in the final architecture the serializer chip will be replaced with the rad-hard Gigabit Optical Link (GOL) chip designed by the Marchioro group at CERN. This chip is a multi-protocol high-speed transmitter ASIC which is able to withstand high doses of radiation. The IC supports two standard protocols, G-Link and GBit-Ethernet, and sustains data transmission at both 800 Mbits/s and 1.6 Gbits/s. The ASIC was implemented in the CERN 0.25 µm CMOS library technology, employing radiation-tolerant layout techniques.
Figure 4.6: Final readout chain zoom
A problem concerning the use of the GOL chip has yet to be solved: the TTCrx chip distributes to all front-end chips a clock with a maximum jitter of around 300 ps. This is not a problem for the AMBRA and CARLOS ICs working at 40 MHz, but it proves to be a big problem for the GOL chip, since the GOL contains an internal PLL that multiplies the incoming 40 MHz clock by 20 or 40, so as to get an internal 800 MHz or 1.6 GHz frequency. The PLL shows synchronization problems with the incoming clock if the input jitter is greater than 100 ps. This problem has still to be faced and solved.
4.4 CARLOS v3
CARLOS v3 is our first prototype tailored to fit in the new readout
architecture. The main new features of this chip are:
– two processing channels;
– the radiation tolerant technology chosen.
Nevertheless, CARLOS v3 does not contain the complete 2D compression algorithm, as might be expected. We made this choice in order to acquire experience with a small chip in the new technology and with the new layout techniques, since we had to carry out the layout design task ourselves. Taking into account that the CERN 0.25 µm library contains a small number of standard cells and that they are not as well characterized as commercial ones, we decided to try the new design flow and new technology on a simple chip: the result is CARLOS v3, which was sent to the foundry in November 2001 and will be tested starting from February 2002.
As a compression block, CARLOS v3 only hosts the simple encoding scheme conceived as the final part of the 2D algorithm. Nevertheless, if CARLOS v3 proves to work perfectly, it will be used to acquire data in the test beams and will allow us to build and test the foreseen readout architecture.
4.5 CARLOS v3 building blocks
Fig. 4.7 shows the main building blocks of CARLOS v3. The complete design of CARLOS v3 has been carried out in Bologna: I worked on the VHDL models, while other people worked on the C++ models of the same blocks. Each block has been designed both in VHDL and in C++, so as to allow an easy verification and debugging process.
The two main processing channels are the ones with the encoderbo, barrel15, fifonew32x15 and outmux blocks: these blocks take data coming from the AMBRA chips, encode them using a lossless compression algorithm, pack them into 15-bit words and store them in a FIFO memory before sending them in output to the GOL chip, one channel after the other.
Figure 4.7: CARLOS v3 building blocks
The channel containing the ttc-rx-interface and fifo-trigger15x12 blocks receives trigger numbers (bunch counter and event counter) from the TTCrx chip and sends them in output at the beginning of each data packet. The event-counter block is a local event-number generator providing further information to be added to the event number coming from the TTCrx chip: this gives us greater confidence in being able to reconstruct data and to find errors if present. A trigger-interface block then handles the trigger signals L0, L1 and L2 coming from the Central Trigger Processor (CTP) through the TTCrx chip. A Command Mode Control Unit (CMCU ) receives commands issued through the JTAG port and puts CARLOS in one of several logic states: running, idle, bist and so on. Finally, the BIST blocks on chip are based on a pseudo-random pattern generator and a signature-maker circuit. The next paragraphs contain a detailed description of these blocks.
4.5.1 The channel block
The channel block is the main processing unit contained in CARLOS for data encoding, packing and storing. It is composed of three blocks: encoderbo, barrel15 and fifonew32x15. Two identical channel blocks are hosted on CARLOS v3.
4.5.2 The encoder block
The I/O signals are:
– value: input 8-bit bus;
– value-strobe: input signal;
– ck : input signal;
– reset : input signal;
– data: output 10-bit bus;
– field : output 4-bit bus;
– valid : output signal.

Input range | Output code      | Total length
0-1         | 1 bit + 000      | 4 bits
2-3         | 1 LSB bit + 001  | 4 bits
4-7         | 2 LSB bits + 010 | 5 bits
8-15        | 3 LSB bits + 011 | 6 bits
16-31       | 4 LSB bits + 100 | 7 bits
32-63       | 5 LSB bits + 101 | 8 bits
64-127      | 6 LSB bits + 110 | 9 bits
128-255     | 7 LSB bits + 111 | 10 bits
Table 4.1: Lossless compression algorithm encoding scheme
The encoderbo block encodes 8-bit input data into variable-length codes from 4 to 10 bits long in a completely lossless way. Table 4.1 contains a detailed description of the encoding mechanism. This encoding scheme compresses the input data based on the known statistics of the stream: small values are much more probable than high ones. Thus most input data will be reduced from 8 to 4 or 5 bits, providing some degree of compression. It is nevertheless possible that locally, in time, this compressor expands the data: if a long sequence of values greater than 127 occurs, the encoderbo block outputs a stream of 10-bit codes, which have to be temporarily stored in a FIFO buffer. The block works as follows: when the input signal value-strobe is high, the 8-bit input value is encoded into the 10-bit output data and the valid output signal is asserted. The field output signal is assigned the number of bits actually carrying information in the 10-bit data register. The block is synchronous with the rising edge of the clock, while the reset signal is active high and asynchronous.
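The encoding scheme of Table 4.1 can be sketched in a few lines of C++. In the sketch below the 3-bit class code occupies the LSBs with the retained value bits above it, consistently with the LSB-first decoding described later for the barrel15 block (an illustrative model, not the chip's VHDL):

```cpp
#include <cstdint>

struct Code { uint16_t bits; unsigned length; };

// Table 4.1 encoding: the class code k (3 bits, in the LSBs) is the
// position of the value's MSB; the k bits below that MSB are kept and
// the implied leading 1 is dropped. Values 0-1 store the bit itself.
Code encoderbo(uint8_t value) {
    if (value < 2)                       // range 0-1: 1 bit + class 000
        return { uint16_t(value << 3), 4 };
    unsigned k = 7;                      // k = position of the MSB (1..7)
    while (!((value >> k) & 1u)) --k;
    uint16_t lsbs = value & ((1u << k) - 1);
    return { uint16_t((lsbs << 3) | k), k + 3 };
}
```

For example, the value 5 (range 4-7) yields the 5-bit code 01 010: two LSB bits plus the class code 010, as in the table.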
Figure 4.8: Graphical description of how the barrel shifter works
4.5.3 The barrel15 block
The I/O signals are:
– input : input 10-bit bus;
– sel : input 4-bit bus;
– load : input signal;
– ck : input signal;
– reset : input signal;
– end-trace: input signal;
– output-push: output signal;
– output : output 15-bit bus.
The barrel15 block packs the 4- to 10-bit variable-length codes coming from the encoderbo block into fixed-length 15-bit words. Data are packed as shown in Figure 4.8. The barrel block makes use of two internal 15-bit registers, so as to be able to break an input datum into two pieces without losing any information: while the first word is put in output (by driving the output signal output-push low), the second word is used to store the incoming data. The latency of the barrel block is 2 clock periods: it takes 2 clock periods before a word is packed by the barrel15 block. When the input signal end-trace is asserted, meaning that this is the last datum belonging to the current event, the current value of the internal register is put in output even if it is not completely full: undefined bits are set to 0.
Data coming from the barrel can easily be reconstructed by starting from the 3 LSBs of the first barrel word, which contain the information on how many bits have to be selected on the left side of the code. By proceeding in this way from the LSB to the MSB of every valid word, it is possible to retrieve all the encoded information.
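A matching decoder sketch, again illustrative, reads the 3-bit class code first and then the value bits, restoring the implied leading 1; the bit source abstracts the stream of 15-bit barrel words scanned LSB to MSB:

```cpp
#include <cstdint>
#include <functional>

// Decode one value from an LSB-first bit stream: 3 class bits, then
// k value bits (k = 0 means range 0-1, where a single value bit follows).
uint8_t decode_one(const std::function<unsigned()>& next_bit) {
    unsigned k = next_bit() | (next_bit() << 1) | (next_bit() << 2);
    if (k == 0) return uint8_t(next_bit());        // range 0-1: 1 value bit
    unsigned lsbs = 0;
    for (unsigned i = 0; i < k; ++i) lsbs |= next_bit() << i;
    return uint8_t((1u << k) | lsbs);              // restore the implied MSB
}
```

Together with the encoderbo sketch above, this round-trips every 8-bit value, which is exactly the lossless property claimed for the scheme.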
4.5.4 The fifonew32x15 block
The I/O signals are:
– push-req-n: input signal;
– pop-req-n: input signal;
– diag-n: input signal;
– data-in: input 15-bit bus;
– ck : input signal;
– reset : input signal;
– empty : output signal;
– almost-empty : output signal;
– half-full : output signal;
– almost-full : output signal;
– full : output signal;
– error : output signal;
– dataout : output 15-bit bus.
The fifonew32x15 block has the purpose of storing the information coming out of the barrel shifter. The multiplexing scheme that has been chosen cannot avoid the use of buffers before the multiplexer: since the output is fairly allocated 50 % of the time to each channel (one clock period for channel 0, the next clock period for channel 1 and so on), and since the encoding algorithm can locally, in time, behave as an expander, data have to be stored locally before multiplexing.
The only decision to be taken concerns the FIFO dimensions: we have chosen a FIFO holding 32 words coming from the barrel shifter (32x15 bits), in order to take into account the worst possible input data stream. The problem we faced designing the FIFO block is the following: a FIFO is usually composed of a dual-port RAM block plus some logic implementing the First In First Out policy. This is, for example, what has been done in CARLOS v2. Nevertheless, the CERN 0.25 µm library only provides one size of RAM, namely 64x32 bits. This block is at least 4 times bigger than the one we need (2048 bits versus 480). Besides, it is quite difficult, if not impossible, to share the same RAM block between two different FIFO designs: the idea of sharing the FIFOs of the two channels is quite difficult to implement, since the number of read/write ports would have to be doubled. Thus we decided to design a flip-flop based RAM for the FIFO, taken from the "Designer Foundation" library provided together with our design software Synopsys. This is a library containing IP (Intellectual Property) blocks ready to be inserted into a design, such as logic and arithmetic blocks, RAMs and application-specific blocks, for instance for error checking and correction or for a JTAG controller. The idea is that it is completely useless for every ASIC designer to waste time designing a block that hundreds of other designers all over the world need. With this idea in mind, many IP libraries have been collected, such as the Synopsys one we have been making use of.
This is the behavior of the fifonew32x15 block: a push is executed when the push-req-n input is asserted (low) and either the full flag is inactive (low), or the full flag is active and the pop-req-n input is asserted (low). Thus a push can occur even if the FIFO is full, as long as a pop is executed in the same clock period. Asserting push-req-n in either of the above cases causes the data at the data-in port to be written to the next available location in the FIFO. A pop operation occurs when pop-req-n is asserted (low), as long as the FIFO is not empty. Asserting pop-req-n causes the internal read pointer to be incremented on the next rising edge of ck. Thus the RAM read data must be captured on the ck edge following the assertion of pop-req-n. Push and pop can occur at the same time if there is data in the FIFO, even when the FIFO is full. In this case, first the pop data is captured by the next stage of logic after the FIFO, and then the new data is pushed into the same location from which the data was popped. Thus there is no conflict in a simultaneous push and pop when the FIFO is full. A simultaneous push and pop cannot occur when the FIFO is empty, since there is no pop data to prefetch.
The FIFO block provides some important flags, such as empty, almost-full and full. The empty flag indicates that there are no words in the FIFO available to be popped. The almost-full flag is asserted when there are no more than 8 empty locations left in the FIFO. This number is used as a threshold and is very useful for preventing the FIFO from overflowing. When this flag is asserted, the data-stop signal, an output of CARLOS, is sent to the AMBRA chip asking it to stop the data stream transmission. AMBRA requires 3 clock cycles before it actually stops sending data to CARLOS. Hence the threshold level of 8 chosen for the FIFO design has to account for this 3-clock-period delay due to AMBRA and for the latency of the encoder and barrel blocks. This flag is thus very useful for managing data transmission between AMBRA and CARLOS without losing any data. The last flag, full, indicates that the FIFO is full and there is no space available for pushing data. If the AMBRA-CARLOS communication works well, this flag should never be asserted. Fig. 4.9 shows the FIFO timing waveforms during the push phase, while Fig. 4.10 shows the FIFO timing waveforms during the pop phase.
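The push/pop and flag rules above can be condensed into an illustrative C++ model (the thresholds follow the text; the class itself is a sketch, not the Synopsys DesignWare component):

```cpp
#include <array>
#include <cstdint>

// Sketch of the fifonew32x15 rules: a push is accepted when the FIFO is
// not full, or when it is full but a pop happens in the same cycle;
// almost-full fires when 8 or fewer locations are free.
class FifoNew32x15 {
    std::array<uint16_t, 32> ram_{};   // 15-bit words
    unsigned wr_ = 0, rd_ = 0, count_ = 0;
public:
    bool empty()       const { return count_ == 0; }
    bool full()        const { return count_ == 32; }
    bool almost_full() const { return 32 - count_ <= 8; }

    void clock(bool push, uint16_t data_in, bool pop, uint16_t& dataout) {
        bool do_pop  = pop && count_ > 0;
        bool do_push = push && (count_ < 32 || do_pop);
        if (do_pop) {                  // pop data is captured first...
            dataout = ram_[rd_];
            rd_ = (rd_ + 1) % 32;
            --count_;
        }
        if (do_push) {                 // ...then the push may reuse the slot
            ram_[wr_] = data_in & 0x7FFFu;
            wr_ = (wr_ + 1) % 32;
            ++count_;
        }
    }
};
```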
Figure 4.9: FIFO timing waveforms during the push phase
Figure 4.10: FIFO timing waveforms during the pop phase
4.5.5 The channel-trigger block
The channel-trigger block has the purpose of getting trigger numbers from the TTCrx chip and storing them before they are multiplexed and sent to the GOL chip. It is composed of two different blocks: the ttc-rx-interface and the fifo-trigger block.
4.5.6 The ttc-rx-interface block
The I/O signals are:
– TTCready : input signal;
– BCnt : 12-bit input bus;
– BCntLStr : input signal;
– EvCntLStr : input signal;
– EvCntHStr : input signal;
– ck : input signal;
– reset : input signal;
– BCnt-reg : output 12-bit bus;
– EvCntL-reg : output 12-bit bus;
– EvCntH-reg : output 12-bit bus.
The ttc-rx-interface block receives trigger information from the TTCrx
chip when the input signal TTCready coming from the TTCrx chip
is high, meaning that the TTCrx is ready. When BCntLStr is high,
the 12-bit input word is latched into the register BCnt-reg; the same
happens with EvCntLStr and EvCntHStr for the LSBs and MSBs of the
24-bit event counter. When a L2accept signal becomes active, the values
of these three registers are written into 3 memory locations of the
fifo-trigger block. Since the event can be discarded until the final
confirmation arrives through the L2accept signal, it is necessary to
wait for such a signal before storing them in the FIFO.
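A minimal C model of this strobe-gated latching may help; the type and function names are hypothetical, and the assumption that the event-counter halves travel on the same 12-bit bus as BCnt is ours.

    #include <stdint.h>

    /* Registers of the ttc-rx-interface block (12-bit values each). */
    typedef struct { uint16_t BCnt_reg, EvCntL_reg, EvCntH_reg; } TtcRegs;

    /* One clock tick: a register is loaded when the TTCrx is ready and
       the corresponding strobe is high. That the event-counter halves
       share the same 12-bit bus as BCnt is our assumption here.      */
    static void ttc_tick(TtcRegs *r, int TTCready, uint16_t bus,
                         int BCntLStr, int EvCntLStr, int EvCntHStr)
    {
        if (!TTCready) return;            /* TTCrx not ready: ignore  */
        if (BCntLStr)  r->BCnt_reg   = bus & 0x0FFF;
        if (EvCntLStr) r->EvCntL_reg = bus & 0x0FFF;
        if (EvCntHStr) r->EvCntH_reg = bus & 0x0FFF;
    }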
4.5.7 The fifo-trigger block
This block is logically equivalent to the FIFO block except for what
concerns dimensions: its size is 15x12 words. During the transmission
of a complete event from AMBRA to CARLOS lasting for 1.6 ms, up
to four events can be stored in the AMBRA chip, thus CARLOS has
to process 4 triplets of the incoming signals L0, L1accept and L2accept.
Thus a 15 words deep FIFO is necessary for storing bunch counter and
event counter information concerning 5 consecutive accepted events.
When CARLOS is ready to send a data packet in output, the first 3
trigger words are read and taken to the outmux block. Thus a correct
synchronization between data being sent and trigger information is pre-
served. Output flags from the fifo-trigger block empty, almost-full and
full are not used by other blocks as a control since we do not expect to
have a buffer overflow due to the structure of the AMBRA chip.
4.5.8 The event-counter block
The I/O signals are:
– end-trace: input signal;
– ck : input signal;
– reset : input signal;
– event-id : output 3-bit bus.
Local event counting is performed on CARLOS by the event-
counter block. It is a very simple 3-bit counter triggered by the event-
ident signal coming from the outmux block: this signal indicates that an
event has been completely transmitted and a new one can be accepted.
This number is used both in the header and in the footer words for a
safer transmission protocol.
4.5.9 The outmux block
The I/O signals are:
– indat1 : input 15-bit bus;
– indat0 : input 15-bit bus;
– trigger-data: input 12-bit bus;
– reset : input signal;
– ck : input signal;
– gol-ready : input signal;
– fifo-empty : input 2-bit bus;
– half-ladder-id : input 7-bit bus;
– all-fifos-empty : input signal;
– event-id : input 3-bit bus;
– no-input-data: input signal;
– event-identifier : output signal;
– read-data: output 2-bit bus;
– read-trigger : output signal;
– output-strobe: output signal;
– output : output 16-bit bus.
The outmux block is a multiplexing unit that sends out data
coming from the two main processing channels in an interleaved way,
meaning that during the even clock periods data coming from channel
1 are put in output, while during the odd clock periods data coming
from channel 0 are served.
The outmux block behaves as follows: as soon as data begin to fill
the two FIFO blocks, the outmux block begins to put in output a packet
like the one shown in Fig. 4.11. The first 3 16-bit words contain trigger
information coming from the trigger channel: the first word contains
the bunch counter, while the second and third words contain the event
counter MSBs and LSBs respectively. Since trigger information is 12 bits
long, the bits 1011 are prepended as MSBs in order to be able to recognize
these words easily in a later phase of data reconstruction.
Two header words follow, containing the local event-id number and the
externally hardwired information half-ladder-id. The MSBs of the
header words are 110.
Figure 4.11: CARLOS v3 data transmission protocol
Headers are followed by an even number of data words containing data
from the two main channels: if a channel has no valid data to send,
the MSB is put to 1 and all the other bits are set to 0, meaning that a
dummy word is sent in output; otherwise the MSB is set to 0, meaning
that the data word is valid.
The data packet is then concluded with the transmission of two footer
words containing the same information as the header regarding the
event-id number, and the number of words sent in output. The
MSBs are set to 1, so as to uniquely identify the footer word type.
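The word-tagging scheme can be summarized with a few C helpers; the tag values 1011 and 110 and the dummy/valid MSB follow the protocol description above, while the footer tag width (here 111) and the helper names are illustrative assumptions.

    #include <stdint.h>

    /* 16-bit word tags of the CARLOS v3 output packet. Trigger, header
       and dummy/valid tags follow the text; the footer tag and the
       helper names are illustrative.                                  */
    static uint16_t trigger_word(uint16_t t12) { return 0xB000u | (t12 & 0x0FFF); } /* 1011 + 12 bits */
    static uint16_t header_word (uint16_t h13) { return 0xC000u | (h13 & 0x1FFF); } /* 110 + 13 bits  */
    static uint16_t footer_word (uint16_t f13) { return 0xE000u | (f13 & 0x1FFF); } /* leading 1s     */
    static uint16_t data_word   (uint16_t d15) { return d15 & 0x7FFF;             } /* MSB 0: valid   */
    static uint16_t dummy_word  (void)         { return 0x8000u;                  } /* MSB 1, rest 0  */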
The outmux block puts in output the 16-bit data words and the signal
output-strobe. When this signal is high, CARLOS is transmitting data
belonging to a packet, while when it is low CARLOS is not sending useful
information to the GOL chip. When the gol-ready signal coming from
the GOL chip goes low, meaning that it has lost synchronization with
the input clock, CARLOS stops sending data and begins transmission
again only when gol-ready goes high. The outmux block also puts in
output the 2-bit signal read-data that is sent in input to the 2 main
FIFOs as a pop signal and the signal read-trigger sent to the FIFO-
trigger block. The outmux block also asserts the signal event-ident, which
is used as a trigger for the event-counter block. The input signal all-
fifos-empty puts an end to the data packet transmission
once the end of an event has been reached: in fact, after the input
signals data-end1 and data-end0 have gone high, CARLOS
waits until both FIFOs are empty before asserting the all-fifos-empty
signal. This triggers the end of an event transmission.
4.5.10 The trigger-interface block
The I/O signals are:
– reference-count-trigger : input 8-bit bus;
– L0 : input signal;
– L1accept : input signal;
– L2accept : input signal;
– L2reject : input signal;
– dis-trigger : input signal;
– ck : input signal;
– reset : input signal;
– busy : output signal;
– trigger : output signal;
– abort : output signal.
This block accepts as inputs the trigger signals L0, L1accept, L2accept
and L2reject. Follows a brief description of how these signals can be
used for accepting or rejecting an event for storage: the L0 signal is
asserted 1.2 µs after the interaction; the L1accept signal is asserted 5.5 µs
after the interaction and, if it is not asserted in time, the event is rejected;
L2accept is asserted 100 µs after the interaction if the event is
accepted, otherwise a L2reject signal is asserted before 100 µs. This means
that either a L2accept signal or a L2reject signal is always asserted.
The trigger-interface block receives these inputs, processes them to
build 3 other signals: trigger, busy and abort. The trigger signal is
L0 delayed by a number of clock cycles programmable via JTAG and
is distributed to the PASCAL and AMBRA chips. This is the signal
triggering an event data acquisition on the PASCAL chip.
The busy signal is asserted just after L0 and stays active
until 5.5 µs after the interaction. If the signal L1accept is not asserted,
then busy goes low again; otherwise it stays active until the signal
dis-trigger coming from AMBRA is activated. The meaning is the
following: while PASCAL is transferring data to AMBRA the readout
system is not ready to accept any other trigger signals, that is, to acquire
any other data. The time necessary for the transmission of an event
from PASCAL to AMBRA is about 360 µs. Finally the abort signal
that CARLOS sends to AMBRA is asserted when the L1accept signal is
not asserted within the prescribed time or when the L2reject signal is asserted.
The abort signal causes data transmission from PASCAL to AMBRA
to stop, and the data already stored are discarded.
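The decision logic can be condensed in a C sketch; the clock-tick equivalents assume a 40 MHz clock, which is not stated in the text, and the function is a simplification of the real sequential behaviour.

    #include <stdbool.h>

    /* Latencies in clock ticks, assuming a 40 MHz clock (25 ns):
       1.2 us -> 48, 5.5 us -> 220, 100 us -> 4000.                  */
    enum { T_L0 = 48, T_L1_DEADLINE = 220, T_L2_DEADLINE = 4000 };

    typedef struct { bool busy, abort_; } TrigOut;

    /* busy: set just after L0, cleared if L1accept misses its deadline,
       otherwise held until AMBRA asserts dis-trigger.
       abort: set on a missed L1accept or on L2reject.                */
    static TrigOut evaluate(bool l1_in_time, bool l2reject, bool dis_trigger)
    {
        TrigOut o = { true, false };          /* busy raised at L0 */
        if (!l1_in_time)      { o.busy = false; o.abort_ = true; }
        else if (l2reject)    { o.abort_ = true; }
        else if (dis_trigger) { o.busy = false; }
        return o;
    }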
4.5.11 The cmcu block
The I/O signals are:
– tdi : input signal;
– tms : input signal;
– trst : input signal;
– tck : input signal;
– bist-ok-tcked : input signal;
– bist-failure-tcked : input signal;
– ck : input signal;
– reset : input signal;
– reference-count-trigger : output 8-bit bus;
– tdo: output signal;
– state-tcked : output signal;
– reset-pipe: output signal.
Figure 4.12: CMCU logic state diagram
The Command Mode Control Unit (cmcu) is the CARLOS internal control
unit, remotely controlled via the JTAG port. Serial data coming from
the JTAG pin tdi are packed into 8-bit words and interpreted as a very
simple program containing commands and operands. Fig. 4.12 shows
CARLOS working states reachable using the JTAG port.
At power-on CARLOS is put in an IDLE state in which no calculation
is performed. Then it can be put in a RESET-PIPELINE state in which
an internal reset signal is asserted and all registers are initialized. The
following state is the BIST (Built In Self Test) state in which CARLOS
runs an internal test at working speed to check whether everything works
correctly; then, depending on the test results, CARLOS enters the
BIST-FAILURE state or BIST-SUCCESS state. In case of success the
8-bit word sent serially as output on tdo is A0, otherwise the word is
55. In the state WRITE-REG CARLOS prepares to write an internal
register with the value read via JTAG in the next state WRITE-REG-
FETCH: this register contains the number of clock cycles of delay to be
applied to the incoming L0 signal before passing it to the AMBRA chip.
If needed, during the READ-REG stage the CARLOS user can read
this value to check that no errors occurred during the writing phase
by means of the tdo output JTAG pin. Then CARLOS can finally
enter the RUNNING stage in which it is able to accept and process
input data streams and to manage the interfaces towards the GOL and
TTCrx chips. When CARLOS is not in RUNNING mode the busy
signal is set high, meaning that no L0 trigger signal is accepted from
the CTP and no data is transmitted to the GOL chip.
4.5.12 The pattern-generator block
The I/O signals are:
– bist-start : input signal;
– ck : input signal;
– reset : input signal;
– data: output 8-bit bus;
– data-valid : output signal;
– data-end : output signal.
The pattern generator block is part of the BIST utility implemented
on CARLOS v3. The BIST [21, 22] is an in-circuit testing scheme for
digital circuits in which both test generation and test verification are
done by circuitry built into the chip itself. BIST schemes offer three
attractive advantages:
1. they offer a solution to the problem of testing large integrated
circuits with a limited number of I/O pins;
2. they are useful for high speed testing since they can run at design
speed;
3. they do not require expensive external automatic test equipment
(ATE).
BIST schemes, in the most general sense, can have any of the following
characteristics:
– concurrent or non-concurrent operation: concurrent testing is de-
signed to detect faults during normal circuit operation, while non-
concurrent testing requires that normal operation be suspended
during testing. In CARLOS v3 non-concurrent operation has been
chosen since we decided to use BIST only to check the correct be-
havior of the chip when off-line.
– exhaustive or non-exhaustive test design: an exhaustive test of a
circuit requires that every intended state of circuit be shown to
exist and that all transitions be demonstrated. For large sequen-
tial circuits such as CARLOS this is not practical, so we decided to
implement a non-exhaustive testing design.
– deterministic or pseudo-random generation of test vectors: deter-
ministic testing occurs when specific pre-computed vectors have to be
applied, while pseudo-random testing occurs when random-like test
vectors are produced. We chose the pseudo-random generation
since its implementation requires much less area than the deter-
ministic generation. Pseudo-random generation on CARLOS v3
is performed by the pattern generator block.
The pattern generator block provides a set of 200 pseudo-random test
vectors for BIST. These vectors are provided at the same time to both
processing channels. The pseudo-random sequence is obtained using
a linear feedback shift register (LFSR), which is a very simple structure
and requires a very small on-chip area.
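As an illustration, a 16-bit maximal-length LFSR of the kind used for pseudo-random vector generation can be written in a few lines of C; the feedback taps below are a textbook choice, not necessarily the polynomial implemented on CARLOS v3.

    #include <stdint.h>
    #include <stdio.h>

    /* 16-bit maximal-length Fibonacci LFSR (taps 16, 14, 13, 11).
       The polynomial actually used on CARLOS v3 is not stated here. */
    static uint16_t lfsr_step(uint16_t s)
    {
        uint16_t bit = (uint16_t)(((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u);
        return (uint16_t)((s >> 1) | (bit << 15));
    }

    int main(void)
    {
        uint16_t state = 0xACE1u;          /* any non-zero seed    */
        for (int i = 0; i < 200; ++i) {    /* the 200 BIST vectors */
            state = lfsr_step(state);
            printf("%04X\n", state);
        }
        return 0;
    }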
4.5.13 The signature-maker block
The I/O signals are:
– bist-vector : input 16-bit bus;
– ck : input signal;
– reset : input signal;
– bist-strobe: output signal;
– signature: output 16-bit bus.
The signature maker block performs the signature analysis. In sig-
nature analysis, the test responses of a system are compacted into a
signature using a linear feedback shift register (LFSR). Then the signa-
ture of the device under test is compared with the expected (reference)
signature. If the two match, the device is declared fault free; other-
wise it is declared faulty. Since several thousand test responses are
compacted into a few bits of signature by a LFSR, there is an informa-
tion loss. As a result some faulty devices may have the same correct
signature. The probability of a faulty device having the same signature
as a working device is called the probability of aliasing. The probabil-
ity of aliasing is shown to be approximately $2^{-m}$, where m denotes the
number of bits in the signature.
The signature register implemented on CARLOS is 16 bits wide, so the
probability of aliasing is $2^{-16}$. The signature maker block takes the
16-bit bist-vector word coming from the outmux block, performs the
signature analysis; then, when the FIFOs have been completely emptied
and the signature value is ready, it asserts the bist-strobe signal.
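The compaction step can be sketched in C as a Galois-type LFSR that folds each 16-bit response into the running signature; the feedback polynomial below is illustrative, not the one implemented on chip.

    #include <stdint.h>

    /* Compact a stream of 16-bit test responses into a 16-bit signature
       with a Galois-type LFSR; polynomial 0xB400 is illustrative. With
       m = 16 signature bits the aliasing probability is about 2^-16.   */
    static uint16_t make_signature(const uint16_t *resp, int n)
    {
        uint16_t sig = 0;
        for (int i = 0; i < n; ++i) {
            sig ^= resp[i];                     /* fold in one response */
            for (int b = 0; b < 16; ++b)        /* then clock the LFSR  */
                sig = (sig & 1u) ? (uint16_t)((sig >> 1) ^ 0xB400u)
                                 : (uint16_t)(sig >> 1);
        }
        return sig;   /* to be compared with the golden signature */
    }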
Figure 4.13: Digital design flow for CARLOS v3
4.6 Digital design flow for CARLOS v3
Fig. 4.13 shows in some detail the digital design flow we used for
the design of CARLOS v3 with the CERN 0.25 µm library. Since it is
quite a recent library, we had to face some problems: for instance the
small number of standard cells, the lack of 3-state buffers, the lack of
worst-case cell models, the fact that only Verilog models for cells and
not VHDL models were provided, and so on.
The reason for these shortcomings is that up to now
very few chips have been realized and tested using this library, so not
much characterization work could be done.
Hence we had to learn how to use the Cadence Verilog XL software for
post-synthesis simulations, since Synopsys allows simulating VHDL
models only. Our main difficulty was due to the necessity of using
VHDL-written testbenches for logic simulation and Verilog-written ones
for netlist simulation: this can be very error-prone since it is quite dif-
ficult to exactly match the two models together.
Besides that, we had to learn how to use Cadence Silicon Ensemble for
the place and route job. This is a very difficult job when the
standard cells are not completely characterized. We received great
help from Marchioro's group, especially for what concerns the back-end
design flow. They suggested that we follow a completely flat approach to
the problem, since the chip is very small: the hierarchical approach, i.e.
designing the layout of each block and then routing the blocks together,
is only worthwhile when dealing with chip complexities one order of
magnitude greater than ours.
4.7 CARLOS layout features
Fig. 4.14 shows a picture of the final layout of CARLOS v3, as it has
been sent to the foundry. As one can easily observe it is pad-limited,
i.e. the total silicon surface is due to the number of I/O pads (100)
and not to the number of standard cells it contains. Adding some extra
logic would not imply any additional cost if contained in the area that
is now empty. Hence we hope that adding the 2D compression logic will
not substantially increase the chip area and, consequently, the production
cost. The total area is 16 mm², corresponding to the minimal size the
silicon wafer was divided into.
CARLOS v3 is a fairly simple chip compared to CARLOS v2
with its 300 kgates of logical complexity: in fact it contains only 10
kgates. Nevertheless it has been designed in order to test our approach
to the new library and to verify that we were able to run through all
the design flow steps. Our final check will be the test of the chip itself,
in order to verify that everything was correctly designed, so as to have
very clear ideas for the design of the final version of CARLOS.
Figure 4.14: CARLOS v3 layout picture
A specific PCB is in the design phase right now: it will contain only
the connectors for probing with the Tektronix pattern generator and
logic analyzer pods, and the chip itself. Differently from CARLOS v2,
the chip will be bonded into a PGA package and mounted on the PCB
using a ZIF socket. This will allow us to test the 100 samples of the
chip by using only a few PCB samples.
Chapter 5
Wavelet based compression
algorithm
As an alternative to the 1D and 2D compression algorithms conceived
at the INFN Section of Torino, our group in Bologna decided to study
other compression algorithms that may be used as a second level com-
pressor on SDD data. After studying the main standard compression
algorithms, we decided to focus on a wavelet-based compression algo-
rithm and its performance when used to compress SDD data.
The wavelet based compression algorithm design can be divided into 4
steps, requiring the use of different software tools:
1. choice of the algorithm main features;
2. optimization of the algorithm with respect to SDD data using the
Matlab Wavelet Toolbox [23];
3. choice of the architecture for the implementation of the algorithm
using Simulink [24];
4. comparison between the performance of the wavelet algorithm and
the performance of the algorithms implemented on the CARLOS
prototypes, in terms of compression ratio and reconstruction error.
5.1 Wavelet based compression algorithm
The idea of compressing SDD data using a multiresolution based com-
pression algorithm comes from the growing success of this technique,
both for uni-dimensional and bi-dimensional signal compression.
Multiresolution analysis gives an equivalent representation of an input
signal in terms of approximation and detail coefficients; these coef-
ficients can then be encoded using standard techniques, such as run
length encoding.
An SDD event, i.e. data coming from a half-SDD, can be analyzed as
a unidimensional data stream of 64k samples or as a bi-dimensional
structure of 256 by 256 elements. Thus the first choice we have to
make is whether to implement a 1D or a 2D multiresolution analysis.
In 1D analysis the signal can be written as:
\[
S = \bigl(\underbrace{s_1, s_2, \ldots, s_{256}}_{\text{1st anode}},\;
\underbrace{s_{257}, s_{258}, \ldots, s_{512}}_{\text{2nd anode}},\;
\ldots,\;
\underbrace{s_{65281}, s_{65282}, \ldots, s_{65536}}_{\text{256th anode}}\bigr)
\tag{5.1}
\]
In 2D analysis the signal can be written as:
\[
S = \begin{pmatrix}
s_{1,1} & s_{1,2} & \ldots & s_{1,256}\\
s_{2,1} & s_{2,2} & \ldots & s_{2,256}\\
\vdots & \vdots & & \vdots\\
s_{256,1} & s_{256,2} & \ldots & s_{256,256}
\end{pmatrix}
\tag{5.2}
\]
where the i-th row contains the 256 samples of the i-th anode.
In the case of 1D analysis, once chosen the two decomposition filters
H and G, the multiresolution analysis can be applied with a number
of levels, that is the number of cascadable filters, between 1 and 16.
Thus an orthogonal wavelet decomposition C with 64k coefficients is
produced: the ratio of the number of approximation coefficients ai to the
number of detail coefficients di depends on the number of decomposition
levels used:
\begin{align*}
S &= (s_1,\, \ldots\ldots\ldots\ldots\ldots,\, s_{65536}) && \text{0 decomposition levels}\\[0.5ex]
C &= (\underbrace{a_1, \ldots, a_{32768}}_{\text{approx. coeffs.}},\, \underbrace{d_{32769}, \ldots, d_{65536}}_{\text{detail coeffs.}}) && \text{1 decomposition level}\\[0.5ex]
C &= (\underbrace{a_1, \ldots, a_{16384}}_{\text{approx. coeffs.}},\, \underbrace{d_{16385}, \ldots, d_{65536}}_{\text{detail coeffs.}}) && \text{2 decomposition levels}\\[0.5ex]
C &= (\underbrace{a_1, \ldots, a_{8192}}_{\text{approx. coeffs.}},\, \underbrace{d_{8193}, \ldots, d_{65536}}_{\text{detail coeffs.}}) && \text{3 decomposition levels}\\
&\;\;\vdots\\
C &= (\underbrace{a_1, a_2, a_3, a_4}_{\text{approx. coeffs.}},\, \underbrace{d_5, \ldots, d_{65536}}_{\text{detail coeffs.}}) && \text{14 decomposition levels}\\[0.5ex]
C &= (\underbrace{a_1, a_2}_{\text{approx. coeffs.}},\, \underbrace{d_3, \ldots, d_{65536}}_{\text{detail coeffs.}}) && \text{15 decomposition levels}\\[0.5ex]
C &= (\underbrace{a_1}_{\text{approx. coeff.}},\, \underbrace{d_2, \ldots, d_{65536}}_{\text{detail coeffs.}}) && \text{16 decomposition levels}
\end{align*}
In the case of 2D analysis, once chosen the two decomposition filters
H and G, the bi-dimensional decomposition scheme is applied with a
number of levels to be chosen between 1 and 8. First, multiresolution
analysis is applied to each row of the 2D signal, then each column result-
ing from the previous analysis is decomposed using the same number
of levels.
Thus the 2D signal (5.2) is transformed into the 2D orthogonal wavelet
decomposition, containing 64k coefficients; also in this case the ratio of
the number of approximation coefficients to the number of detail coefficients
depends on the decomposition levels applied:
\begin{align*}
S &= \begin{pmatrix}
s_{1,1} & \ldots & s_{1,256}\\
\vdots & & \vdots\\
s_{256,1} & \ldots & s_{256,256}
\end{pmatrix} && \text{0 decomposition levels}\\[1ex]
C &= \begin{pmatrix}
a_{1,1} & \ldots & a_{1,128} & d_{1,129} & \ldots & d_{1,256}\\
\vdots & & \vdots & \vdots & & \vdots\\
a_{128,1} & \ldots & a_{128,128} & d_{128,129} & \ldots & d_{128,256}\\
d_{129,1} & \ldots & d_{129,128} & d_{129,129} & \ldots & d_{129,256}\\
\vdots & & \vdots & \vdots & & \vdots\\
d_{256,1} & \ldots & d_{256,128} & d_{256,129} & \ldots & d_{256,256}
\end{pmatrix} && \text{1 decomposition level}\\
&\;\;\vdots\\
C &= \begin{pmatrix}
a_{1,1} & a_{1,2} & d_{1,3} & \ldots & d_{1,256}\\
a_{2,1} & a_{2,2} & d_{2,3} & \ldots & d_{2,256}\\
d_{3,1} & d_{3,2} & d_{3,3} & \ldots & d_{3,256}\\
\vdots & \vdots & \vdots & & \vdots\\
d_{256,1} & d_{256,2} & d_{256,3} & \ldots & d_{256,256}
\end{pmatrix} && \text{7 decomposition levels}\\[1ex]
C &= \begin{pmatrix}
a_{1,1} & d_{1,2} & \ldots & d_{1,256}\\
d_{2,1} & d_{2,2} & \ldots & d_{2,256}\\
\vdots & \vdots & & \vdots\\
d_{256,1} & d_{256,2} & \ldots & d_{256,256}
\end{pmatrix} && \text{8 decomposition levels}
\end{align*}
Applying multiresolution analysis to SDD data proves to be useful:
the approximation coefficients feature high values, since they represent
the signal approximation, while the detail coefficients feature values close
to 0. Thus, in order to get compression, the detail coefficients can be
eliminated without losing significant information on the input signal.
An easy and effective technique for compressing data after multireso-
lution analysis is to put a threshold level over every coefficient ai and
di. What we expect is that approximation coefficients ai remain un-
changed, while detail coefficients di are all put to 0. This is useful since
the long zero sequences coming from the detail coefficients can be fur-
ther compressed using the run length encoding technique.
The multiresolution based compression algorithm described so far is a
lossy technique but it can be used in a lossless way without putting the
threshold on wavelet coefficients.
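The threshold-plus-run-length step can be sketched in C as follows; the output format, a (value, zero-run) pair per surviving coefficient, is an illustrative choice consistent with the encoding assumed in Par. 5.4.

    #include <math.h>

    /* Zero every coefficient smaller than th in absolute value, then
       run-length encode the result as (value, zero-run) pairs. The
       output format is an illustrative choice.                       */
    static int threshold_rle(double *c, int n, double th,
                             double *val, int *zrun)
    {
        int pairs = 0, zeros = 0;
        for (int i = 0; i < n; ++i) {
            if (fabs(c[i]) < th) c[i] = 0.0;     /* apply threshold  */
            if (c[i] == 0.0) { zeros++; continue; }
            val[pairs] = c[i];                   /* surviving value  */
            zrun[pairs] = zeros;                 /* zeros before it  */
            zeros = 0;
            pairs++;
        }
        return pairs;    /* number of (value, run) pairs produced */
    }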
5.1.1 Configuration parameters of the multiresolution algorithm
Some algorithm parameters can be tuned in order to get the best perfor-
mance in terms of compression ratio and reconstruction error. These
parameters are:
– the pair of decomposition filters H and G, used to implement the
multiresolution analysis;
– the number of dimensions used for the analysis: 1D or 2D;
– the number of decomposition levels;
– the threshold value applied to ai and di coefficients.
5.2 Multiresolution algorithm optimization
The multiresolution algorithm optimization has been carried out using
the Wavelet Toolbox from Matlab.
First, the pair of decomposition filters that, with a fixed value of the
threshold, gives the highest number of null coefficients ai and di and the
lowest reconstruction error has been chosen; then the other 3 parameters
have been evaluated one after the other for optimization.
5.2.1 The Wavelet Toolbox from Matlab
The Wavelet Toolbox is a collection of functions from Matlab that,
using Matlab line commands and a user-friendly graphical interface,
allows the user to develop wavelet techniques to be applied to real problems.
In particular the Wavelet Toolbox allowed us to:
– perform the multiresolution analysis of a signal and the corre-
sponding synthesis, using a wide variety of decomposition and
reconstruction filters;
– treat signals as uni-dimensional or bi-dimensional;
– analyze signals on a variable number of levels;
– apply different threshold levels to the obtained coefficients ai and
di.
The wide choice of filters corresponds to the large number of wavelet
families implemented by the Wavelet Toolbox, shown in Tab. 5.1 and
in Fig. 2.10, Fig. 2.11 and Fig. 2.12.
Family                          Name identifier
Haar wavelet                    ’haar’
Daubechies wavelets             ’db’
Symlets                         ’sym’
Coiflets                        ’coif’
Biorthogonal wavelets           ’bior’
Reverse Biorthogonal wavelets   ’rbio’
Table 5.1: Wavelet families used for multiresolution analysis
In particular the Haar family is composed of the wavelet function ψ(x)
and its corresponding scale function φ(x), already discussed in Chap-
ter 2. On the other hand, each of the Daubechies, Symlets and Coiflets
families is composed of more than one pair of functions ψ(x) and φ(x): Daubechies
family pairs are named db1, . . . , db10, Symlets family pairs are named
sym2, . . . , sym8, while Coiflets family pairs are named coif1, . . . , coif5.
Biorthogonal (bior1.1, . . . , bior6.8) and Reverse Biorthogonal (rbio1.1,
. . . , rbio6.8) families are composed of quartets of functions ψ1(x), φ1(x), ψ2(x)
and φ2(x), where the first pair is used for decomposition and the second
for reconstruction. Using a particular function of the Wavelet Toolbox,
which requires the name of the chosen pair of functions ψ(x) and φ(x),
or the name of the quartet ψ1(x), φ1(x), ψ2(x) and φ2(x) when using
the Biorthogonal and Reverse Biorthogonal families, it is possible to determine
the impulse responses representing, respectively, the low pass filter H and
the high pass filter G used for decomposition, and the low pass filter H̄
and high pass filter Ḡ used in the reconstruction stage.
Multiresolution analysis and synthesis are computed as described in
Chapter 2: in particular the analysis step is performed with a con-
volution operation between the input signal and the filters H and G,
followed by decimation, while synthesis is performed with up-sampling,
followed by a convolution operation between the signal and the filters
H̄ and Ḡ.
5.2.2 Choice of the filters
In order to choose the best filters H, G, H̄ and Ḡ for SDD data com-
pression, 10 64-kbyte SDD events have been analyzed with the Wavelet
Toolbox, using the wavelet families shown in Tab. 5.1.
Each signal S, interpreted both as uni-dimensional as in Fig. 5.1 and
as bi-dimensional as in Fig. 5.2, has been processed in the following way:
– after choosing a pair of functions ψ(x) and φ(x) or the quartet
ψ1(x), φ1(x), ψ2(x), φ2(x), the corresponding filter coefficients H,
G, H̄ and Ḡ have been determined;
– the signal S has been analyzed using the filters H and G obtaining
the decomposition coefficients C;
– a threshold th has been applied to the coefficients C, obtaining
the modified coefficients Cth;
– the coefficients Cth have been synthesized into the signal R, using
the filters H̄ and Ḡ.
[Figure: 5-level uni-dimensional decomposition of an SDD signal; the panels show the original signal s, the approximation a5 and the details d5 . . . d1, with s = a5 + d5 + d4 + d3 + d2 + d1.]
Figure 5.1: Uni-dimensional analysis on 5 levels of the signal S
Both in the uni-dimensional and in the bi-dimensional case, the perfor-
mance related to compression has been quantified using the percent-
age P of null coefficients in Cth, while the performance
related to the reconstruction error has been quantified using the root
mean square error E between the original signal S and the signal R,
obtained after the analysis and synthesis of Cth.
In particular, since the total number of elements in Cth is 65536, in
the uni-dimensional case, the parameter P can be expressed in the
following way:
\[
P = \frac{100 \cdot (\text{number of null coefficients in } C_{th})}{65536}
\tag{5.3}
\]
The total number of elements in S and in R is also 65536, so, if si
is the i-th element of the uni-dimensional signal S and ri is the i-th
element of R, the parameter E can be expressed in the following way:
\[
E = \sqrt{\frac{1}{65536} \sum_{i=1}^{65536} (s_i - r_i)^2}
\tag{5.4}
\]
[Figure: bi-dimensional analysis of an SDD event; the panels show the original 256×256 image, the level-5 decomposition and approximation coefficients (dwt) and the synthesized image (idwt).]
Figure 5.2: Bi-dimensional analysis on 5 levels of the signal S
In the bi-dimensional case P is calculated in the same way while, de-
noting by si,j the (i, j)-th element of S and by ri,j the (i, j)-th element
of R, the parameter E can be expressed in the following way:
\[
E = \sqrt{\frac{1}{65536} \sum_{i=1}^{256} \sum_{j=1}^{256} (s_{i,j} - r_{i,j})^2}
\tag{5.5}
\]
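In C, the two figures of merit translate directly from (5.3) and (5.4); a sketch for the uni-dimensional case:

    #include <math.h>

    /* P (5.3): percentage of null coefficients in Cth.               */
    static double P_percent(const double *cth, int n)
    {
        int nulls = 0;
        for (int i = 0; i < n; ++i)
            if (cth[i] == 0.0) nulls++;
        return 100.0 * nulls / n;          /* n = 65536 for an event */
    }

    /* E (5.4): root mean square error between S and R.               */
    static double E_rms(const double *s, const double *r, int n)
    {
        double acc = 0.0;
        for (int i = 0; i < n; ++i)
            acc += (s[i] - r[i]) * (s[i] - r[i]);
        return sqrt(acc / n);
    }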
Even if the parameters P and E are not directly comparable to
the results obtained with the compression algorithms implemented on the
CARLOS prototypes, they give an important indication about the per-
formance of each filter set used during the analysis.
In particular, P gives a rough estimation of how much the coefficients
Cth can be compressed using the run length encoding, while E can
be interpreted as the error introduced in the value associated with each
sample coming from the SDD. The analysis results related to 10 SDD events
are shown from Tab. 5.2 to Tab. 5.7. In particular, Tab. 5.2 shows
the parameter P and E values related to a 5-level analysis using the
Haar filter, both in 1D and 2D, with a threshold value th variable in
the range 0-25. The other tables show the P and E values obtained
with a 5-level analysis with a threshold th of 25 using filters belong-
ing to Daubechies (Tab. 5.3), Symlets (Tab. 5.4), Coiflets (Tab. 5.5),
Biorthogonal (Tab. 5.6) and Reverse Biorthogonal (Tab. 5.7) families,
in the 1D and 2D cases. The uncertainties ∆P and ∆E have been
reported in terms of the respective orders of magnitude only, since we
are only looking for an estimation of these values.
An interesting feature emerging from Tab. 5.2 is the progressive increase
of the values P and E with the increase of the threshold values th ap-
plied to the coefficients C.
The trend of P is easy to understand considering that applying the
threshold th to the decomposition coefficients C means putting to 0 all co-
efficients smaller than th in absolute value: thus the greater the th value,
the greater the parameter P value.
For what concerns E, the greater the th value, the greater the differ-
ences between Cth and the original C and the distortion introduced.
It is to be noticed that for a value of th equal to 0, the parameter
P is 9.12, while the parameter E is 1.26 e-14, that is the percentage
of null coefficients in Cth and the reconstruction error are very small.
This is quite easy to understand for what concerns P since, without
a threshold, the only null coefficients are a very small fraction of the
total number. For what concerns E, avoiding to modify the coefficients
C with the threshold assures a nearly perfect reconstruction of the sig-
nal. The value 1.26 e-14 comes from the finite precision of the machine
performing the analysis and synthesis processes.
Haar
1D 2D
Threshold value th P E P E
0 9.12 1.26 e-14 3.68 2.50 e-14
1 24.68 0.27 22.21 0.28
2 40.01 0.63 42.63 0.75
3 58.60 1.64 56.34 1.19
4 67.08 1.71 67.76 1.67
5 75.56 2.09 75.50 2.09
6 79.87 2.38 80.77 2.44
7 83.56 2.68 84.96 2.77
8 86.71 2.99 88.21 3.08
9 88.82 3.23 90.75 3.36
10 90.70 3.48 92.88 3.63
11 92.21 3.72 94.49 3.87
12 93.20 3.89 95.80 4.08
13 94.16 4.07 96.78 4.26
14 94.81 4.21 97.56 4.42
15 95.33 4.34 98.20 4.57
16 95.72 4.44 98.73 4.71
17 96.03 4.54 99.05 4.80
18 96.20 4.60 99.25 4.86
19 96.41 4.67 99.44 4.93
20 96.54 4.72 99.55 4.97
21 96.62 4.76 99.64 5.01
22 96.69 4.79 99.69 5.03
23 96.73 4.81 99.74 5.05
24 96.76 4.83 99.77 5.07
25 96.79 4.85 99.80 5.09
Table 5.2: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on a 5-level base, using the set of filters
Haar derived from the Haar wavelet.
Daubechies
1D 2D
Filters P E P E
db1 96.79 4.85 99.80 5.09
db2 96.75 4.82 99.63 5.08
db3 96.73 4.81 99.54 5.07
db4 96.73 4.81 99.48 5.07
db5 96.72 4.81 99.33 5.07
db6 96.71 4.81 99.27 5.07
db7 96.72 4.82 99.20 5.07
db8 96.70 4.81 99.08 5.08
db9 96.69 4.81 98.98 5.09
db10 96.68 4.80 98.98 5.09
Table 5.3: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on a 5-level base, using the set of filters
Daubechies and a threshold level th equal to 25; the values obtained with
db1 are equivalent to the ones obtained with Haar, since the corresponding
filters are equivalent.
Symlets
1D 2D
Filters P E P E
sym2 96.75 4.82 99.63 5.08
sym3 96.73 4.81 99.54 5.07
sym4 96.74 4.82 99.43 5.07
sym5 96.72 4.81 99.38 5.06
sym6 96.73 4.81 99.33 5.07
sym7 96.70 4.80 99.17 5.06
sym8 96.71 4.80 99.11 5.08
Table 5.4: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on a 5-level base, using the set of filters
Symlets and a threshold value th equal to 25.
Coiflets
1D 2D
Filters P E P E
coif1 96.74 4.82 99.51 5.07
coif2 96.72 4.80 98.32 4.75
coif3 96.72 4.81 99.60 5.06
coif4 96.69 4.80 98.62 5.06
coif5 96.68 4.80 98.29 5.05
Table 5.5: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on a 5-level base, using the set of filters
Coiflets and a threshold value th equal to 25.
Biorthogonal
1D 2D
Filters P E P E
bior1.1 96.79 4.85 99.80 5.09
bior1.3 96.68 4.81 99.48 5.07
bior1.5 96.64 4.82 99.25 5.05
bior2.2 96.28 4.71 98.70 4.94
bior2.4 96.28 4.65 98.56 4.92
bior2.6 96.23 4.62 98.27 4.91
bior2.8 96.21 4.63 97.81 4.91
bior3.1 93.41 5.68 94.15 5.58
bior3.3 94.37 4.84 95.43 5.01
bior3.5 94.70 4.65 96.60 5.10
bior3.7 94.81 4.59 95.13 4.85
bior3.9 94.88 4.56 94.13 4.85
bior4.4 96.75 4.82 99.39 5.07
bior5.5 96.78 4.88 99.46 5.10
bior6.8 96.68 4.79 98.95 5.04
Table 5.6: Mean values of P and E using the Biorthogonal filters
Reverse Biorthogonal
1D 2D
Filters P E P E
rbio1.1 96.79 4.85 99.80 5.09
rbio1.3 96.77 4.85 99.57 5.08
rbio1.5 96.75 4.86 99.39 5.06
rbio2.2 96.78 4.92 96.89 4.58
rbio2.4 96.79 4.88 99.47 5.12
rbio2.6 96.77 4.87 99.32 5.11
rbio2.8 96.78 4.88 99.18 5.12
rbio3.1 96.38 8.67 98.76 11.29
rbio3.3 96.72 5.14 99.29 5.39
rbio3.5 96.76 4.95 99.28 5.18
rbio3.7 96.76 4.92 99.09 5.18
rbio3.9 96.74 4.91 98.97 5.20
rbio4.4 96.68 4.80 99.29 5.06
rbio5.5 93.32 4.63 98.56 4.92
rbio6.8 96.71 4.81 99.10 5.08
Table 5.7: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on a 5-level base, using the set of
filters Reverse Biorthogonal and a threshold value th equal to 25; the values
obtained with rbio1.1 are equivalent to the ones obtained with Haar, since
the corresponding filters are equivalent.
The common feature from Tab. 5.3, Tab. 5.4, Tab. 5.5, Tab. 5.6 and
Tab. 5.7 is the increasing value of P and E with the increase of the th
value.
Nevertheless some wavelet families are better suited than others to the
compression task; by comparing the values obtained for th = 25, it is
evident that the Haar set of filters shows the best performances. In
particular with P = 96.79 and E = 4.85 in the uni-dimensional case
and P = 99.80 and E = 5.09 in the bi-dimensional case, the Haar set
of filters gets the higher percentage of null coefficients with an accept-
able error. The choice of the Haar filters can be supported with further
arguments concerning the length of the Haar filters H, G, H̄ and Ḡ,
i.e. the number of coefficients which characterize the impulse response.

Family                  Set of filters name        Filter length
Haar                    haar                       2
Daubechies              dbN                        2N
Symlets                 symN                       2N
Coiflets                coifN                      6N
Biorthogonal            bior1.1                    2
                        biorN1.N2, N1≠1, N2≠1      max(2N1, 2N2)+2
Reverse Biorthogonal    rbio1.1                    2
                        rbioN1.N2, N1≠1, N2≠1      max(2N1, 2N2)+2
Table 5.8: Length of filters belonging to different families

As shown in Tab. 5.8, filters belonging to the Haar family have the
smallest number of coefficients among all the considered filters, together
with the equivalent sets db1, bior1.1 and rbio1.1. Since the analysis and syn-
thesis processes consist of successive convolutions between the signal
to analyze or synthesize and the respective filters, this small number
of coefficients allows for a higher execution speed of the analysis and
synthesis processes.
5.2.3 Choice of the dimensionality, number of levels and threshold value
Once chosen the Haar set of filters, we studied the effect on the P and E
parameters of dimensionality (1D or 2D), the number of levels used for
decomposition (1,2, . . . ,16 in 1D and 1,2, . . . ,8 in 2D) and the value
of the threshold th.
Tab. 5.9 and Tab. 5.10 show the analysis of the usual 10 SDD events in
1D and 2D; each table contains the values of P and E for 1, 3 and
5 levels of decomposition; for each level, threshold values between
0 and 25 have been adopted.
The first result is that bi-dimensional analysis produces a higher per-
centage P of null coefficients than the uni-dimensional case; neverthe-
less its E values are also higher.
For instance, comparing the P and E values for a threshold value th
of 25: the 1D analysis on 1 level gives P = 50.01 and E = 1.85,
while the 2D analysis on 1 level gives P = 74.96 and E = 3.96; the same 1D
analysis on 3 levels gives P = 87.45 and E = 4.18, versus the
values P = 98.35 and E = 5.00 in the corresponding 2D case.
Another result we obtained from the tables is that, once decided
whether to use 1D or 2D analysis, an increase in the number of decom-
position levels determines an increase in the values of the parameters
P and E.
For instance, by comparing values in Tab. 5.9 obtained with th equal to
25, it can be noticed that 1D analysis on 1 level determines P = 50.01
and E = 1.85, on 3 levels P = 87.45 and E = 4.18, while on 5 levels
P = 96.79 and E = 4.85. The same concept holds true for 2D analysis
and synthesis. Thus we found that the optimized version of a
multiresolution analysis based algorithm for SDD data is a 2D analysis
on the maximum number of decomposition levels using the Haar set of
filters.
For what concerns the threshold th, the parameters P and E increase
when th is increased. In order to decide the th value we have to be able
to quantify the reconstruction error introduced by the wavelet analysis
and to compare it with the compression algorithms implemented on
CARLOS.
5.3 Choice of the architecture
The precision related to the architecture chosen for the implementation
of the multiresolution analysis can strongly affect the percentage P of
null coefficients and the reconstruction error E. As an example it is
sufficient to apply both the analysis and synthesis processes to an input
signal without any threshold: the reconstruction error E, though very
small, is different from 0, due to the finite precision of the Pentium
II processor used to perform the calculations.
In order to quantify the influence of the architecture on the algorithm
performance we used Simulink, a software tool from Matlab for the
design and simulation of complex systems, and Fixed-Point Blockset
[25], which allows simulating the performance of a given algorithm when
implemented on different architectures, both in fixed and floating point.
5.3.1 Simulink and the Fixed-Point Blockset
The Fixed-Point Blockset tool [25] is one of the Simulink libraries which
contains blocks performing operations between signals such as sum,
multiplication, convolution and so on, simulating various types of ar-
chitectures, both fixed and floating point. This tool is very useful since
it allows the designer to study the performance of a given algorithm on
different architectures before the actual implementation takes place.
For instance, this tool can be successfully used in order to decide if
a Fourier transform can be implemented with acceptable performance
on a fixed-point DSP (Digital Signal Processor) or whether it has to be
implemented on a floating-point DSP. The difference is relevant especially
for cost reasons, since a floating-point DSP has a much higher cost than
a fixed-point one. We used the Fixed-Point Blockset with the same
purpose of finding the most suitable architecture before the actual imple-
mentation.
Among the various floating and fixed-point architectures handled by
the Fixed-Point Blockset, we studied the following ones:
– double precision floating point IEEE 754 standard architecture;
– single precision floating point IEEE 754 standard architecture;
– fractional fixed point.
IEEE 754 standard architecture is one of the most widespread archi-
tectures and it is used in most floating-point processors.
When the double precision is used, the standard architecture requires
a 64-bit word in which 1 bit stands for the sign s, 11 bits for the expo-
nent e and the remaining 52 bits for the mantissa m. The relationship
[Bit layout: b63 = sign s; b62–b52 = exponent e; b51–b0 = mantissa m.]
between binary and decimal representation is the following one:
\[
\text{decimal value} = (-1)^s \cdot 2^{\,e-1023} \cdot (1.m), \qquad 0 < e < 2047
\tag{5.6}
\]
When the single precision is used, the standard requires a 32-bit word
in which 1 bit stands for the sign s, 8 bits for the exponent e and the
remaining 23 bits for the mantissa m:
[Bit layout: b31 = sign s; b30–b23 = exponent e; b22–b0 = mantissa m.]
In this case the relationship between binary and decimal representation
is the following one:
\[
\text{decimal value} = (-1)^s \cdot 2^{\,e-127} \cdot (1.m), \qquad 0 < e < 255
\tag{5.7}
\]
For what concerns the fractional fixed-point architecture, once fixed
the position of the radix point among the 32 bits of the word, the bits
on the right (b0 – b(s−1)) contain the fractional part of the number, one
bit on the left (bs) contains the sign of the number and the other guard
bits (b(s+1) – b31) on the left of the radix point contain the integer part
of the number:
[Bit layout: b31–b(s+1) = guard bits; bs = sign; radix point; b(s−1)–b0 = fractional part.]
It is to be noticed that the double precision floating point IEEE 754 stan-
dard architecture features a precision of $2^{-52} \approx 10^{-16}$, single precision
IEEE 754 has a precision of $2^{-23} \approx 10^{-7}$, while the fractional fixed point
architecture has a precision of $2^{-s}$, i.e. the precision depends on the
number of bits being used for the fractional part of the number. Thus
the study of the influence of the fractional fixed-point architecture on the
multiresolution analysis has been carried out by varying the position of
the radix point within the 32-bit word.
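The effect of a fractional fixed(s) architecture can be emulated in C by quantizing every intermediate value to a step of $2^{-s}$, which is essentially what the Fixed-Point Blockset does; the rounding mode below is an assumption.

    #include <math.h>

    /* Quantize x to the grid of a fractional fixed(s) architecture,
       i.e. to multiples of 2^-s; round-to-nearest is an assumption. */
    static double fixed_s(double x, int s)
    {
        const double step = ldexp(1.0, -s);      /* step = 2^-s */
        return step * floor(x / step + 0.5);
    }
    /* Passing every sum and product of the Haar cascade through
       fixed_s(..., 9) approximates the behaviour behind the fixed(9)
       row of Tab. 5.11.                                             */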
5.3.2 Choice of the architecture
Implementing bi-dimensional multiresolution analysis and synthesis us-
ing Simulink is quite a long job, both in terms of design and simulation
time. Thus we decided to implement a uni-dimensional algorithm on
16 decomposition levels, since it is a much quicker and simpler job.
Besides that, it gives a rather good estimation of the performance of
the 3 architectures on an algorithm very similar to the one we have
chosen.
The implementation with Simulink of the multiresolution analysis
and synthesis processes is shown in the external blocks in Fig. 5.3: the
block on the left performs the 1D analysis of the signal S using the
Haar set of filters, while the block on the right applies a threshold on
the decomposition coefficients and performs the synthesis of the signal R.
Figure 5.3: Developed Simulink blocks: from left to right the analysis
block, the delay block and the threshold and synthesis block
Figure 5.4: Zoom on the developed analysis block
Figure 5.5: Zoom on the developed threshold and synthesis block
Figure 5.6: Zoom on the developed synthesis block
The analysis block has been implemented as a 16-level cascade, see
Fig. 5.4, containing high-pass filter operators (Hi Dec Filter), low pass
filter operators (Low Dec Filter) and Downsample operators. Hi Dec
Filter operators perform convolution between the incoming signal and
the Haar high pass decomposition filter, Low Dec Filter operators per-
form convolution between the incoming signal and the Haar low pass
decomposition filter, while the Downsample operators perform the dec-
imation of the incoming signal.
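Since the Haar filters have length 2, each stage of this cascade collapses to pairwise normalized sums and differences; one stage could be written in C as follows (a sketch, not the Simulink model):

    #include <math.h>

    /* One Haar stage of the cascade: for length-2 filters, the
       filter-plus-Downsample pair reduces to pairwise operations.
       Iterating on a[] reproduces the 16-level chain.              */
    static void haar_stage(const double *s, int n, double *a, double *d)
    {
        const double k = 1.0 / sqrt(2.0);        /* Haar normalization   */
        for (int i = 0; i < n / 2; ++i) {
            a[i] = k * (s[2*i] + s[2*i + 1]);    /* Low_Dec + Downsample */
            d[i] = k * (s[2*i] - s[2*i + 1]);    /* Hi_Dec  + Downsample */
        }
    }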
Fig. 5.5 shows the threshold and synthesis block which is subdivided
into 3 major sub-blocks: the sub-block on the left applies a threshold
on the input stream, the sub-block on the right performs the synthesis
of the signal, while the central block, called To Workspace, stores the
decomposition coefficients after the application of the threshold, so that
they can be used for calculating the percentage P of null coefficients.
The synthesis block has been implemented, in analogy to the analysis
block, as a 16-level cascade, see Fig. 5.6, containing Hi Rec Filter oper-
ators performing the convolution between the incoming signal and the
Haar high-pass reconstruction filter, Low Rec Filter operators perform-
ing the convolution between the incoming signal and the Haar low-pass
reconstruction filter, FixPt Sum operators performing the sum between
filtered signals and Upsample operators performing the upsampling on
the incoming signals.
Finally, the Delay block shown in Fig. 5.3 has the task of
starting the synthesis process only when the analysis job has already
been completed. It is to be noticed that the analysis, delay and synthe-
sis blocks have been developed starting from simple blocks belonging
to the Fixed Point Blockset, such as filtering, downsampling and up-
sampling blocks, and so on.
After performing the analysis and synthesis of the 10 SDD events with
a value of the threshold equal to 25 for the 3 architectures described
above, we have obtained the values shown in Tab. 5.11; as a notation
the floating point double precision standard architecture IEEE 754 is
indicated as ieee754doub, the single precision floating point standard
architecture IEEE 754 as ieee754sing and the fractional fixed point ar-
chitecture as fixed(s), where s is the number of bits representing the
fractional part of the number.
Simulink simulations show how the values P and E depend on the
precision of the selected architecture: in particular, taking as a refer-
ence the values of P and E least influenced by the finite precision of the
calculations, i.e. the values related to the ieee754doub architecture, one
can notice in the cases ieee754sing, fixed(18), fixed(15), fixed(12)
and fixed(9) a slight increase in the error E while P remains constant,
while in the cases fixed(7), fixed(5) and fixed(3) the discrepancy with the
values obtained in the ieee754doub case increases strongly.
Thus the results we obtained pointed us towards the choice of
one of the following architectures: ieee754doub, ieee754sing, fixed(18),
fixed(15), fixed(12) and fixed(9). Our choice fell on the ieee754sing as
explained in Par. 5.5.
5.4 Multiresolution algorithm performances
For a direct comparison of the performance obtained by the compres-
sion algorithms implemented on the CARLOS prototypes and by the
multiresolution based algorithm, we developed a FORTRAN subrou-
tine running analysis and synthesis on a floating-point single precision
SPARC5 processor. The FORTRAN subroutine can be logically di-
vided into two parts: the first with the aim of giving an estimation of the
algorithm in terms of compression, the second with the aim of giving
an estimation of the reconstruction error on the cluster charge.
The first part of the subroutine performs analysis, threshold th appli-
cation and synthesis on SDD events containing several charge clusters.
After applying analysis and threshold, for each SDD event the recip-
rocal of the compression ratio $c^{-1} = \frac{\text{number of output bits}}{\text{number of input bits}}$ is calculated, with
the assumption that each non-null decomposition coefficient is encoded
using two 32-bit words, one representing the value of the coefficient
itself, the other representing the number of null coefficients between
the current and the previous non-null coefficient. Thus the number of
bits entering the algorithm is the number of samples multiplied by 8
bits (64k × 8 = 512k), while the number of bits exiting the algorithm
is the number of non-null decomposition coefficients multiplied by the
32 + 32 = 64 bits used to encode each coefficient.
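Under these assumptions the computation of $c^{-1}$ is a one-liner; the C sketch below also makes the target explicit: c = 22 corresponds to at most about 372 non-null coefficients per event.

    /* Reciprocal compression ratio under the stated encoding: each
       non-null coefficient costs 32 + 32 = 64 bits, the input event
       is 64k samples of 8 bits (512 kbit).                          */
    static double c_inverse(int non_null_coeffs)
    {
        const double in_bits  = 65536.0 * 8.0;
        const double out_bits = non_null_coeffs * 64.0;
        return out_bits / in_bits;
    }
    /* c = 22 requires c_inverse <= 1/22 = 45.5e-3, i.e. at most
       about 372 non-null coefficients per event.                    */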
The second part of the FORTRAN subroutine performs analysis, thresh-
old application and synthesis to single-cluster SDD events.
After analysis, threshold th application and synthesis, the difference
between the coordinates of the cluster charge before compression and
after synthesis is computed for each SDD event, as well as the percent-
age difference between the charge of the cluster before compression and
after reconstruction.
Fig. 5.7, Fig. 5.8, Fig. 5.9, Fig. 5.10, Fig. 5.11 and Fig. 5.12 show the
value of the compression parameter $c^{-1}$ for different threshold th val-
ues; in each figure the upper histogram represents the $c^{-1}$ values belonging
to the 500 SDD events analyzed, while the lower histogram represents the
$c^{-1}$ values related to the SDD events whose $c^{-1}$ value is less than $46 \times 10^{-3}$
(c = 22).
As shown in the histograms, the mean $c^{-1}$ values are lower than our target
value $c^{-1} = 46 \times 10^{-3}$ for each threshold value selected. Thus the
multiresolution algorithm can reach an acceptable compression ratio
by putting a threshold of 20 on the analyzed coefficients.
For what concerns the reconstruction error calculation, up to now we
could use only 20 single-cluster events. Thus the histograms reporting
the coordinate and charge differences before and after compression suffer
from very poor statistics.
For this reason the results we obtained on the reconstruction error are
rather qualitative up to now: in particular, performing the analysis on
20 SDD events and using a threshold th equal to 21, the differ-
ences in the centroid coordinates before and after compression are of
the order of a µm, whereas the difference in the clus-
ter charge shows an underestimation of a few percent.
These qualitative results are of the same order of magnitude as those of
the compression algorithms implemented in the CARLOS prototypes.
Figure 5.7: $c^{-1}$ values for th=20
Figure 5.8: $c^{-1}$ values for th=21
Figure 5.9: $c^{-1}$ values for th=22
Figure 5.10: $c^{-1}$ values for th=23
Figure 5.11: $c^{-1}$ values for th=24
Figure 5.12: $c^{-1}$ values for th=25
5.5 Hardware implementation
The hardware we have chosen for the implementation of the wavelet
based compression algorithm is a DSP chip from Analog Devices (AD):
the ADSP-21160. The DSP belongs to the Single Instruction Multiple
Data SHARC family produced by AD. It performs calculations both
in fixed-point and in single precision floating point at the same speed.
Our choice fell on this DSP also for this interesting feature, since it
allows us to try two different architectures with a single chip. The chip
has the following features:
– 600 MFLOPS (32-bit floating point) peak operation;
– 600 MOPS (32-bit fixed point) peak operation;
– 100 MHz core operation;
– 4 Mbits on-chip dual-ported SRAM;
– division of SRAM between program and data memory is user se-
lectable;
– 14 channels of zero overhead DMA;
– JTAG standard test access port.
Particularly interesting in this chip is the amount of memory hosted on-
chip: 4 Mbits are sufficient to store the algorithm program and at least
2 SDD events (each one requires 512 kbits). Thus, while processing
one SDD event, another one can be fetched into the internal SRAM
using the DMA channels, thus increasing the total throughput.
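The overlap can be organized as classic ping-pong buffering; in the C sketch below, dma_fetch, dma_wait and compress are hypothetical stand-ins, not actual ADSP-21160 API calls, which are not reproduced here.

    /* Ping-pong buffering: while one SDD event is compressed from one
       buffer, the DMA engine fills the other. All three external
       functions are hypothetical stand-ins.                          */
    #define EVENT_BYTES 65536                 /* 512 kbit = 64 kbyte */

    static unsigned char buf[2][EVENT_BYTES];

    extern void dma_fetch(unsigned char *dst, int nbytes);  /* hypothetical */
    extern void dma_wait(void);                             /* hypothetical */
    extern void compress(const unsigned char *ev, int n);   /* hypothetical */

    void event_loop(void)
    {
        int cur = 0;
        dma_fetch(buf[cur], EVENT_BYTES);          /* prime buffer 0  */
        for (;;) {
            dma_wait();                            /* event cur ready */
            dma_fetch(buf[cur ^ 1], EVENT_BYTES);  /* overlap next... */
            compress(buf[cur], EVENT_BYTES);       /* ...with this one */
            cur ^= 1;
        }
    }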
The DSP has been bought together with an evaluation board and the
VisualDSP integrated development environment, which allows writing
C code and downloading it to the DSP chip. The wavelet based
compression algorithm implementation on the DSP is still in the design
phase, so no data concerning the algorithm speed are available yet
for a quantitative comparison with the CARLOS chip prototypes.
Haar
1D
1 level 3 levels 5 levels
Threshold value th P E P E P E
0 7.78 3.02 e-15 9.05 7.11 e-15 9.12 1.26 e-14
1 17.51 0.22 23.67 0.26 24.68 0.27
2 31.23 0.65 38.11 0.62 40.01 0.63
3 40.09 1.01 55.81 1.21 58.60 1.64
4 44.28 1.25 63.48 1.56 67.08 1.71
5 47.84 1.52 71.20 2.00 75.56 2.09
6 48.78 1.61 74.80 2.26 79.87 2.38
7 49.31 1.68 77.81 2.52 83.56 2.68
8 49.71 1.74 80.38 2.79 86.71 2.99
9 49.78 1.76 82.02 2.99 88.82 3.23
10 49.87 1.78 83.41 3.19 90.70 3.48
11 49.91 1.79 84.50 3.38 92.21 3.72
12 49.94 1.80 85.17 3.50 93.20 3.89
13 49.97 1.81 85.81 3.64 94.16 4.07
14 49.98 1.82 86.25 3.75 94.81 4.21
15 49.98 1.83 86.60 3.84 95.33 4.34
16 49.99 1.83 86.85 3.92 95.72 4.44
17 50.00 1.84 87.02 3.98 96.03 4.54
18 50.00 1.84 87.12 4.02 96.20 4.60
19 50.00 1.84 87.24 4.07 96.41 4.67
20 50.00 1.84 87.32 4.10 96.54 4.72
21 50.01 1.84 87.36 4.12 96.62 4.76
22 50.01 1.84 87.40 4.14 96.69 4.79
23 50.01 1.84 87.42 4.16 96.73 4.81
24 50.01 1.85 87.43 4.17 96.76 4.83
25 50.01 1.85 87.45 4.18 96.79 4.85
Table 5.9: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on 1, 3 and 5 decomposition levels,
using the Haar set of filters with the 1D algorithm.
Haar 2D
                     1 level          3 levels         5 levels
Threshold value th   P       E        P       E        P       E
0 3.54 5.32e-15 3.67 1.5e-14 3.68 2.50e-14
1 18.90 0.26 22.06 0.28 22.21 0.28
2 36.05 0.69 42.33 0.74 42.63 0.75
3 46.42 1.07 55.90 1.19 56.34 1.19
4 55.25 1.47 67.15 1.66 67.76 1.67
5 60.69 1.80 74.78 2.07 75.50 2.09
6 64.01 2.06 79.95 2.42 80.77 2.44
7 66.46 2.30 84.03 2.75 84.96 2.77
8 68.30 2.51 87.18 3.05 88.21 3.08
9 69.73 2.70 89.64 3.33 90.75 3.36
10 70.95 2.90 91.72 3.59 92.88 3.63
11 71.87 3.06 93.25 3.82 94.49 3.87
12 72.63 3.22 94.51 4.03 95.80 4.08
13 73.20 3.35 95.46 4.21 96.78 4.26
14 73.65 3.47 96.21 4.36 97.56 4.42
15 74.06 3.59 96.84 4.51 98.20 4.57
16 74.38 3.69 97.34 4.64 98.73 4.71
17 74.53 3.75 97.63 4.72 99.05 4.80
18 74.65 3.80 97.82 4.79 99.25 4.86
19 74.76 3.85 98.01 4.85 99.44 4.93
20 74.82 3.87 98.11 4.89 99.55 4.97
21 74.87 3.90 98.20 4.93 99.64 5.01
22 74.91 3.92 98.25 4.95 99.69 5.03
23 74.93 3.94 98.29 4.97 99.74 5.05
24 74.94 3.95 98.32 4.99 99.77 5.07
25 74.96 3.96 98.35 5.00 99.80 5.09
Table 5.10: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):
the analysis has been performed on 1, 3 and 5 decomposition levels,
using the Haar set of filters with the 2D algorithm.
Architecture    Precision    P       E
ieee754doub     2⁻⁵²         99.88   5.07
ieee754sing     2⁻²³         99.88   5.11
fixed(18)       2⁻¹⁸         99.88   5.11
fixed(15)       2⁻¹⁵         99.88   5.11
fixed(12)       2⁻¹²         99.88   5.11
fixed(9)        2⁻⁹          99.88   5.11
fixed(7)        2⁻⁷          99.87   6.04
fixed(5)        2⁻⁵          99.81   12.75
fixed(3)        2⁻³          99.52   89.09
Table 5.11: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01),
obtained with Simulink simulations.
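To make the fixed(n) rows concrete: such an architecture keeps n
fractional bits, i.e. rounds every intermediate value to the nearest
multiple of 2⁻ⁿ. A minimal C sketch of one such quantizer follows;
round-to-nearest is our illustrative assumption, the actual behaviour
being that of the Simulink Fixed-Point Blockset.

```c
#include <math.h>

/* Quantize x to n fractional bits (precision 2^-n), rounding to nearest:
   a sketch of the fixed(n) arithmetic simulated in Table 5.11. */
float quantize_fixed(float x, int n)
{
    float scale = (float)(1 << n);     /* 2^n */
    return roundf(x * scale) / scale;  /* nearest multiple of 2^-n */
}
```

With only 3 fractional bits every coefficient is kept to the nearest
1/8, which helps explain the sharp growth of E for fixed(5) and fixed(3).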
Conclusions
The main goal of this thesis work was the search for compression
algorithms, and their hardware implementation, to be applied to the
data coming from the Silicon Drift Detectors in the ALICE experiment.
ALICE and, in general, the LHC experiments put very stringent
constraints on the compression algorithms as regards compression ratio,
reconstruction error, speed, flexibility and so on. For example, the
data produced by the SDDs have to be reduced by a factor of 22 in order
to satisfy the constraints on disk space for permanent storage.
Therefore many standard compression algorithms have been studied in
order to find which one achieves the best trade-off between compression
ratio and reconstruction error, i.e. the distortion introduced. It is
rather obvious, in fact, that a compression ratio as high as 22 can
only be achieved at the expense of some loss of information on the
physical charge distribution over the SDD surface.
Three hardware prototypes implementing data compression are presented
in this thesis: the front-end chips CARLOS v1, v2 and v3. Their
evolution from version 1 to version 3 reflects the architectural
changes that occurred in the readout chain during the three years of
this work. Three major reasons justify these changes:
– the necessity to work in a radiation environment, forcing us to
choose a radiation-tolerant technology;
– the lack of space for the SIU board, forcing us to change the
readout architecture;
– the change from a one-dimensional (1D) compression algorithm to a
two-dimensional (2D) one, in order to achieve the same compression
ratio as in 1D while using lower thresholds, thus losing a smaller
amount of physical data.
CARLOS v4 is planned to be the final version of the chip: it will
contain the 2D algorithm and will be designed to be compliant with the
new readout architecture. It should be sent to the foundry before the
end of 2002.
One of the main features of these chips is that lossy compression can
be switched off when needed in favour of lossless compression. Lossless
data compression becomes necessary when the compression algorithms
implemented on the CARLOS chips are no longer applicable: for example,
the 2D compression algorithm does not work well in the presence of a
slope in the anodic signal baseline. In this case the on-line
compression on the front-end has to be switched off and a second-level
compressor in the counting room has to do the job. For this kind of
application different compression algorithms have to be studied.
As an alternative to the 1D and 2D algorithms, our group in Bologna
decided to study a wavelet-based compression algorithm, in order to
assess whether it could be useful for a possible second-level data
compression. Our simulations showed that the algorithm performs well as
regards both the compression ratio and the reconstruction error. We are
still working to obtain more quantitative results and, at the same
time, an implementation on a DSP is planned for the near future in
order to evaluate the compression speed and how many DSPs would be
necessary for the task. The use of DSPs in the counting room may be
very convenient since, unlike ASICs, they are completely reprogrammable
via software: thus as many different compression algorithms as desired
can be tried on the input data in order to find the best one.