Observational Astrophysics II - SRONkuiper/hea_inst11/handout_4nov2011/OAF2… · 1.1 Stochastic...

Observational Astrophysics II

JOHAN BLEEKERSRON Netherlands Institute for Space Research

Astronomical Institute Utrecht

FRANK VERBUNTAstronomical Institute Utrecht

January 18, 2007

Contents

1 Radiation Fields 41.1 Stochastic processes: distribution functions, mean and variance . . . . . . . . . 41.2 Autocorrelation and autocovariance . . . . . . . . . . . . . . . . . . . . . . . . 51.3 Wide-sense stationary and ergodic signals . . . . . . . . . . . . . . . . . . . . . 61.4 Power spectral density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.5 Intrinsic stochastic nature of a radiation beam: Application of Bose-Einstein

statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.5.1 Intermezzo: Bose-Einstein statistics . . . . . . . . . . . . . . . . . . . . 101.5.2 Intermezzo: Fermi Dirac statistics . . . . . . . . . . . . . . . . . . . . . 13

1.6 Stochastic description of a radiation field in the thermal limit . . . . . . . . . 151.7 Stochastic description of a radiation field in the quantum limit . . . . . . . . . 20

2 Astronomical measuring process: information transfer 232.1 Integral response function for astronomical measurements . . . . . . . . . . . . 232.2 Time filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.2.1 Finite exposure and time resolution . . . . . . . . . . . . . . . . . . . . 242.2.2 Error assessment in sample average μT . . . . . . . . . . . . . . . . . . . 25

2.3 Data sampling in a bandlimited measuring system: Critical or Nyquist frequency 28

3 Indirect Imaging and Spectroscopy 313.1 Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.1 The Visibility function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.1.2 Young’s dual beam interference experiment . . . . . . . . . . . . . . . . 313.1.3 The mutual coherence function . . . . . . . . . . . . . . . . . . . . . . . 323.1.4 Interference law for a partially coherent radiation field: the complex

degree of coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343.2 Indirect spectroscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2.1 Temporal coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363.2.2 Longitudinal correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3 Indirect imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.3.1 Quasi-monochromatic point source: spatial response function of a two-

element interferometer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.3.2 Quasi-monochromatic extended source: spatial or lateral coherence . . . 443.3.3 The Van Cittert-Zernike theorem . . . . . . . . . . . . . . . . . . . . . . 453.3.4 Etendue of coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.4 Aperture synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.4.1 Quasi-monochromatic point source: spatial response function (PSF)

and optical tranfer function (OTF) of a multi-element interferometer . . 50

1

3.4.2 Earth Rotation Aperture Synthesis (ERAS) . . . . . . . . . . . . . . . . 563.4.3 The Westerbork Radio Synthesis Telescope (WSRT) . . . . . . . . . . . 573.4.4 Maximum allowable bandwidth of a quasi-monochromatic source . . . . 613.4.5 The general case: arbitrary baselines and a field of view in arbitrary

direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4 Radiation sensing: a selection 644.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644.2 Heterodyne detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644.3 Frequency limited noise in the thermal limit: signal to noise ratio and limiting

sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.4 Photoconductive detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.4.1 Operation principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734.4.2 Responsivity of a photoconductor . . . . . . . . . . . . . . . . . . . . . . 734.4.3 Temporal frequency response of a photoconductor . . . . . . . . . . . . 76

4.5 Frequency limited noise in the quantum limit . . . . . . . . . . . . . . . . . . . 774.5.1 Frequency limited shot noise in the signal-photon limit . . . . . . . . . . 774.5.2 Example: Shot noise in the signal-photon limit for a photoconductor . . 794.5.3 Shot noise in the background-photon limit for a photoconductor: the

normalized detectivity D∗ . . . . . . . . . . . . . . . . . . . . . . . . . . 804.6 Charge Coupled Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.6.1 Operation principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814.6.2 Charge storage in a CCD . . . . . . . . . . . . . . . . . . . . . . . . . . 814.6.3 Charge transport in a CCD . . . . . . . . . . . . . . . . . . . . . . . . . 844.6.4 Charge capacity and transfer speed in CCD structures . . . . . . . . . . 874.6.5 Focal plane architectures . . . . . . . . . . . . . . . . . . . . . . . . . . 884.6.6 Wavelength response of CCDs . . . . . . . . . . . . . . . . . . . . . . . . 91

5 Fitting observed data 955.1 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.1.1 Accuracy versus precision . . . . . . . . . . . . . . . . . . . . . . . . . . 955.1.2 Computing the various distributions . . . . . . . . . . . . . . . . . . . . 97

5.2 Error propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 995.2.1 Examples of error propagation . . . . . . . . . . . . . . . . . . . . . . . 100

5.3 Errors distributed as a Gaussian and the least squares method . . . . . . . . . 1015.3.1 Weighted averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025.3.2 Fitting a straight line . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1035.3.3 Non-linear models: Levenberg-Marquardt . . . . . . . . . . . . . . . . . 1055.3.4 Optimal extraction of a spectrum . . . . . . . . . . . . . . . . . . . . . . 106

5.4 Errors distributed according to a Poisson distribution and Maximum likelihood 1075.4.1 A constant background . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075.4.2 Constant background plus one source . . . . . . . . . . . . . . . . . . . 108

5.5 General Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085.5.1 Amoebe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085.5.2 Genetic algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6 Looking for variability and periodicity 1126.1 Fitting sine-functions: Lomb-Scargle . . . . . . . . . . . . . . . . . . . . . . . . 1126.2 Period-folding: Stellingwerf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1146.3 Variability through Kolmogorov-Smirnov tests . . . . . . . . . . . . . . . . . . 1146.4 Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6.4.1 The discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . 1166.4.2 From continuous to discrete . . . . . . . . . . . . . . . . . . . . . . . . . 1186.4.3 The statistics of power spectra . . . . . . . . . . . . . . . . . . . . . . . 1206.4.4 Detecting and quantifying a signal . . . . . . . . . . . . . . . . . . . . . 121

6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

Chapter 1

Radiation Fields

1.1 Stochastic processes: distribution functions, mean andvariance

A stochastic process is defined as an infinite series of stochastic variables, one for each value ofthe time t. For a specific value of t, the stochastic variable X(t) possesses a certain probabilitydensity distribution. The numerical value of the stochastic variable X(t) at time t correspondsto a particular draw (outcome) from this probability distribution at time t. The time seriesof draws represents a single time function and is commonly referred to as a realisation of thestochastic process. The full set of all realisations is called the ensemble of time functions (seeFig. 1.1).

At each time t, the stochastic variable X(t) describing the stochastic process, is distributed

Figure 1.1: The ensemble of time functions of a stochastic process.

4

by a momentary cumulative distribution, called the first-order distribution:

F (x; t) = P{X(t) ≤ x} (1.1)

which gives the probability that the outcome at time t will not exceed the numerical value x.The probability density function, or first-order density, of X(t) follows from the derivative ofF (x; t):

f(x; t) ≡ ∂F (x; t)∂x

(1.2)

This probability function may be a binomial, Poisson or normal (Gaussian) distribution.For the determination of the statistical properties of a stochastic process it suffices in many

applications to consider only certain averages, in particular the expected values of X(t) andof X2(t).

The mean or average μ(t) of X(t) is the expected value of X(t) and is defined as

μ(t) = E{X(t)} =+∞∫

−∞x f(x; t) dx (1.3)

The variance of X(t) is the expected value of the square of the difference of X(t) and μ(t)and is, by definition, equal to the square of the standard deviation:

σ2(t) = E{(X(t) − μ(t))2} = E{X2(t)} − μ2(t) (1.4)

(in case of real functions X(t) and μ(t)).

1.2 Autocorrelation and autocovariance

This distribution can be generalized to higher-order distributions, for example the secondorder distribution of the process X(t) is the joint distribution

G(x1, x2; t1, t2) = P{X(t1) ≤ x1,X(t2) ≤ x2} (1.5)

The corresponding second derivative follows from

g(x1, x2; t1, t2) ≡ ∂2G(x1, x2; t1, t2)∂x1 ∂x2

(1.6)

The autocorrelation R(t1, t2) of X(t) is the expected value of the product X(t1) · X(t2) orX(t1) ·X∗(t2) if we deal with complex quantities:

R(t1, t2) = E{X(t1) ·X∗(t2)} =+∞∫

−∞

+∞∫−∞

x1 x∗2 g(x1, x

∗2; t1, t2) dx1dx2 (1.7)

with the asterisk (∗) indicating the complex conjugate in case of complex variables (seeFig. 1.2).

The value of R(t1, t2) on the diagonal t1 = t2 = t represents the average power of thesignal at time t:

R(t) = R(t, t) = E{X2(t)} = E{ |X(t) |2 } (1.8)

5

Figure 1.2: The autocorrelation of a stochastic process.

Since the average of X(t) is in general not zero, another quantity is introduced, the autoco-variance C(t1, t2), which is centered around the averages μ(t1) and μ(t2):

C(t1, t2) = E{(X(t1) − μ(t1)) · (X(t2) − μ(t2))∗} (1.9)

For t1 = t2 = t (verify yourself)

C(t) = C(t, t) = R(t, t)− |μ(t) |2= σ2(t) (1.10)

C(t) represents the average power contained in the fluctuations of the signal around its meanvalue at time t.

1.3 Wide-sense stationary and ergodic signals

Next, consider a signal for which the average does not depend on time, and for which theautocorrelation only depends on the time difference τ ≡ t2 − t1. This is called a wide-sensestationary (wss) signal. The following relations hold:

signal average μ(t) = μ = constant (1.11)autocorrelation R(t1, t2) = R(τ) (1.12)autocovariance C(t1, t2) = C(τ) = R(τ) − μ2 (1.13)

6

For τ = 0:R(0) = μ2 + C(0) = μ2 + σ2 (1.14)

which expresses that the total power in the signal equals the power in the average signal plusthe power in its fluctuations around the average value. Note that both the autocorrelationand the autocovariance are even functions, i.e. R(−τ) = R(τ) and C(−τ) = C(τ).

Finally, a wss-stochastic process X(t) is considered ergodic when the momentaneous av-erage over X(t) can be interchanged with its time average, i.e. when

μ = limT→∞

1T

+ 12T∫

− 12T

X(t) dt = E{X(t)} (1.15)

and R(τ) = limT→∞

1T

+ 12T∫

− 12T

X∗(t) ·X(t+ τ) dt = E{X∗(t) ·X(t+ τ)} (1.16)

1.4 Power spectral density

The autocorrelation function R(τ) defined in equations 1.7 and 1.16, has a Fourier transform

S(s) =+∞∫

−∞R(τ) · e−2πisτ dτ (1.17)

Since R(τ) has the dimension of a signal power (e.g. Watt), the dimension of S(s) is [power× time], which corresponds to power per unit frequency, e.g. Watt·Hz−1. The quantityS(s) is called the power spectral density (PSD), which represents the double-sided (mean-ing: frequency −∞ → +∞) power density of a stochastic variable X(t). The Fourier pairR(τ) ⇔ S(s) is commonly referred to as the Wiener-Khinchine relation. For τ = 0 thisrelation becomes:

R(0) = E{|X(t) |2

}=

+∞∫−∞

S(s) ds (1.18)

and if the signal average μ = 0:

R(0) = C(0) = σ2 =+∞∫

−∞S(s) ds (1.19)

i.e. the power contained in the fluctuations of a signal centered around zero mean follows fromthe integration of the spectral power density over the entire frequency band.

Since physical frequencies run from zero (i.e. DC) to +∞, the one-sided power spectraldensity SOS(s) = 2S(s) is commonly used. The total signal power follows from integration ofthis entity over the physical frequency domain (see also figure 1.3):

R(0) =∞∫0

SOS(s) ds (1.20)

Figure 1.4 shows two examples of SOS(s), displaying the PSDs which relate to the charac-teristic light curves of certain compact X-ray binary systems, the so-called Low Mass X-Ray

7

Figure 1.3: Power spectra: the area under the square of the function h(t) (panel a) equalsthe area under its one-sided power spectrum at positive frequencies (panel b) and also equalsthe are under its double-sided spectrum (panel c). Note the factor 2 in amplitude between theone-sided and double-sided spectrum. Figure taken from Press et al. 1992.

Binaries (LMXBs). The binary source GX5-1 shows a “red-noise” excess, i.e. the power in-creases at low frequencies, roughly following a power law dependence proportional to s−β,on which a broad peak around a central frequency s0 is superimposed. This peak refersto so-called quasi-periodic oscillations (QPOs), indicating that the X-ray source producestemporarily coherent oscillations in luminosity. The other source Sco X-1, shows a similar be-haviour, the red-noise referring to stochastic (i.e. non-coherent) variability in X-ray luminosityis largely absent in this case.

Note: relation 1.16 defines the autocorrelation function of a stochastic process as a timeaverage. If the stochastic variable X(t) is replaced by a general function f(u), the autocorre-lation is defined as (provided that

∫∞−∞ |f(u) | du is finite):

R(x) =+∞∫

−∞f∗(u) · f(u+ x) du =

+∞∫−∞

f∗(u) · f(u− x) du (1.21)

where no averaging takes place. The Fourier transform Φ(s) of R(x) is now given by

Φ(s) =+∞∫

−∞R(x) · e−2πisx dx = F (s) · F ∗(s) =|F (s) |2 (1.22)

8

Figure 1.4: One-sided power spectra of the X-ray sources GX 5-1 and Sco X-1.

in which f(u) ⇔ F (s). This is the Wiener-Khinchin theorem for a finite function f(u), whichis only non-zero over a limited interval of the coordinate u.

For x = 0 this results in:

R(0) =+∞∫

−∞|f(u) |2 du =

+∞∫−∞

Φ(s) ds =+∞∫

−∞|F (s) |2 ds (1.23)

The relation:+∞∫

−∞|f(u) |2 du =

+∞∫−∞

|F (s) |2 ds (1.24)

9

is referred to as Parseval’s theorem, each integral represents the amount of “energy” in asystem, one integral being taken over all values of a coordinate (e.g. angular coordinate,wavelength, time), the other over all spectral components in the Fourier domain, see also figure1.3. The function Φ(s), therefore, has the dimension of a energy density, quite confusingly(and erroneously) in the literature this is also often referred to as a power density, althoughno averaging process has taken place.

1.5 Intrinsic stochastic nature of a radiation beam: Applica-

tion of Bose-Einstein statistics

Irrespective of the type of information carrier, e.g. electromagnetic or particle radiation, theincoming radiation beam in astronomical measurements is subject to fluctuations which derivefrom the incoherent nature of the emission process in the information (=radiation) source. Fora particle beam this is immediately obvious from the corpuscular character of the radiation,however for electromagnetic radiation the magnitude of the fluctuations depends on whetherthe wave character or the quantum character (i.e. photons) dominates. By employing Bose-Einstein statistics, the magnitude of these intrinsic fluctuations can be computed for thespecific case of a blackbody radiation source which is incoherent or “chaotic” with respect totime. Photons are bosons, which are not subject to the Pauli exclusion principle, consequentlymany bosons may occupy the same quantum state.

1.5.1 Intermezzo: Bose-Einstein statistics

The world is quantum-mechanical. One way of describing this, is to note that particles ina unit volume of space (’a cubic centimeter’) are distributed in momentum-space in boxeswith a size proportional to h3, where h is Planck’s constant. At each energy, there is a finitenumber Z of boxes (where Z ∝ 4πp2dp with p the particle momentum).

Consider ni particles, each with energy εi, and call the number of boxes available at thatenergy Zi. Bosons can share a box. The number of ways W (ni) in which the ni bosons canbe distributed over the Zi boxes is given by

W (ni) =(ni + Zi − 1)!ni!(Zi − 1)!

(1.25)

(To understand this: the problem is equivalent to laying ni particles and Zi − 1 boundariesin a row. The number of permutations is (ni + Zi − 1)!; then note that the particles andboundaries can be interchanged.) Similarly, we put nj bosons in Zj boxes, etc. The numberof ways in which N =

∑∞i=1 ni bosons can be distributed over the boxes in momentum space

thus is: W = Π∞i=1W (ni).

The basic assumption of statistical physics is that the probability of a distribution is pro-portional to the number of ways in which this distribution can be obtained, i.e. to W . Thephysics behind this assumption is that collisions between the particles will continuously re-distribute the particles over the boxes. The most likely particle distribution is thus the onewhich can be reached in most different ways. We find this maximum in W by determiningthe maximum of lnW , by setting its derivative to zero:

lnW =∞∑i=1

lnW (ni) ⇒ Δ lnW =∞∑i=1

∂ lnW (ni)∂ni

Δni = 0 (1.26)

10

We consider one term from the sum on the right-hand side. With Stirlings approximationlnx! x lnx− x for large x we get

lnW (ni) = (ni + Zi − 1) ln(ni + Zi − 1) − (ni + Zi − 1) − ni lnni + ni − (Zi − 1) ln(Zi − 1)+(Zi − 1) = (ni + Zi − 1) ln(ni + Zi − 1) − ni lnni − (Zi − 1) ln(Zi − 1) (1.27)

For a nearby number ni + Δni, we have

Δ lnW (ni) ≡ lnW (ni + Δni) − lnW (ni) Δni∂ lnW (ni)

∂ni= Δni [ln(ni + Zi − 1) − lnni] (1.28)

to first order in Δni.For equilibrium, i.e. the most likely particle distribution, we now have to set:

Δ lnW =∞∑i=1

Δni [ln(ni + Zi − 1) − lnni] = 0 (1.29)

Since we consider a system in thermodynamic equilibrium, i.e. for which the number ofparticles N =

∑ni per unit volume and the energy E =

∑niεi per unit volume are constant,

the variations in ni must be such as to conserve N and E i.e.

ΔN =∞∑i=1

Δni = 0 (1.30)

and

ΔE =∞∑i=1

εiΔni = 0 (1.31)

These restrictions imply that we also have:

Δ lnW − αΔN − βΔE =∞∑i=1

Δni [ln(ni + Zi − 1) − lnni − α− βεi] = 0 (1.32)

The sum in Eq. 1.32 is zero for arbitrary variations Δni, provided that for each i

ln(ni + Zi − 1) − lnni − α− βεi = 0 ⇒ niZi − 1

=1

eα+βεi − 1(1.33)

which is the Bose Einstein distribution.Since Zi 1, ni/(Zi − 1) can be replaced by ni/Zi, which represents the average occupationat energy level εi (occupation number). The values of α and β depend on the total numberof particles and the total energy, and can be determined from these by substituting ni inN =

∑∞i=1 ni and in E =

∑∞i=1 niεi.

For the Planck function, the number of photons need not be conserved: e.g. an atom canabsorb a photon and jump from orbit 1 to orbit 3, and then emit 2 photons by returning viaorbit 2. Thus, the Lagrange condition Eq. 1.30 does not apply to photons, and we obtain thePlanck function by dropping α in Eq. 1.33:

niZi − 1

=1

eβεi − 1= nνk

(1.34)

nνkis the average occupation number of photons at frequency νk. Note that photons do not

collide directly with one another, but reach equilibrium only via interaction with atoms.

11

We can make the connection to thermodynamics by taking

S ≡ k lnW ⇒ ΔS = kΔ lnW (1.35)

With Eqs. 1.32 we getΔS = kαΔN + kβΔE (1.36)

and with TΔS = −ζΔN + ΔE we find that β = 1/(kT ); here ζ ≡ −α/β is the thermody-namical potential per particle.

Fluctuations around equilibrium

To investigate the fluctuations we look at the number of ways in which ni+Δni particles canbe distributed, as compared to that for ni particles:

lnW (ni + Δni) = lnW (ni) + Δni∂ lnW (ni)

∂ni+

Δn2i

2∂2 lnW (ni)

∂n2i

(1.37)

where we now include the 2nd order term.In equilibrium the term in Eq. 1.37 proportional to Δni is zero. Consequently, we can rewriteEq. 1.37 as

W (ni + Δni) = W (ni)e−W ′′(ni)

2Δn2

i where W ′′(ni) ≡ −∂2 lnW (ni)∂n2

i

(1.38)

In other words, the probability of a deviation Δni drops exponentially with the square of Δni,i.e. the probability of Δni is given by a Gaussian! The average value for Δn2

i is found byintegrating over all values of Δni:

Δn2i =

∫∞−∞ Δn2

iW (ni)e−W ′′(ni)

2Δn2

i dΔni∫∞−∞W (ni)e−

W ′′(ni)

2Δn2

i dΔni=

1W ′′(ni)

(1.39)

(Note that W (ni) and W ′′(ni) do not depend on Δni, so that they can be considered asconstants in the integrations. Note also that the maximum negative deviation has Δni = −ni,and the maximum positive deviation Δni = N−ni; the integrals should therefore formally beevaluated between these values; however, for large Δni the integrand drops rapidly to zero,and so we can extend the integrals to the full range −∞ to +∞)without compromising theresult.

Computing the second derivative of lnW (ni) and changing sign we find for the variance:

Δn2i =

[W ′′(ni)

]−1 =ni(ni + Zi − 1)

Zi − 1= ni

[1 +

1eα+βεi − 1

](1.40)

For the Planck function we must drop the α. Thus, for photons:

Δn2i = ni

[1 +

1eβεi − 1

]= ni(1 + nνk

) (1.41)

The fluctuation in the average occupation number Δnνk2 follows from

Δnνk2 =

Δn2i

Zi= nνk

(1 + nνk) (1.42)

12

1.5.2 Intermezzo: Fermi Dirac statistics

We can repeat the same exercise for fermions. In this case the particles are not allowed toshare a box, and the number of ways W (ni) in which ni particles can be distributed over Ziboxes with energies εi is given by

W (ni) =Zi!

ni!(Zi − ni)!(1.43)

The difference in lnW (ni) between nearby numbers is

lnW (ni + Δni) − lnW (ni) = −Δni [lnni − ln(Zi − ni)] (1.44)

to first order in Δni. This leads to the equilibrium, Fermi Dirac distribution:

niZi

=1

eα+βεi + 1= nk (1.45)

and to an average value of the square of a deviation

Δn2i =

ni(Zi − ni)Zi

= ni

[1 − 1

eα+βεi + 1

]= ni(1 − nk) (1.46)

The fluctuation in the average occupation number Δnk2 follows from

Δnk2 =Δn2

i

Zi= nk(1 − nk) (1.47)

End Intermezzo

The volume density of photons in a blackbody Bose gas with frequencies between νk andνk + dνk follows from N(νk)dνk = g(νk)nνk

dνk, in which g(νk) represents the volume densityof quantum states per unit frequency at νk. Since the stochastic variables nνk

are indepen-dent, the Bose-fluctuations ΔN2(νk) in photon density per unit frequency can be written as(omitting the suffix k):

ΔN2(ν) = N(ν)(

1 +1

exp(hν/kT ) − 1

)(1.48)

where N(ν) follows from the specific energy density ρ(ν) = ρ(ν)equilibrium given by (see OAF1,Chapter 2):

ρ(ν)dν =8πhc3

ν3

exp ( hνkT ) − 1dν (1.49)

through the relation N(ν) = ρ(ν)/hν.If a detection element, e.g. an antenna, is placed within a blackbody radiation field, for

example inside a vacuum enclosure at temperature T , the incident photon flux is given byn(ν) = 1

2c

4π N(ν)AeΩ. The factor 12 refers to one component of polarisation, Ae is the effective

area of the detection element and Ω constitutes the solid angle subtended by the detector beamviewing the radiation field. If radiation illuminates an extended surface (Ae) with variousdirections of the wave vector, i.e. an omnidirectional blackbody radiation field, coherencetheory states that spatial coherence is limited to AeΩ ≈ λ2, the so-called extent (etendue) of

13

coherence.1 Substituting N(ν), the expression for the specific photon flux n(ν) (in photonss−1 Hz−1) becomes:

n(ν) =1

exp(hν/kT ) − 1(1.50)

Δn2(ν) = nν

(1 +

1exp(hν/kT ) − 1

)(1.51)

In the extreme case that hν kT , the second term becomes much smaller than 1, hence:

Δn2(ν) = n(ν) (1.52)

This is the well-known expression for Poissonian noise in a sample containing n(ν) photons.This condition is called the quantum limit of the fluctuations and it represents the mini-mum value of intrinsic noise present in any radiation beam. Obviously, this always holds forcorpuscular radiation (cosmic-rays) and neutrinos, since the wave character is not an issue.

For a photon energy hν � kT , the noise is normally expressed in terms of the aver-age radiation power P (ν) (e.g. in Watt Hz−1) by writing P (ν) = (hν)n(ν) and ΔP 2(ν) =(hν)2Δn2(ν):

ΔP 2(ν) = P (ν)(hν +

hν

exp(hν/kT ) − 1

)= P (ν)(hν + P (ν)) (1.53)

Taking the limit hν � kT :

ΔP 2(ν) = P 2(ν) (1.54)and P (ν) = kT (1.55)

which is the expression for the classical thermal noise power per unit frequency bandwidth.This limit is called the thermal limit.

The transition between noise in the quantum limit to the thermal limit occurs at hν ≈ kT .At room temperature, T ≈ 300 K, this corresponds to a frequency ν ≈ 6 THz, or a wavelengthλ ≈ 50 μm. The relation ν = kT/h as a function of temperature T is displayed in figure1.5. It is clear from this diagram that radio observations are always dominated by the wavecharacter of the incoming beam and are therefore performed in the thermal limit. As a result,the treatment of noise in radio observations differs drastically from that of measurements atshorter wavelengths. Specifically at submillimetric and infrared wavelengths quantum limitedobservation is vigorously pursued but this remains still difficult.

The fluctuations in average power P (ν), given in equation 1.54, for the thermal limit canbe interpreted in such a way that whenever wave packet interference becomes important, theinterference will cause the fluctuations to become of the same magnitude as the signal. Thelow frequency fluctuations can be thought of as caused by the random phase differences andbeats of the wavefields, as described in the following paragraph.

Note of Caution: The above expression for the fluctuations in a blackbody photon gasapplies only to the interior of a blackbody in which the receiving element is submerged, i.e.a blackbody cavity or a thermal bath, where the condition λ2 = c2/ν2 = AeΩ is satisfied. Ifthis condition is not fulfilled, then even in the limit hν � kT quantum noise may dominate.For example, if a star, whose spectrum resembles a blackbody at temperature T , is observed atfrequency ν, such that hν � kT , thermal noise ought to dominate. The star may however beso distant that the radiation is effectively unidirectional and hence, AeΩ � λ2. The photonswill consequently arrive well separated in time and quantum noise evidently dominates.

1This relation is the same as that governing the size θ = λ/D of a diffraction limited beam (Ω ≈ θ2) for anaperture with diameter D: Ae ≈ D2.

14

Figure 1.5: Frequency as a function of temperature: division between thermal and quantumnoise. Figure taken from Lena (1988).

1.6 Stochastic description of a radiation field in the thermallimit

In astrophysics, many sources of electromagnetic (EM) radiation have a thermal origin. Thisimplies that the incoming wave-signal has been built up by the superposition of many indi-vidual wavepackets, which were all generated by a myriad of independent radiative atomictransitions at the radiation source. Each wavepacket has a limited duration that dependson the characteristic time scale of the particular atomic transition. The intrinsic spread inthe frequency of the emitted wavepacket derives from the uncertainty relation of HeisenbergΔεΔt = h with spread in frequency Δν = 1/Δt. The characteristic time length of such awavepacket Δt ≡ τc = 1/Δν is called the coherence time, it represents the typical time scaleover which the phase of the EM-wave can be predicted with reasonable accuracy at a givenlocation in space. Figure 1.6 shows the stochastic signal comprising a random superpositionof individual wavepackets. This wave signal fluctuates both in amplitude and in frequency,the latter characterised by a typical bandwidth Δν around an average frequency ν. Such asignal represents a quasi-monochromatic wave with a frequency stability ν/Δν.

A quasi-monochromatic radiation field from a thermal source can be quantitatively de-scribed by a complex expression for the electric field E(t), comprising a harmonic oscillationat an average frequency ν modulated by a slowly varying envelope E0(t):

E(t) = E0(t) · ei(2πνt) (1.56)

The complex amplitude E0(t) is also known as a phasor and is specified by a time dependentmagnitude | E0(t) | and phase φ(t). The physical basis and the properties of this signal willnow be elaborated in some detail in order to understand the process of detection.

In the case of an idealised monochromatic plane wave, Δν reduces to a delta function:δ(ν − ν). In the time domain this is represented as an infinitely long wave train (i.e. the

15

Figure 1.6: A quasi-monochromatic wave.

Figure 1.7: Random fluctuations of �E(t)

Fourier transform of a δ-function). If this wave train were to be resolved in two orthogonalpolarisation components perpendicular to the direction of propagation, they in turn musthave the same frequency, be infinite in extent and are therefore mutually coherent. In otherwords a perfectly monochromatic plane wave is always polarised. Putting this in the contextof the phasor E0(t) of a linearly polarised plane wave, this results in:

E0(t) =|E0(t) | eiφ(t) =|E0 | eiφ0 (1.57)

i.e. both the amplitude |E0 | and phase φ0 of the phasor are constant in time.Turning now to a thermal radiation source, the emission consists of an extremely large

number of randomly oriented atomic emitters. Each atom radiates a polarised wave train forroughly 10−8 or 10−9 seconds in the case of optical light depending on the natural line widthΔν of the transition. In the case of molecular vibrational or rotational transitions (radioand far infrared) the timescales are longer. Considering a certain wave propagation direction

16

�k, individual atomic (molecular) emissions at the same frequency along that direction willcombine to form a single resultant polarised wave, which however will not persist for more thanthe typical coherence time τc of a wave packet, i.e. in the optical 10−8−10−9 seconds. New wavetrains are continually emitted and as a result the magnitude and the polarisation directionof the electric vector �E(t) changes in a completely random manner on a typical time scaleequal to the coherence time τc. If these changes occur at a rate of 108 to 109 per second, anysingle resultant polarisation state is undiscernable. Thermal radiation is therefore designatedas natural or unpolarised light, although the latter qualification is somewhat confusing sincein actuality the light comprises a rapid succession of different polarisation states.

The random fluctuations of �E(t) can be quantitatively described in a scalar approach, byassessing the characteristics of the fluctuation in the phasor E0(t), i.e. of its magnitude |E0(t) |and its phase φ(t). For time scales short compared to the coherence time (Δν)−1, E0(t) willremain almost constant in time, for optical light with τc ≈ 10−8 s this still comprises severalmillion harmonic oscillations of the electric vector �E(t) (ν ≈ few 1014 Hz). For time scalesτ τc, |E0(t) | and φ(t) will vary randomly, this is pictorially shown in figure 1.7.

Mathematically these fluctuations can de described by regarding the real and imaginaryparts of E0(t), Re(E0(t)) and Im(E0(t)), as uncorrelated Gaussian stochastic variables withequal standard deviation, e.g. depict them as linearly polarised waves for which the relativephase difference varies rapidly and randomly (mutually incoherent). The joint (bivariate)probability density distribution is then given by:

p(ReE0(t), ImE0(t)

)dReE0(t) dImE0(t) =

12πσ2

e−Re2E0(t)+Im2E0(t)

2σ2 dReE0(t) dImE0(t)

(1.58)Furthermore, the following relations hold:

|E0(t) |2 = |E(t) |2= Re2E0(t) + Im2E0(t) (1.59)

φ(t) = arg(E0(t)) = arg(E(t)) − 2πνt = arctanImE0(t)ReE0(t)

(1.60)

Changing to polar coordinates the bivariate probability density distribution for | E0(t) | andφ(t) can be obtained:

p(|E0(t) |, φ(t)

)d |E0(t) | dφ(t) =

|E0(t) |2πσ2

e−|E0(t)|2

2σ2 d |E0(t) | dφ(t) (1.61)

Integration over |E0(t) | yields:

p (φ(t)) =12π

(1.62)

i.e. all phase angles φ(t) are equally probable for unpolarised radiation, which is obviously tobe expected.

Integration over all phase angles φ(t) yields the amplitude distribution | E0(t) | for anunpolarised thermal radiation beam:

p(|E0(t) |

)=

|E0(t) |σ2

e−|E0(t)|2

2σ2 (1.63)

This distribution for the magnitude of the electric vector | E(t) | over all polarisation anglesis the so-called Rayleigh distribution. p

(|E0(t) |

)and p (φ(t)) are shown in figure 1.8.

17

Figure 1.8: Distribution functions of p(|E0(t) |

)and p (φ(t)).

Problem: 1: Show that the most probable value of | E0(t) | equals σ and the averagevalue for the amplitude of the unpolarised beam equals σ

√π2 . �

From the distribution for | E0(t) |, the probability density of the instantaneous intensity(or irradiance) I(t) for thermal radiation can be readily derived.

Intermezzo: The power flux density transported by an EM-wave

The energy streaming through space in the form of an electromagnetic wave is shared be-tween the constituent electric and magnetic fields.The energy density of an electrostatic field (e.g. between plates of a capacitor) ρ E = εrε0| �E|2/2(dimension Joule/m3), with | �E| the magnitude of the electric vector (dimension V/m) andε0 the vacuum permittivity (8.8543 · 10−12 Asec/Vm). Similarly, the energy density of amagnetic field (e.g. within a toroid) equals ρ B = | �B|2/(2μrμ0) (dimension Joule/m3), with| �B| the magnitude of the magnetic vector (dimension Tesla = Vsec/m2) and μ0 the vacuumpermeability (4π · 10−7 Vsec/Am).The wave equation for a plane electromagnetic wave traveling along the x-direction invacuum is given by:

∂2E(x, t)∂x2

=1c2∂2E(x, t)∂t2

and∂2B(x, t)∂x2

=1c2∂2B(x, t)∂t2

(1.64)

for the electric field wave and the magnetic field wave respectively. The magnetic field wavetravels in a plane perpendicular to the electric field, both the electric field and the magneticfield directions are perpendicular to the direction of propagation (x). The plane wave solutioncan be expressed by a harmonic function, using a complex scalar representation:

E(x, t) = E0ei·2π(νt−x/λ) and B(x, t) = B0e

i·2π(νt−x/λ) (1.65)

Consistency with Maxwell’s equations requires that for the EM-wave holds ρ E = ρ B. Hence,from the above, we have B0 = E0/c.The flow of electromagnetic energy through space associated with the traveling EM-wave isrepresented by the Poynting vector �S = (1/μ0)�E x �B, a vector product that symbolizes thedirection and magnitude of the energy transport per unit time across a unit area (e.g. inunits Watt m−2). The vector magnitude |�S| = |E||B|(sin φ)/μ0 equals |E||B|/μ0, since themagnetic field is perpendicular to the electric field (φ = π/2). Representing the actual wavesignal by taking the real part of expressions (1.65) we get:

|�S| = E0B0 cos2 2π(νt− x/λ) = ε0cE20 cos2 2π(νt− x/λ) = (ε0/μ0)

12E2

0 cos2 2π(νt− x/λ)(1.66)

18

The average power flux density for an ideal monochromatic plane wave, I(t) equals |�S(t)|:

I(t) = (ε0/μ0)12E2

0cos2 2π(νt− x/λ) = (ε0/μ0)12E2

0

2= 2.6544 · 10−3E

20

2(1.67)

expressed in Watt/m2 for E0 in Volts/meter.An idealised monochromatic plane wave is represented in the time domain by an infinitely longwave train and is by definition fully polarised. As has already been discussed, an unpolarised,quasi-monochromatic, radiation field from a thermal source can be described by a complexexpression for the electric field E(t), comprising a harmonic oscillation at an average frequencyν modulated by a slowly varying envelope, accomodated by the phasor E0(t), i.e. E(t) =E0(t) ·ei(2πνt). The average power flux density for this wave then follows from the expectationvalue of the product E(t)E∗(t):

I(t) = (ε0/μ0)12E

{E(t)E∗(t)

}= 2.6544 · 10−3 E

{|E0(t)|2

}(1.68)

Since we are primarily concerned with relative power flux densities generated by these travel-ing waves within the same medium, we can disregard in what follows multiplication with thenumerical constant in expression (1.68), since this (deterministic) quantity is only of relevancefor assessing the absolute numerical value of the power flux density and bears no influenceon the description of the stochastic nature of the signals. In practical computations, thisconstant should of course be applied!

End Intermezzo

We can now set the following equalities:

I(t) = E(t) · E∗(t) = |E(t) |2 = |E0(t) |2 (1.69)

Transformation of variables in equation 1.63 yields:

p (I) dI = (I)−1 e−I/I dI (1.70)

with I = E{|E0(t)|2

}= 2σ2.

This is an exponential probability density distribution.Problem 2: Prove that its variance is given by:

ΔI2 = I2 (1.71)�

This result, which is now obtained formally from a bivariate Gaussian distributed stochas-tic process with zero-mean for the harmonic wave components, is the same as the fluctuationin the average monochromatic radiation power (Watt Hz−1) of a blackbody radiation field wederived earlier ( 1.54), i.e. ΔP 2(ν) = P 2(ν), which should of course hold.In summary, a proper stochastic description is now available for an unpolarised thermal radi-ation field, through a scalar treatment using the complex expression for the electric field:

E(t) = E0(t) ei(2πνt) =|E0(t) | eiφ(t) ei(2πνt) =|E0(t) | ei(2πνt+φ(t)) (1.72)

with all values of φ(t) equally probable, the magnitude of the complex amplitude | E0(t) |distributed according to a Rayleigh distribution and the instantaneous frequency representedby

ν =12π

d

dt(2πνt+ φ(t)) (1.73)

19

Figure 1.9: Staircase function.

The bandwidth Δν follows from ν − ν = ddtφ(t).

In practice a radiation beam is generally neither completely polarised nor completelyunpolarised, both cases are extremes. More often the electric vector �E(t) varies in way thatis neither totally regular nor totally irregular, the radiation should be regarded as partiallypolarised. An useful way to describe this behaviour is to envision it as the result of thesuperposition of specific amounts of natural and polarised light. If a quantitative assessmentof the degree of polarisation is required, a measurement of all four Stokes parameters isrequired. In radio astronomy this is often done due to the intrinsic sensitivity of the receiverfront-end for a particular direction of polarisation.

1.7 Stochastic description of a radiation field in the quantumlimit

In the quantum limit no coherence effects occur and the radiation field fluctuations can bedescribed by photon statistics only. Consider an incident radiation beam (wide-sense sta-tionary, ergodic) with an average flux of nb photons (or particles or neutrinos) per second.The generation of photons at random times ti can be described by a stochastic variable X(t),which consists of a family of increasing staircase functions, with discontinuities at the thepoints ti (see figure 1.9):

X(t) =∑i

U(t− ti) (1.74)

with U(t) the unit-step function:

U(t) =

{1 for t ≥ 00 for t < 0

(1.75)

The derivative of the stochastic variable X(t):

Y (t) =dX(t)dt

=∑i

δ(t− ti) (1.76)

represents a train of Dirac impulses at random time positions ti. The number of photons XΔT

registered during a measurement period ΔT (which is a (small) part of the total measurementtime T ), follows from:

XΔT =t+ΔT∫t

∑i

δ(t − ti) dt = k (1.77)

20

The random variable XΔT is distributed according to a Poisson distribution. The probabilityto detect k photons, if the mean value is μ(= nbΔT ), follows from:

pP (k, μ) =μk

k!e−μ (1.78)

with k an integer value. The (continuous) probability density function for Poissonian statisticscan now be expressed as:

p(x, μ) =∞∑k=0

pP (k, μ) δ(x − k) (1.79)

So, the average value of XΔT is

E{XΔT } =+∞∫

−∞x p(x, μ) dx = μ(= nb ΔT ) (1.80)

If the expected value E{XT } = μ for the average number of photons intime period T , the probability p that a photon arrives in a subinterval ofT can be equated from p = μ/m if m equals the number of subintervalswithin T . The probability that no photon arrives is 1−p. The measure-ment can thus be considered as a series of m trials to find a photon, eachhaving a probability of p of succeeding. The probability that in total kphotons will be detected is therefore given by the binomial probabilityfunction (k < m):

pB(k,m, p) =

(mk

)pk (1 − p)m−k (1.81)

However, if the subinterval is large, there is a finite probability that morethan one photon will arrive within this interval, hence the limit shouldbe taken for the number of trials m to go to infinity while maintainingmp = μ constant. In this limit m→ ∞, p→ 0, the binomial distributionchanges to the Poisson distribution:

pP (k, μ) =μk

k!e−μ (1.82)

The exponential factor ensures that this distribution function is normal-ized: ∞∑

k=0

pP (k, μ) = 1 (1.83)

Problem 3: Derive this distribution transformation and show that thePoisson distribution is normalized. �

The autocorrelation RXΔT(τ) is given by

RXΔT(τ) = E{XΔT (t+ τ) ·XΔT (t)}

= μ2 + μ δ(τ)= (nb ΔT )2 + (nb ΔT )δ(τ) (1.84)

21

(Note that RXΔT(0) = μ2 + μ.) The first term on the right hand side of equation (1.84) is

the square of the average, the second term represents the covariance which equals here thevariance since it is 0 everywhere except for τ = 0. This is obvious since the photon arrivaltimes ti are totally uncorrelated.

Problem 4: Derive E{XΔT } = μ and RXΔT(0) = μ2 +μ by using the expression for pP (k, μ)

and repetitive differentiation of the Taylor expansion of the exponential eμ. �

From these results a signal-to-noise ratio SNR can be defined which represents the intrinsiclimitation to the accuracy of the measurement due to the signal-photon noise:

SNR =E{XΔT }√CXΔT

(0)=√nb ΔT (1.85)

This shows that the intrinsic SNR of the radiation field measurement increases with the squareroot of the mean photon flux nb and with the square root of the measurement interval ΔT .In terms of the stochastic variable Y (t) we obtain in an analogous fashion:

E{Y (t)} = nb (1.86)and RY (τ) = n2

b + nb δ(τ) (1.87)

22

Chapter 2

Astronomical measuring process:information transfer

2.1 Integral response function for astronomical measurements

In observational astrophysics, the system response to an incoming radiation field can be char-acterised by a filtering process, arising from the individual elements making up the telescopeconfiguration, of a stochastic process described by the monochromatic intensity I(ν, �Ω, t).Generally, the time dependent output of the telescope is described by

X(t) = S(t) +N(t) (2.1)

in which S(t) represents the outcome of the filtering of the signal source and N(t) representsthe sum of all (filtered) noise components like background radiation, disturbances arising fromthe operational environment and intrinsic noise in the detection system, like dark current.The measuring process of the source signal S(t) can be symbolically written as a series ofconsecutive convolutions, defined by the angular and the spectral response functions of theobservational instrument:

S(t) =∫

Δν

⎡⎢⎣R(ν) ∗

∫ΔΩF OV

[I(�Ω, ν, t) ∗ P (�Ω, ν)

]d�Ω

⎤⎥⎦ dν (2.2)

P (�Ω, ν) represents the collecting power of the telescope. which depends in general on thefrequency (or wavelength, or energy). It is a function of the telescope off-axis angle in the fieldof view �ΩFOV and contains the point spread function H(�Ω, ν), which quantitatively describesthe angular resolution (field position dependent). Δ�ΩFOV gives the solid angle over which theconvolution I(�Ω, ν, t) ∗P (�Ω, ν) is to be integrated. The choice of Δ�ΩFOV entirely depends onthe number of image elements in the field of view, which may be as large as 108 in modernsystems, and on the objective of the particular observation (e.g. ultimate angular resolutionor high quality spectrum over a relatively large part of the field of view). Accordingly theintegral can be done over the whole field of view �ΩFOV , which may cover a large part of thesky in the case of wide-field cameras, or it may be done over just one image pixel.

The integration of the signal after the second convolution with R(ν) covers the spectralrange of interest Δν, which is part of the total bandwidth ν. Again, this may vary from a verynarrow range (e.g. measuring the line profile of a single spectral line) to a very broad range(in case of photometry). The number of frequency elements can therefore range from 1 (e.g.in the case of a bolometric detector) to approximately 106 in a high-resolution spectrograph.

23

It is important to remember continually that the term frequency covers here three typesof Fourier pairs.

• The pair I(�Ω) ⇔ I(�ζ) refers to spatial resolution, the frequency �ζ in the Fourier domainhas to be interpreted as spatial frequency. This refers to structures in the image contrast.

• The pair I(ν) ⇔ I(s) refers to spectral resolution, in this case s is a Fourier frequencywhich relates to a spectral frequency. A spectrum containing a large number of sharp fea-tures, like narrow emission and absorption lines, possesses much power in high spectralfrequencies; a featureless continuum contains only low spectral frequencies.

• The pair I(t) ⇔ I(f) refers to time resolution, the frequency f relates to temporalfrequency.

Every measurement or observation implies bandwidth limitations on each of these frequencies.The normalised value of the Fourier transform of a particular instrument response func-

tion, e.g. R(s) or H(�ζ), is called the Modulation Transfer Function (MTF) and describes thefrequency dependent filtering of the source signal in the Fourier domain. The MTF referseither to the amplitude/phase transfer function of the signal or to the power transfer func-tion, in practice this will be explicitly clear from the specific context in which the MTF isemployed.

2.2 Time filtering

2.2.1 Finite exposure and time resolution

In practice, the measurement or registration of a stochastic process always takes place over afinite period T and with a certain resolution ΔT , i.e. the minimum time bin for a data point.The limitation in measuring time T corresponds to a multiplication in the time domain of astochastic variable X(t) with a window (block) function Π(t/T ). This function is describedas follows,

Π(t

T

)≡ 1 for |t| ≤ 1

2T (2.3)

Π(t

T

)≡ 0 for |t| > 1

2T (2.4)

(2.5)

Consequently a new, time filtered, stochastic variable Y (t) is introduced:

Y (t) = Π(t

T

)X(t) (2.6)

The limitation in time resolution always arises in practice due to the frequency-limited trans-mission characteristic of any physical measuring device.

Suppose, as an example, the measurement is taken at time t within the measuring period Twith a temporal resolution ΔT . This corresponds to an integration of the stochastic variableY (t) between t−ΔT/2 and t+ΔT/2, divided by ΔT (a so called running average). It followsthat

Z(t) ≡ YΔT (t) =1

ΔT

t+ΔT/2∫t−ΔT/2

Y (t′)dt′ =1

ΔT

+∞∫−∞

Π(t− t′

ΔT

)Y (t′)dt′ (2.7)

24

Figure 2.1: Illustration of the errors in av-erage and power spectrum of a stochasticprocess x(t). a) a realization of the pro-cess b) filtered with a low-pass filter (highfrequencies are removed): y(t) c) measure-ment during finite time interval T , and theaverage during this interval (dashed line,value yT ) d) the autocorrelation functionsof the process Rx(τ) and of the measur-ment Ry(τ).

This equation can also be expressed in terms of a convolution in the time domain:

Z(t) =1

ΔTΠ(

t

ΔT

)∗ Y (t) =

1ΔT

Π(

t

ΔT

)∗ Π

(t

T

)X(t) (2.8)

which represents a low-frequency (or ‘low-pass’) filtering of the stochastic variable Y (t). Thevalues μT and RT (τ) for an ergodic process obtained from a finite measuring period T willtherefore slightly differ from the true values μ and R(τ). The error introduced by measuringthe sample average μT rather than the true average μ is the subject of the next paragraph.

2.2.2 Error assessment in sample average µT

We wish to determine the accuracy with which the approximate value μT approaches thereal value μ. An illustration of this is given in Figure 2.1. To do so we start by noting thatdetermining the average corresponds to convolution in the time domain with a block function.We denote this

X(t) → 1T Π

( tT

) → XT (2.9)

In terms of the Fourier transforms, the averaging corresponds to a multiplication with a sinc-function. In general, we write the effect of the measuring apparatus on the signal in theFourier domain as Y (f) = X(f)H(f) and hence Y ∗(f) = X∗(f)H∗(f), and with H(f) thetransfer function. We can thus also write |Y (f)|2 = |X(f)|2|H(f)|2.

Note: in the literature the term transfer function is used both for H(f), i.e. the signaltransfer function, but also for |H(f)|2, i.e. the power transfer function. So be prepared for thecorrect interpretation whenever you encounter the term transfer function in the literature!

25

With this we take the Fourier transform of the autocorrelation, to find:

SXT(f) = |H(f)|2SX(t)(f) = sinc2(Tf) · SX(t)(f) (2.10)

Transforming back to the time domain, we write this as

RXT(τ) = h(τ) ∗ h(τ) ∗RX(t) (2.11)

and note that h ≡ (1/T )Π(t/T ) is a real function. We note that the convolution of a blockwith itself is a triangle, and introduce the notation:

h(τ) ∗ h(τ) ≡ ρ(τ) ≡ 1T

Λ(τ

T

)

to rewrite Eq. 2.11 as

RXT(τ) =

1T

Λ(τ

T

)∗RX(t) ≡

1T

+∞∫−∞

Λ(τ ′

T

)RX(t)(τ − τ ′)dτ ′ (2.12)

For convenience we consider the case where μ = 0, i.e. R = C, and find

CXT(τ) =

1T

+∞∫−∞

Λ(τ ′

T

)CX(t)(τ − τ ′)dτ ′ (2.13)

To compute the variance we set τ = 0 and find

CXT(0) ≡ [σXT

]2 =1T

+∞∫−∞

Λ(τ ′

T

)CX(t)(−τ ′)dτ ′ =

1T

+∞∫−∞

Λ(τ ′

T

)CX(t)(τ

′)dτ ′ (2.14)

where we have used the fact that C is even. Explicitly writing Λ we finally obtain

[σXT]2 =

1T

+T∫−T

(1 − |τ ′|

T

)CX(t)(τ

′)dτ ′ (2.15)

Two things are to be noted in this equation. First: the integral ranges from −T to +T , i.e.over a range with length 2T , but nonetheless the normalization factor is 1/T . Second, theautocovariance is always limited in the frequency domain.

As an example we consider a specific form for the transfer function, viz.:

H(f) =1

1 + 2πifτo(2.16)

At small frequencies f � 1/(2πτo) ≡ fo the transfer is complete, i.e. |H(f)| = 1, and athigh frequencies f fo the transfer is inversely proportional to the temporal frequency, i.e.|H(f)| = fo/f . The frequency fo is the ‘cut-off’ frequency of the transfer function H(f).Figure 2.2 shows this transfer function, which is called a first order system.

We will forego the full mathematics here, and merely conclude that the autocovariancewith such a system drops exponentially with (the absolute value of) the time difference τ :

CX(t)(τ) = CX(t)(0)e−|τ |/τo where τo ≡ 1

2πfo(2.17)

26

Figure 2.2: For a first order transfer function (left), the autocovariance drops exponentiallywith |τ | (right).

This means that at times τ τo the correlation is virtually zero. By entering Eq. 2.17 intoEq. 2.15 and performing the integration, we get

[σXT]2 = 2

[σX(t)

]2 τoT

[1 − τo

T

(1 − e−T/τo

)](2.18)

To get a feeling for the meaning of this equation we consider two limiting cases. First theone in which the duration of the measurement largely exceeds the correlation time, T τo.Eq. 2.18 then becomes

[σXT]2 = 2

[σX(t)

]2 τoT

=

[σX(t)

]2πfoT

(2.19)

and we see that the variance of the measured signal is proportional to the variance of theincoming signal, and approximates zero when the duration of the measurement goes to infinity,and also when the number of frequencies over which one measures goes to infinity. Themeasured signal is then said to be ergodic in the mean. The last limit can be understood bynoting that foT is the number of cycles during T with a frequency fo, i.e. it gives the numberof measurements; Eq. 2.19 thus is analogous to the equation which gives the variance of theaverage σ2

μ = σ2/N .The other limit we consider is the one for which the duration of the measurement equals

the correlation time, T = τo. Eq. 2.18 in this limit becomes

σXT

2 = 2σX(t)2e−1 σX(t)

2 (2.20)

This is also understandable in terms of determining the average, in the case where just onemeasurement is taken: N = 1.

The moral we can draw from this example is that one must take good care that theduration of the measurement is much longer than the correlation time, T τo, if one wishesto avoid large errors in the estimates of the average and of the variance. Another moral isthat we must take into account the errors in the average and in the variance whenever we arelooking for really small effects.

27

Figure 2.3: Illustration of samplingoptimally, compared with undersam-pling and oversampling. Left in thetime domain, right in the Fourierdomain.

2.3 Data sampling in a bandlimited measuring system: Criti-

cal or Nyquist frequency

Let us consider the general case of a signal S(x) which has been subject to the instrumentresponse R(x), so that the resulting measurement M(x) follows from

M(x) = S(x) ∗R(x) (2.21)

Because of the finite frequency response of the instrument, M(x) is always limited inbandwidth, i.e. the Fourier transform M(s) ⇔ M(x) is a bandwidth-limited function. Thisfunction is characterised by a maximum cut-off frequency smax, also called the critical orNyquist frequency (sc). In the case of a gaussian response the frequencies will never bedistributed purely gaussian, since no physical system can transmit the tail frequencies upto ∞. Shannon (and Nyquist) established a theorem for optimum sampling of band limitedobservations. This theorem states that no information is lost if sampling occurs at intervalsτ = 1/(2sc).

Let M(x) subsequently be sampled at regular intervals, M(x) →M(nτ) with n an integerand τ the sampling interval. To describe the sampling process quantitatively we introducethe shah function, also called the comb of Dirac, which constitutes a series of δ functions atregular distances equal to 1:

⊥⊥⊥(x) ≡∞∑

n=−∞δ(x− n) (2.22)

This shah function can be extended to arbitrary distances by noting a⊥⊥⊥(ax) =∑n δ(x −

n/a).The sampled signal Ms(x) can now be expressed as

Ms(x) =∑n

M(nτ)δ(x − nτ) =1τ⊥⊥⊥

(x

τ

)M(x) (2.23)

28

The Fourier transform Ms(s) ⇔Ms(x) equals

Ms(s) = ⊥⊥⊥(τs) ∗M(s) =1τ

∑n

M

(s− n

τ

)(2.24)

This expression shows that, except for a proportionality factor 1/τ , Ms(s) represents a seriesof replications of M(s) at intervals 1/τ . Because M(s) is a bandwidth-limited function with acut-off frequency of say s = sc, we can recover fully the single (i.e. not repeated) function M(s)from this series by multiplication with τ and by filtering with the gate function Π(s/2sc):

Π(s

2sc

)τ⊥⊥⊥(τs) ∗M(s) ⇔ 2scsinc2scx ∗ ⊥⊥⊥

(x

τ

)M(x) (2.25)

M(x) can be reconstructed exactly if the series of M(s) functions in the frequency domaintouch without overlap. This is the case if we sample at τ = 1/(2sc), which therefore is theoptimum sample interval. Performing the convolution we fully reconstruct M(x), i.e.:

M(x) =+∞∫

−∞sinc

(x− x′

τ

)∑n

M(nτ)δ(x′ − nτ)dx′ =∑n

sinc(x− nτ

τ

)M(nτ) (2.26)

We can check this result easily at a sampling point x = jτ , with sinc(j−n) = 1 for j = n and= 0 for j �= n:

M(x) = M(jτ) (2.27)

Note: Eq. 2.25 shows that the calculation of intermediate points from samples does not ofcourse depend on calculating Fourier transforms. The equivalent operation in the x-domainentails the convolution of 2scsinc2scx directly with ⊥⊥⊥(x/τ)M(x). Notice that the omissionof the 1/τ factor in Eq. 2.23 ensures the proper normalization in the s-domain!

Expression 2.26 shows that a superposition of a series of sinc-functions with weight factorsM(nτ), i.e. the sample values, at intervals τ exactly reconstruct the continuous functionM(x).In fact the sinc-functions provide the proper interpolation between the consecutive samplepoints, this is the reason why the sinc-function is sometimes referred to as the interpolationfunction.

Thus, the use of a discrete Fourier transform causes no loss of information, provided thatthe sampling frequency 1

τ is twice the highest frequency in the continuous input function (i.e.the source function convolved with the response function). The maximum frequency smaxthat can be determined for a given sampling interval equals therefore 1

2τ . If the input signalis sampled too slowly, i.e. if the signal contains frequencies higher than 1

2τ , then these cannotbe determined after the sampling process and the finer details will be lost (see Fig. 2.3).More seriously however, the higher frequencies which are not resolved will beat with themeasured frequencies and produce spurious components in the frequency domain below theNyquist frequency. This effect is known as aliasing and may give rise to major problems anduncertainties in the determination of the source function, see figure 2.4 in the case of a timefunction.

29

Figure 2.4: The function h(t) shown in the top panel is undersampled. This means that thesampling interval Δ is larger than 1

2fmax. The lower panel shows that in this case the power in

the frequencies above 12Δ is ’mirrored’ with respect to this frequency and produces an aliased

transform which deviates from the true Fourier transform. Figure taken from Press et al.1992.

30

Chapter 3

Indirect Imaging and Spectroscopy

3.1 Coherence

3.1.1 The Visibility function

The coherence phenomenon is directly coupled to correlation, and the degree of coherence ofan EM-wave field E(�r, t) can be quantitativily described by employing the auto- and cross-correlation technique for the analysis of a stochastic process.The electric vector of the wave field at a position �r at time t, E(�r, t), is a complex quantity,denoting the amplitude and phase of the field. To assess the coherence phenomenon, thequestion to be answered is: how do the nature of the source and the geometrical configurationof the situation relate to the resulting phase correlation between two laterally spaced pointsin the wave field?This brings to mind Young’s interference experiment in which a primary monochromaticsource S illuminates two pinholes in an opaque screen, see figure 3.1. The pinholes S1 and S2

act as secondary sources, generating a fringe pattern on a distant observation plane ΣO. IfS is an idealized monochromatic point source, the wavelets issuing from any set of aperturesS1 and S2 will maintain a constant relative phase; they are precisely correlated and thereforemutually fully coherent. On the observation plane ΣO a well-defined array of stable fringeswill result and the radiation field is spatially coherent. At the other extreme, if the pinholes S1

and S2 are illuminated by separate thermal sources (even with narrow frequency bandwidths),no correlation exists; no fringes will be observable in the observation plane ΣO and the fieldsat S1 and S2 are said to be incoherent. The generation of interference fringes is seemingly aconvenient measure of the degree of coherence of a radiation field. The quality of the fringesproduced by an interferometric system can be described quantitativily using the Visibilityfunction V:

V =Imax − IminImax + Imin

(3.1)

here Imax and Imin are the irradiances corresponding to the maximum and adjacent minimumin the fringe system.

3.1.2 Young’s dual beam interference experiment

To assess the mutual coherence between two positions in a radiation field in a quantitativefashion, consider the situation displayed in figure 3.1, with an extended narrow bandwidthradiation source S, which generates fields E(�r1, t) ≡ E1(t) at S1 and E(�r2, t) ≡ E2(t) at S2,

31

Figure 3.1: Young’s experiment using a quasi-monochromatic source S illuminating two pin-holes S1 and S2.

respectively. Interalia: no polarization effects are considered, and therefore a scalar treatmentusing E(�r, t) will suffice.If these two positions in the radiation field are isolated using an opaque screen with two small

apertures, we are back to Young’s experimental set-up. The two apertures serve as sources ofsecondary wavelets, which propagate out to some point P on the observation plane ΣO. Theresultant field at P is:

EP (t) = C1E1(t− t1) + C2E2(t− t2) (3.2)

with t1 = r1/c and t2 = r2/c, r1 and r2 representing the pathlengths to P as measured fromS1 and S2, respectively. This expression tells us that the field at the space-time point (P, t)can be determined from the field that existed at S1 and S2 at (t− t1) and (t− t2), respectively,these being the the instants when the light, which is now overlapping at P , first emergedfrom the apertures. The quantities C1 and C2 are so-called propagators, they mathematicallyreflect the alterations in the field resulting from it having transversed either of the apertures.For example, the secondary wavelets issuing from the pinholes in the Young set-up are out ofphase by π/2 radians with the primary wave incident on the aperture screen. In that case,C1 and C2 are purely imaginary numbers equal to eiπ/2.

3.1.3 The mutual coherence function

The resultant irradiance at P , averaged over some time interval which is long compared tothe coherence time, is:

I = E{EP (t)E∗

P (t)}

(3.3)

32

Employing equation ( 3.2) this can be written as:

I = C1C∗1E

{E1(t− t1)E∗

1 (t− t1)}

+ C2C∗2E

{E2(t− t2)E∗

2 (t− t2)}

+ C1C∗2E

{E1(t− t1)E∗

2 (t− t2)}

+ C∗1 C2E

{E∗

1(t− t1)E2(t− t2)}

(3.4)

It is now assumed that the wave field is stationary, as is almost universally the case, i.e. thestatistical nature of the wave field does not alter with time and the time average is independentof whatever time origin we select (the Wide Sense Stationary situation as described in Chapter1). Accordingly, the first two expectation values in equation (3.4) can be rewritten as:

IS1 = E{E1(t)E∗

1(t)}

and IS2 = E{E2(t)E∗

2(t)}

(3.5)

where the time origin was displaced by amounts t1 and t2, respectively. The subscripts forthe irradiances used here underscore the fact that they refer to the values at points S1 andS2. Furthermore, if we introduce the time difference τ = t2 − t1, the time origin of the lasttwo terms can be shifted by an amount t2 yielding:

C1C∗2E

{E1(t+ τ)E∗

2 (t)}

+ C∗1C2E

{E∗

1(t+ τ)E2(t)}

(3.6)

This expression comprises a complex quantity plus its own complex conjugate and is thereforeequal to twice its Real part:

2 Re[C1C

∗2E

{E1(t+ τ)E∗

2(t)}]

(3.7)

As noted before, the C-coefficients are purely imaginary, i.e. C1C∗2 = C∗

1 C2 = |C1||C2|.The expectation value contained in expression (3.7) is a cross-correlation function, which isdenoted by:

Γ12(τ) = E{E1(t+ τ)E∗

2(t)}

(3.8)

This equation is referred to as the mutual coherence function of the wave field at positionsS1 and S2. Making use of the definitions above, equation (3.4) now takes the form:

I = |C1|2IS1 + |C2|2IS2 + 2|C1||C2| Re Γ12(τ) (3.9)

The terms |C1|2IS1 and |C2|2IS2 are the irradiance at P , arising when one or the other of theapertures is open alone: either C1 = 0 or C2 = 0. Denoting these irradiances as I1 and I2 wehave:

I = I1 + I2 + 2|C1||C2| Re Γ12(τ) (3.10)

If S1 and S2 are made to coincide, the mutual coherence function becomes the autocorrelationfunction:

Γ11(τ) = R1(τ) = E{E1(t+ τ)E∗

1(t)}

(3.11)

or:Γ22(τ) = R2(τ) = E

{E2(t+ τ)E∗

2(t)}

(3.12)

33

One can imagine that two wavetrains emerge from these coalesced pinholes and somehow pickup a relative phase delay τ . In the situation at hand τ = 0, since the optical path difference(shorthand: OPD) goes to zero. Hence:

IS1 = E{E1(t)E∗

1(t)}

= Γ11(0) = E{|E1(t)|2

}and

IS2 = E{E2(t)E∗

2(t)}

= Γ22(0) = E{|E2(t)|2

}(3.13)

These autocorrelation functions are also called self-coherence functions. For τ = 0 they repre-sent the (average) irradiance (power) of the radiation field at positions S1 and S2 respectively.

3.1.4 Interference law for a partially coherent radiation field: the complexdegree of coherence

From equation (3.10) and the selfcoherence functions we can now write:

|C1||C2| =√I1√I2√

Γ11(0)√

Γ22(0)(3.14)

Hence, the normalized expression for the mutual coherence function can now be defined as:

γ12(τ) ≡ Γ12(τ)√Γ11(0)Γ22(0)

=E{E1(t+ τ)E∗

2(t)}

√E{|E1(t)|2

}E{|E2(t)|2

} (3.15)

This quantity is referred to as the complex degree of coherence, equation (3.10) can nowbe recast into its final form:

I = I1 + I2 + 2√I1I2 Re γ12(τ) (3.16)

which is the general interference law for a partially coherent radiation field.The quantity γ12(τ) simultaneously gives a measure of the spatial coherence by comparison oftwo locations in space (S1 and S2 in the above case) and the coherence in the time domainby accounting for a time lag τ between both signals.γ12(τ) is a complex variable and can be written as:

γ12(τ) = |γ12(τ)|eiψ12(τ) (3.17)

From equation (3.15) and the Schwarz inequality it is clear that 0 ≤ |γ12(τ)| ≤ 1. The phaseangle ψ12(τ) of γ12(τ) relates to the phase angle between the fields at S1 and S2 and thephase angle difference concomitant with the OPD in P resulting in the time lag τ , as shownin equation (3.8). For quasi-monochromatic radiation at a mean wavelength λ and frequencyν, the phase difference φ due to the OPD can be expressed as:

φ =2πλ

(r2 − r1) = 2πντ (3.18)

If we designate a phase angle α12(τ) between the fields at S1 and S2, we haveψ12(τ) = [α12(τ) − φ)].Hence:

Re γ12(τ) = |γ12(τ)| cos [α12(τ) − φ] (3.19)

34

Figure 3.2: Partially coherent intensity fluctuations in a thermal radiation source. Bothabsolute values I and relative values �I = I − I are shown.

Substitution of this expression in the interference law for partially coherent radiation given inequation (3.16) yields for the intensity observed at point P on the observation plane ΣO:

I = I1 + I2 + 2√I1I2 |γ12(τ)| cos [α12(τ) − φ] (3.20)

The maximum and minimum values of I occur if the cosine term in equation (3.20) equals+1 and −1, respectively. The Visibility V (see definition (3.1)) at position P can thereforebe expressed as:

V =2√I1√I2

I1 + I2|γ12(τ)| (3.21)

In practice, frequently things are (or can be) adjusted in such a way that I1 = I2, giving riseto the following simplified expressions for the total irradiance I and Visibility V :

I = 2I0 {1 + |γ12(τ)| cos [α12(τ) − φ]} and V = |γ12(τ)| (3.22)

We note that in this case the modulus of the complex degree of coherence is identical to thevisibility of the fringes ! This then provides an experimental means of obtaining |γ12(τ)| fromthe resultant fringe pattern. Moreover, the off-axis shift in the location of the central fringe(no OPD → φ = 0) is a measure of α12(τ), the relative retardation in phase of the fields at S1

and S2. Thus, measurements of the visibility and the fringe position yield both the amplitudeand phase of the complex degree of coherence.

Note: Dealing with the definition of the complex degree of coherence, it was noted that theintensity fluctuations are also expected to be partially coherent, since the amplitude and phase

35

fluctuations in a thermal signal tend to track each other. An impression of such a fluctuatingwave signal is given in figure 3.2, both in absolute value and centered around the avarage valueI: �I = I − I. The random superposition of wave packets results in a Gaussian distributionof the amplitude fluctuations for one direction of polarisation and, consequently, is sometimesreferred to as Gaussian light. This fact directly follows from the central limit theorem. Takingthe cross-correlation E {�I1(t+ τ)�I2(t)} between two different parts of the incoming radi-ation beam yields now in principle an interferometric tool, which does not involve the usageof phase information. This technique, labelled as correlation or intensity-interferometry, canindeed be succesfully applied for very bright stellar sources to obtain accurate estimationsof stellar angular diameters. It was first tried in radio astronomy in the middle 1950’s, andlater on (1956) also succesfully used in an optical stellar interferometer by Hanbury Brownand Twiss. They employed search-light mirrors to collect starlight onto two apertures, whichwas then focussed on two photomultiplier devices. Although photomultipliers operate on thephoto-electric effect, which is keyed to the quantum nature of the optical light, laboratoryexperiments showed that the correlation effect was indeed preserved in the process of photo-electric emission.The star Sirius was the first to be examined, and it was found to possess an angular diameterof 6.9 milliseconds of arc. For certain stars angular diameters of as little as 0.5 millisecondsof arc can be measured in this way.

3.2 Indirect spectroscopy

3.2.1 Temporal coherence

Temporal coherence is characterised by the coherence time τc . The value of τc follows fromthe finite bandwidth of the radiation source under consideration. If we assume a quasi-monochromatic (QM) source, then we have τc ≈ 1

�ν with �ν the line width (in radiationfrequency) of the QM-source.These effects can be assessed with the aid of the Wiener-Khinchine theorem:

S(ν) =+∞∫

−∞R(τ)e−2πiντdτ (3.23)

R(τ) =+∞∫

−∞S(ν)e2πiντdν (3.24)

Take as an example a Gaussian shaped spectral line profile, i.e.

S(ν) ∼ e−(

ν�ν

)2⇐⇒ R(τ) ∼ e−

(ττc

)2(3.25)

As can be seen from the FT (indicated by ⇐⇒ in expression( 3.25), the wave packet cor-responding to this line profile has an autocorrelation function that is also Gaussian with acharacteristic width τc, moreover the autocorrelation R(τ) equals the autocovariance C(τ).This corresponds to a wave train with a Gaussian shaped envelope for the wave amplitude(see figure 3.3).Try to memorize the following notions:

• A first order system shows an exponential autocorrelation function R(τ) (see Chapter2, section 2.2.2).

36

Figure 3.3: A Gaussian shaped line profile of a quasi-monochromatic radiation source and theshape of the associated wavepacket.

• A Gaussian line profile in the frequency domain shows an amplitude modulated wavetrain with a Gaussian envelope in the time domain (see discussion above).

• A Lorentz line profile in the frequency domain shows an exponentially damped oscillatorprofile in the time domain (try this yourself).

For spectroscopic measurements at infrared and shorter wavelengths one can directly dispersethe incoming radiation beam with the aid of a wavelength dispersive device, like for instance atransmission or a reflection grating, and measure the resulting intensity distribution (i.e. thespectrum). However for spectroscopy at radio and submillimeter wavelengths one employs anindirect method. The incoming wave signal is fed into a correlator that produces the temporalcoherence function R(τ), a subsequent FT of this function yields the spectral distribution S(ν)by virtue of the Wiener-Khinchine relation.

3.2.2 Longitudinal correlation

Associated with the coherence time τc is the so-called coherence length lc = cτc.

Problem: Show that the coherence length can also be expressed as lc = λ2

�λ in which �λrefers to the equivalent of �ν in the wavelength domain.

Now consider an EM-wave that propagates along a vector �r, and mark two positions P1

and P2 on this line of propagation at a mutual distance R12. If R12 � lc, there will bea strong correlation between the EM-fields at P1 and P2 and as a consequence interferenceeffects will be possible. In the case of R12 lc, no interference effects are possible. Thiseffect (i.e. potential interference yes or no) relates to the so-called longitudinal correlation or

37

Figure 3.4: The influence of the coherence length on the interference pattern of two diffractedcoherent thermal radiation sources S1 and S2.

longitudinal spatial coherence.This effect can be clearly demonstrated by considering the wave trains in Young’s interfer-ence experiment (see figure 3.4 ). The diffracted beams emanating from S1 and S2, which arecoherent radiation sources, cause an interference pattern. However, in the case of large pathdifferences the interference contrast will diminish, since corresponding wave packets in the

38

Figure 3.5: Interferogram displaying the idealized irradiance as a function of the y-coordinateof the fringes in the observation plane ΣO.

stochastic signal no longer overlap (see figure 3.4: packet H1 and H2 =⇒ packet I1 and H2).

3.3 Indirect Imaging

3.3.1 Quasi-monochromatic point source: spatial response function of atwo-element interferometer

If we apply equation (3.22) to the situation of a quasi-monochromatic point source S, locatedon the central axis, the wavelets emanating from two infinitesimal pinholes S1 and S2 (fictitiouscase!) will be fully coherent and exactly in phase (α12(τ) = 0) and constitute two coherentsecondary sources. Interference will occur, provided the OPD between the interfering beamsis less than the coherence length l = cτc. With V = |γ12(τ)| = 1, the equation for the totalirradiance in (3.22) reduces to:

I = 2I0(1 + cosφ) = 4I0 cos2φ

2(3.26)

Taking a distance a between the pinholes and assuming that the distance s to the observationplane ΣO is very much larger than a, we can express the path difference (r2 − r1) in equation(3.18) for φ in good approximation by:

r2 − r1 = aθ =a

sy (3.27)

Here y is the linear coordinate in the observation plane ΣO, starting from the intersectionof the central axis with this plane and running perpendicular to the fringes. Substitutingφ in equation (3.26) by combining (3.18) and (3.27) we get an analytical expression for theinterferogram in ΣO:

I = 4I0 cos2πay

sλ(3.28)

This (idealized) irradiance versus distance distribution is displayed in figure 3.5 and consti-tutes basically the response of an ideal two-element interferometer to a monochromatic pointsource, i.e. the PSF of an ideal two-element interferometer.

39

The derivation of the actual PSF for a non-ideal two-element interferometer, that also ac-counts for the finite size of the apertures, can be accomplished by utilizing the concept of thepupil function which was treated for single apertures in Chapter 7 (on Imaging in Astronomy)of the OAF1 course. This concept will now first be applied to a two-element interferometer,starting with the assumption of infinitesimal apertures (the ideal case discussed above) andsubsequently by implementing the practical case with finite aperture sizes. Later on in thischapter, with the introduction of aperture synthesis, this will be extended to the derivation ofthe point source response function (PSF) and the optical transfer function (OTF) for multi-aperture arrays.

Consider two circular apertures with diameter d separated by a baseline direction vector �s.Take the origin of the pupil function P (�r) on the baseline vector �s symmetrically positionedbetween the two apertures. The pupil function P (�r) can now be expressed as:

P (�r) = Π(�r − �s/2

d

)+ Π

(�r + �s/2

d

)(3.29)

with Π the 2-dimensional circular window function.Introducing the spatial frequency variable �ζ = �r/λ, the pupil function can be rewrittenas:

P (�ζ) = Π

(�ζ − �s/2λd/λ

)+ Π

(�ζ + �s/2λd/λ

)(3.30)

Now if the diameter d of the apertures is assumed to be very much smaller than the length|�s| = D of the baseline vector, i.e. d� D, the pupil function can be approximated by:

P (�ζ) = δ(�ζ − �s/2λ

)+ δ

(�ζ + �s/2λ

)(3.31)

The optical tranfer function (OTF) for this limiting case of infinitesimal apertures, followsfrom the self-convolution of the the function (λ/R)P (�ζ). Since we have a symmetrical pupilfunction in this case, the self-convolution is identical to the autocorrelation of (λ/R)P (�ζ):

OTF = Hλ(�ζ) =(λ

R

)2 ∫ ∫pupil plane

P (�ζ ′)P (�ζ ′ − �ζ)d�ζ ′

=(λ

R

)2 ∫ ∫pupil plane

[δ(�ζ ′ − �s/2λ

)+ δ

(�ζ ′ + �s/2λ

)]·

·[δ(�ζ ′ − �ζ − �s/2λ

)+ δ

(�ζ ′ − �ζ + �s/2λ

)]d�ζ ′

= 2(λ

R

)2 [δ(�ζ ) +

12δ(�ζ − �s/λ

)+

12δ(�ζ + �s/λ

)](3.32)

This OTF shows that the pair of pinholes transmits three principal spatial frequencies: aDC-component δ(�0) and two high frequencies related to the length of the baseline vector�s at ±�s/λ. These three spatial frequencies represent a three-point sampling of the so-calleduv-plane in 2-dimensional frequency(Fourier) space. Note: Full frequency sampling of the uv-plane will allow complete reconstruction of the brightness distribution of the celestial sourcebeing observed!The PSF follows from FT{Hλ(�ζ)} :

δ(�ζ ) ⇔ 1 (2-dimensional sheet!)

40

δ(�ζ − �s/λ

)⇔ ei2π

θ·s/λ

δ(�ζ + �s/λ

)⇔ e−i2πθ·s/λ

=⇒ PSF =(λ

R

)2 [2(1 + cos 2π�θ · �s/λ)

]= 4

(λ

R

)2

cos2 π�θ · �s/λ (3.33)

in which �θ represents the 2-dimensional angular coordinate vector used to describethe angular distribution of the diffracted image. The attenuation factor (λ/R)2 results fromthe spherical expansion of the diffracted field in the Fraunhofer limit.If we assume a light flux I0 emanating from each pinhole, we can express the brightnessdistribution for the diffracted beam as:

I = 4I0 cos2 π�θ · �s/λ (3.34)

This is the same equation we derived before with the aid of the interference law for partiallycoherent light, but now in a two-dimensional setting with �θ replacing y/s and �s replacing thepinhole distance a in equation ( 3.28). We have full constructive interference:

I = 4I0 for�θ · �sλ

= n (= 0,±1,±2, . . .) → |�θ| =nλ

|�s| cos φ (3.35)

and full destructive interference:

I = 0 for�θ · �sλ

= (n+12) (= ±1

2,±3

2,±5

2, . . .) → |�θ| =

(n+ 12)λ

|�s| cosφ (3.36)

with cosφ the angle between �θ and the baseline vector �s.The PSF represents a corrugated sheet with its modulation along the direction of the baselinevector �s and a periodicity (Δθ)s = λ/|�s|, i.e. a pattern of alternating bright and dark stripesorthogonal to the direction of the baseline vector �s.In actuality, the apertures have a finite size and the diffracted light by the apertures islocalized, so we have d < S but the approximation by δ-functions for infinitesimal aperturesno longer holds! The PSF has now to be derived from the amplitude of the diffracted field byFT of the pupil function given in expression 3.30:

a(�θ) ⇔ λ

R

[Π{λ

d

(�ζ − �s

2λ

)}+ Π

{λ

d

(�ζ +

�s

2λ

)}](3.37)

Applying the shift and scaling theorems from Fourier theory, i.e. if f(x) ⇔ F (s) then f [a(x−b)] ⇔

(e−2πibs/a

)F (s/a), we find for the amplitude of the diffracted field:

a(| �θ |) =(λ

R

)[14π (d/λ)2

] [2J1(π| �θ |d/λ)

π| �θ |d/λ

](e−2πiθ·s/2λ + e2πi

θ·s/2λ)

= 2(λ

R

) [14π (d/λ)2

] [2J1(|�u |)

|�u |]cos �u · �s/d with �u = π �θ d/λ

PSF = |a(|�θ|)|2 = 4(λ

R

)2 [14π (d/λ)2

]2 [2J1(|�u |)|�u |

]2cos2 �u · �s/d (3.38)

Again we have full constructive interference for:

�θ · �sλ

= n (= 0,±1,±2, . . .) → |�θ| =nλ

|�s| cosφ (3.39)

41

Figure 3.6: PSF of a single circular aperture in pseudo-color as a function of the 2D-positionvector �u (λ-invariant display).

and full destructive interference for:

�θ · �sλ

= (n+12) (= ±1

2,±3

2,±5

2, . . .) → |�θ| =

(n + 12 )λ

|�s| cosφ (3.40)

with cosφ the angle between �θ and the baseline vector �s.The first two terms in the expression for the PSF give the normalisation for |�θ | = 0. The otherterms represent a corrugated 2-dimensional Airy brightness distribution, intensity-modulatedalong the direction of the baseline vector �s with periodicity (Δθ)s = λ/|�s|, i.e. a pattern of al-ternating bright and dark annuli at a pitch determined by (Δθ)d = 1.220λ/d, 2.233λ/d, 3.238λ/d, ...of the individual telescope mirrors as shown in figure 3.6 in pseudo-color superimposed byperiodic drop-outs in brightness orthogonal to the direction of the baseline vector �s at a pitchdetermined by (Δθ)s = λ/|�s |. This corrugated 2-dimensional Airy brightness distribution isalso displayed (λ-invariant) in pseudo-color code as a function of the 2-D position vector �u infigure 3.7A typical one-dimensional cross-section along uy = 0 of the central part of the interferogram

3.7 is sketched in figure 3.8. Note that the visibilities in both figure 3.5 and in figure 3.8 areequal to one, because Imin = 0.It can be shown that |γ12(τ)| equals one for all values of τ and any pair of spatial points, if and

42

Figure 3.7: PSF of a two-element interferometer in pseudo-color as a function of the 2D-position vector �u (λ-invariant display). The aperture diameter d equals 25 meters, the lengthof the baseline vector |�s| is chosen to be 144 meter.

Figure 3.8: Double beam interference fringes showing the modulation effect of diffraction bythe aperture of a single pinhole .

43

Figure 3.9: Relating γ12(0) to the brightness distribution of an extended radiation source S:configuration for demonstrating the Van Cittert-Zernike theorem.

only if the radiation field is strictly monochromatic, in practice such a situation is obviouslyunattainable ! Furthermore, a non-zero radiation field for which |γ12(τ)| = 0 for all values ofτ and any pair of spatial points cannot exist in free space either.

3.3.2 Quasi-monochromatic extended source: spatial or lateral coherence

Spatial coherence (also: lateral coherence or lateral correlation) has to do with the spatialextent of the radiation source.

Problem: Prove that for τ � τc:

γ12(τ) = γ12(0)e2πiν0τ (3.41)

with |γ12(τ)| = |γ12(0)| and a fixed phase difference α12(τ) = 2πν0τ , ν0 represents the averagefrequency of the wave carrier.

In the following treatment of spatial coherence, it is implicitly assumed that the frequencybandwidth of the radiation source is suffiently narrow that the comparison between two pointswith respect to spatial coherence occurs at times differing by �t� τcQuery What is the quantitative relation between the brightness distribution of the spatiallyextended radiation source and the resulting phase correlation between two positions in theradiation field?

Approach Consider again Young’s experiment for the case that the radiation source Sis extended and illuminates the pinholes S1 and S2 (actually shown in figure 3.1). Inthe observers plane Σ, the interference is given by the expectation value of the productE1(t)E∗

2 (t) = E{E1(t)E∗2(t)} = Γ12(0) with the subscripts 1 and 2 referring to the posi-

tions P1 and P2 in the Σ-plane. If E1 and E2 are uncorrelated, then |Γ12(0)| = 0. In the case

44

of full correlation |γ12|(= |Γ12(0)|√

I1I2

)= 1, for partial correlation we have 0 < |γ12(0)| < 1.

The extended source in Young’s experiment is a collection of non-coherent infinitesimal radi-ators, this obviously reduces the contrast in the interferogram. This contrast can be observedand is described by the afore mentioned Visibility function V:

V =Imax − IminImax + Imin

= |γ12(0)| (3.42)

3.3.3 The Van Cittert-Zernike theorem

So how can we now relate γ12(0) (or Γ12(0)) to the brightness distribution of the extendedradiation source S?This can be done in the following way (see figure 3.9). Locate S, a QM-incoherent source, in aplane σ, with an intensity distribution I(y, z). Consider next the observation plane Σ parallelto σ, l is perpendicular to both planes (coincident with the X-axis) connecting the centre ofthe extended source (y = 0, z = 0) to the zero reference in Σ (Y = 0, Z = 0). Select twopositions P1 and P2. The objective is to describe the value of γ12(0) in this plane, i.e. thecoherence of the radiation field in Σ. Consider furthermore a small (infinitesimal) radiationelement dS in the source at distances R1 and R2 from P1 and P2 respectivily. Suppose nowthat S is not a source but an aperture of identical size and shape, and suppose that I(y, z) isnot a description of the irradiance (or intensity distribution) but, instead, its functional formcorresponds to the field distribution across that aperture. In other words imagine that thereis a transparancy at the aperture with amplitude transmission characteristics that correspondfunctionally to the irradiance distribution I(y, z). Furthermore, imagine that the aperture isilluminated by a spherical wave converging towards the fixed point P2, so that a diffractionpattern will result centered at P2. This diffracted field distribution, normalised to unity atP2, is everywhere (e.g. at P1) equal to the value of γ12(0) at that point. This is the VanCittert-Zernike theorem.In the limit that R1 and R2 are much larger than the source diameter and the relevant partof the Σ-plane we have the equivalent of Fraunhofer diffraction, this condition is practicallyalways satisfied for astronomical observations. In that case, the van Cittert-Zernike theoremcan be expressed mathematically as:

Γ(�r) =∫ ∫

source

I(�Ω)e2πi�Ω.�r

λ d�Ω (3.43)

I(�Ω) = λ−2∫ ∫

Σ-plane

Γ(�r)e−2πi�Ω.�r

λ d�r (3.44)

I(�Ω) is the intensity distribution of the extended radiation source as a function of a unitdirection vector �Ω as seen from the observation plane Σ. Taking the centre of the extendedradiation source S as the zero-reference for �Ω (coincident with the central axis l in figure ( 3.9))and assuming a relativily small angular extent of the source we can write I(�Ω) = I(θy, θz)and d�Ω = dθydθz, where θy and θz represent two orthogonal angular coordinate axes acrossthe source starting from the zero-reference of �Ω.Γ(�r) is the coherence function in the Σ-plane, the vector �r represents an arbitrary baseline�r(X,Y ) in this plane with d�r = dY dZ (in the above example P1P2 = �rP1 − �rP2).Expressions ( 3.43) and ( 3.44) for Γ(�r) and I(�Ω) show that they are linked through a Fouriertransform, except for the scaling with the wavelength λ. This scaling might be perceived as a”true” Fourier transform with the conjugate variables �Ω and �r/λ, i.e. by expressing �r in units

45

of the wavelength λ, writing the van Cittert-Zernike theorem as the Fourier pair:

I(�Ω) ⇔ Γ(�r/λ) (3.45)

The complex spatial degree of coherence, γ(�r), follows from:

γ(�r) =Γ(�r)∫ ∫

sourceI(�Ω)d�Ω

(3.46)

i.e. by normalising on the total source intensity.Note: Although the extended source S is spatially incoherent, there still exists a partiallycorrelated radiation field at e.g. positions P1 and P2, since all individual source elementscontribute to a specific location P in the Σ-plane.

For a derivation of the Van Cittert-Zernike relations, consider the geometry given in fig-ure 3.10.The observation plane Σ contains the baseline vector �r(Y,Z) and is perpendicular to thevector pointing at the centre of the radiation source. The angular coordinates θy and θzacross the source (see above) correspond to the linear coordinates of the unit direction vec-tor �Ω(ΩX ,ΩY ,ΩZ), i.e. the direction cosines of �Ω relative to the X,Y,Z coordinate system(Ω2

X + Ω2Y + Ω2

Z = 1). The spatial coherence of the EM -field between the two positions 1 (forconvenience chosen in the origin) and 2 is the outcome of a correlator device that producesthe output E

{E1(t)E∗

2(t)}.

In reality positions 1 and 2 are not point like, they represent radio antennae or optical reflec-tors, we shall come back to this later. From the geometry displayed in figure 3.10, regardingthe Van Cittert-Zernike relations, we can note the following:⇒ If I(�Ω) = I0δ(�Ω), i.e. a point source on the X-axis, the Van Cittert-Zernike relation yields|Γ(�r)| = I0 and |γ(�r)| = 1: a plane wave hits the full Y Z-plane everywhere at the same time,full coherence is preserved (by definition) on a plane wave front.⇒ Next, let us consider an infinitesimal source element in the direction �Ω0 =⇒ I0δ(�Ω − �Ω0).The projection of �Ω0 on the Σ−plane is �Ω′

0(ΩY ,ΩZ). There will now be a difference in pathlength between positions 1 and 2 given by the projection of �r on �Ω0, i.e. �r.�Ω′

0 = ΩY Y + ΩZZ.Then:

E1(t) = E0(t)e2πiν0

(t+

�Ω0.�rc

)= E0(t)e

(2πiν0t+

2πi�Ω0.�rλ

)(3.47)

E∗2(t) = E0(t)e−2πiν0t (3.48)

Therefore:E{E1(t)E∗

2(t)}

= E{|E0(t)|2

}e

2πi�Ω0.�rλ = I0(�Ω0)e

2πi�Ω0.�rλ (3.49)

Integration over the full source extent(straight forward integration, since all source elementsare spatially uncorrelated) yields:

Γ(�r) =∫ ∫

source

I0(�Ω)e2πiΩ.r/λd�Ω (3.50)

γ(�r) =∫ ∫

sourceI0(�Ω)e2πiΩ.r/λd�Ω∫ ∫

sourceI0(�Ω)d�Ω

(3.51)

The meaning of this relationship is, in physical terms, that Γ(�r) at a certain point representsa single Fourier component (with baseline �r) of the intensity distribution of the source with

46

Figure 3.10: Van Cittert Zernike relation: reference geometry.

strength Γ(�r)d�r. A short baseline (small |�r|) corresponds to a low frequency (spatial frequency!)component in the brightness distribution I(θy, θz), i.e. coarse structure, large values of |�r|correspond to fine structure in I(θy, θz). The diffraction limited resolution in aperture synthesiscorresponds to:

|�rmax| = Lmax =⇒ θmin =λ

2Lmax(3.52)

The factor 2 in the denominator of the expression for θmin follows from the rotation symmetryin aperture synthesis.

Example Consider a one dimensional case. This can be done by taking the slit source ofuniform intensity shown in figure 3.11 , slit width b and running coordinate ξ, the observationplane Σ, running coordinate y, is located at large distance l from the slit source (i.e. theFraunhofer limit is applicable). The source function can be expressed as the window function

47

Figure 3.11: The coherence function γ12(0) for a uniform slit source.

Π(ξb

), in angular equivalent Π

(ββ0

), with β0 = b/l.

Application of the Van Cittert-Zernike theorem I(�Ω) ⇔ Γ(�r/λ) yields:

Π(β

β0

)⇔ β0sinc

(yβ0

λ

)= β0sinc

(yb

λl

)(3.53)

with sinc(x) =(

sinπxπx

). The modulus of the normalised complex coherence function becomes:

|γ(y)| =

∣∣∣∣∣β0sincybλlβ0

∣∣∣∣∣ =∣∣∣∣sinc

yb

λl

∣∣∣∣ = V ⇒ Visibility (3.54)

Note that:

• Enlarging b with a factor 2, shrinks the coherence function with the same factor. Thisis of course a direct consequence of the scaling law under Fourier transform.

• The width of the coherence function follows from:(yβ0

λ

)≈ 1 ⇒ y =

(λβ0

). If the radi-

ation source exhibitis a smooth brightness distribution over the angle β0 = Δ radians,as is the case with the slit source, then γ(y) also displays a smooth distribution over adistance of λ/Δ meters.

• If the brightness structure of a radiation source covers a wide range of angular scales,say from a largest angular scale Δ to a smallest angular scale δ (in radians), then thespatial coherence function shows a finest detail of λ/Δ and a maximum extent of ≈ λ/δin meters.

3.3.4 Etendue of coherence

Consider the two-dimensional case of a circular source of uniform intensity with an angu-lar diameter θs, the source brightness distribution can then be described as a circular two-dimensional window function: I(�Ω) = Π

(θθs

). To compute the complex degree of coherence

48

in the observation plane Σ take again two positions, position 1 in the centre (origin, as be-fore) and position 2 at a distance ρ from this centre point. Applying the van Cittert-Zerniketheorem, we find for Γ(ρ):

Π(θ

θs

)⇔ Γ(ρ/λ) =

(θs/2)J1(πθsρ/λ)ρ/λ

(3.55)

where J1 represents the Bessel function of the first kind. Normalisation to the source bright-ness, through division by (πθ2

s)/4, yields the expression for the complex degree of coherence:

γ(ρ) =2J1(πθsρ/λ)πθsρ/λ

(3.56)

The modulus of the complex degree of coherence is therefore:

|γ(ρ)| =∣∣∣∣2J1(u)

u

∣∣∣∣ (3.57)

with the argument of the Bessel function u = πθsρ/λ. We can derive the extent of thecoherence in the observation plane Σ by evaluating J1(u). If we take u = 2, |γ(ρ)| = J1(2) =0.577, i.e. the coherence in Σ remains significant for u ≤ 2, or ρ ≤ 2λ/(πθs). The area S inΣ over which the coherence remains significant equals πρ2 = 4λ2/(πθ2

s). In this expression,πθ2

s/4 equals the solid angle Ωsource subtended by the radiation source. Significant coherencewill thus exist if the following condition is satisfied:

ε = SΩsource ≤ λ2 (3.58)

The condition ε = SΩsource = λ2 is called the Etendue of Coherence, to be fulfilled if coherentdetection is required!

Example Consider a red giant star, of radius r0 = 1.5 x 1011 meter, at a distance of 10 par-sec. For this object θs = 10−6 radians. If this object is observed at λ = 0.5μm, the value of thecoherence radius ρ, on earth, on a screen normal to the incident beam is ρ = 2λ/(πθs) = 32cm. In the infrared, at λ = 25μm, the radius ρ is increased fifty fold to ≈ 15 meter. In theradio domain, say at λ = 6 cm, ρ ≈ 35 km.

In general, good coherence means a Visibility of 0.88 or better. For a uniform circular sourcethis occurs for u = 1, that is when ρ = 0.32λ/θ. If we consider a narrow-bandwidth uniformradiation source at a distance R away, we have

ρ = 0.32(λR)/D (3.59)

This expression is very convenient to quickly estimate the required physical parameters inan interference or diffraction experiment. For example, if we put a red filter over a 1-mm-diameter disk-shaped flashlight source and stand back 20 meters from it, then ρ = 3.8 mm,where the mean wavelength is taken at 600 nm. This means that a set of apertures spacedat about 4 mm or less should produce clear fringes. Evidently the area of coherence increaseswith the distance R, this is why one can always find a distant bright street light to use as aconvenient source.Important: Remember, as stated in section 1.3.1, that throughout the treatment of spatialcoherence it was assumed that the comparison between the two points occurs at times differingby a �t� τc. If this condition is not fulfilled, for example because the frequency bandwidth ofthe radiation source is too large, interferometric measurements will not be possible (see sectionon temporal coherence). Frequency filtering will then be required to reduce the bandwidth ofthe source signal, i.e. make it more quasi-monochromatic.

49

3.4 Aperture synthesis

As already elaborated in the section on indirect imaging, the positions 1 and 2 in the obser-vation plane Σ are in practise not pointlike, but encompass a finite aperture in the form of atelescope element of finite size, say a radio dish with diameter D. This dish has then a diffrac-tion sized beam of λ/D. In that case the Van Cittert-Zernike relation needs to be ”weighted”with the telescope element (single dish) transfer function H(�Ω). For a circular dish antennaH(�Ω) is the Airy brightness function, well known from the diffraction of a circular apertureand detailed in a previous section. The Van Cittert-Zernike relations now become:

Γ′(�r) =∫ ∫

source

I(�Ω)H(�Ω)e2πi�Ω.�r

λ d�Ω (3.60)

I(�Ω)H(�Ω) = λ−2∫ ∫

Σ-plane

Γ′(�r)e−2πi�Ω.�r

λ d�r (3.61)

The field of view scales with λ/D, e.g. if λ decreases the synthesis resolution improves butthe field of view reduces proportionally!So, in aperture synthesis the incoming beams from antenna dish 1 and antenna dish 2 arefed into a correlator (multiplier) that produces as output the product E1(t)E∗

2(t). Thisoutput is subsequently fed into an integrator/averager that produces the expectation valueE{E1(t)E∗

2(t)}

= Γ′(�r). By applying the Fourier transform given in ( 3.61), and correcting

for the beam profile of the single dish H(�Ω), the source brightness distribution I(�Ω) can bereconstructed.Important: Indirect imaging with an aperture synthesis system is limited to measuringimage details within the single pixel defined by the beam profile of an individual telescopeelement, i.e. a single dish!

3.4.1 Quasi-monochromatic point source: spatial response function (PSF)and optical tranfer function (OTF) of a multi-element interferometer

The pupil function of a linear array comprising N circular apertures with diameter d, alignedalong a baseline direction �b (unit vector) and equally spaced at a distance |�s| = �b · �s can bewritten as:

P (�ζ) =

[Π

(λ�ζ

d

)+ Π

{λ

d

(�ζ − �s

λ

)}+ Π

{λ

d

(�ζ − 2 · �s

λ

)}+ . . .

. . . + Π{λ

d

(�ζ − (N − 1) · �s

λ

)}=

N−1∑n=0

Π{λ

d

(�ζ − n · �s

λ

)}](3.62)

Applying the shift and scaling theorems from Fourier theory, i.e. if f(x) ⇔ F (s) then f [a(x−b)] ⇔

(e−2πibs/a

)F (s/a), we find for the amplitude of the diffracted field:

a(| �θ |) =(λ

R

) [14π(d/λ)2

] [2J1(π| �θ |d/λ)

π| �θ |d/λ

]·[1 + e−i(2πθ·s/λ) +

(e−i(2πθ·s/λ)

)2+ . . .

50

Figure 3.12: PSF of a 10-element interferometer with circular apertures in pseudo-color as afunction of the 2D-position vector �u (λ-invariant display).

. . .+(e−i(2πθ·s/λ)

)N−1]

=(λ

R

)[14π(d/λ)2

] [2J1(π| �θ |d/λ)

π| �θ |d/λ

]N−1∑n=0

(e−i(2πθ·s/λ)

)n(3.63)

The sum of the geometric series of N complex exponentials, accomodating the accumulatingphase shifts introduced by the addition of each subsequent telescope element equals:

N−1∑n=0

(e−i(2πθ·s/λ)

)n=

e−iN(2πθ·s/λ) − 1

e−i(2πθ·s/λ) − 1

=e−iN(2πθ·s/2λ)

e−i(2πθ·s/2λ)

(e−iN(2πθ·s/2λ) − eiN(2πθ·s/2λ)

)(e−i(2πθ·s/2λ) − ei(2πθ·s/2λ)

) = e−i(N−1)πθ·s/λ[sinN(π�θ · �s/λ)

sin(π�θ · �s/λ)

](3.64)

The PSF for the diffracted intensity distribution follows from the relationPSF = a(|�θ|) · a∗(|�θ|):

PSF =(λ

R

)2 [14π (d/λ)2

]2 [2J1(|�u |)|�u |

]2 sin2N(�u · �s/d)sin2(�u · �s/d) with �u = π�θ d/λ (3.65)

51

Figure 3.13: Magnification of the central part of figure 3.12, clearly showing the locations ofthe subsidiary maxima and minima.

For N = 1, we recover the Airy brightness function for a single circular aperture, for N = 2(Michelson) we have sin2N(�u · �s/d)/ sin2(�u · �s/d) = [2 sin(�u · �s/d) cos(�u · �s/d)]2 /sin2(�u · �s/d) = 4 cos2(�u · �s/d), commensurate with expression (3.38).For N apertures, maximum constructive interference occurs for sinN(π�θ · �s/λ)/sin(π�θ · �s/λ) = N , that is when:

�θ · �sλ

= n (= 0,±1,±2, . . .) → |�θ| =nλ

|�s| cosφ (3.66)

These so-called principal maxima are apparently found at the same |�θ|-locations, regardlessof the value of N ≥ 2.Minima, of zero flux density, exist whenever sinN(π�θ · �s/λ)/ sin(π�θ · �s/λ) = 0, i.e. for:

�θ · �sλ

= ± 1N,± 2

N,± 3

N, . . . ,±N − 1

N,±N + 1

N, . . .

⇒ |�θ| =nλ

N |�s| cos φ, for n = ±1,±2, . . . but n �= kN (k = 0,±1,±2, . . . (3.67)

with cosφ the angle between �θ and the baseline vector �s.Between consecutive principal maxima there will therefore be N-1 minima. Since betweeneach pair of minima there will have to be a subsidiary maximum, i.e. a total of N-2subsidiary maxima between consecutive principal maxima. The first two terms in theexpression for the PSF give the normalisation for |�θ | = 0. The other terms represent acorrugated 2-dimensional Airy brightness distribution, intensity-modulated along the directionof the baseline vector �s with a periodicity (Δθ)s = λ/|�s| of narrow bright principal maxima

52

Figure 3.14: Cross-section of the central part of the PSF for a 10-element interferometer,delineating the brightness profiles of the principal maxima and the subsidiary maxima andminima.

and with a periodicity (Δθ)Ns = λ/(N |�s|) of narrow weak subsidiary maxima, interleavedwith zero-intensity minima.The corrugated 2-dimensional Airy brightness distribution for N = 10, yet again for d = 25meters and |�s| = 144 meters, is displayed (λ-invariant) in pseudo-color code as a functionof the 2D-position vector �u in figure 3.12. Also shown, in figure 3.13, is a magnification ofthe central part of the PSF that more clearly shows the interleaved subsidiary maxima andminima. A cross-section for uy = 0 of the central part of the interferogram, delineating theprofiles of the sharply peaked principal maxima and the adjacent series of subsidiary maximaand minima, is presented in figure 3.14.

The OTF for an array of N circular apertures can be obtained from the autocorrelation of thepupil function given in expression (3.62):

Hλ(�ζ ,N�s/λ) =(λ

R

)2[N−1∑n=0

Π{λ

d

(�ζ − n · �s

λ

)}]∗[N−1∑m=0

Π{λ

d

(�ζ −m · �s

λ

)}]

53

=(λ

R

)2 N−1∑n=0

N−1∑m=0

Anm(�ζ , �s/λ) (3.68)

in which Anm represents an element of the N xN autocorrelation matrix A, that is given by:

Anm(�ζ ,�s/λ) =∫ ∫

pupil planeΠ{λ

d

(�ζ

′ − n · �sλ

)}Π{λ

d

(�ζ

′ − �ζ −m · �sλ

)}d�ζ

′(3.69)

Values Anm �= 0 are given by Chinese-hat functions as derived for a single circular aperturein Chapter 7 of the OAF1 course. However in the multi aperture case here, we have a seriesof principal maxima in the Hλ(�ζ ,�s/λ) plane ( = the uv-plane representing 2-dimensionalfrequency(Fourier) space) at spatial frequency values:

�ζmax = �ζ − k · �sλ

with k = n−m = 0,±1,±2, . . . ,±(N − 2),±(N − 1) (3.70)

Hence we can replace Anm by Ak, where the single index k refers to the diagonals of theautocorrelation matrix A: k = 0 refers to the main diagonal, k = ±1 to the two diagonalscontiguous to the main diagonal and so on. The analytical expression for the diagonal termsAk can now be computed in the same way as for a single circular aperture, however in thiscase we need vector notation:

Ak =12

(d

λ

)2

·

·⎡⎣arccos

(λ

d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)−(λ

d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)(

1 −(λ

d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)2) 1

2

⎤⎦Π

(λ

2d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)

(3.71)

which we rewrite in terms of Chinese-hat functions Ck(�ζ−k ·�s/λ) normalised to unit aperturearea:

Ak =14π

(d

λ

)2

Ck(�ζ − k · �s/λ) (3.72)

The sum over all elements of matrix A can now be obtained from:N−1∑n=0

N−1∑m=0

Anm ≡ Asa

N−1∑k=−(N−1)

(N − |k|)Ck(�ζ − k · �s/λ) (3.73)

Sum check:N−1∑

k=−(N−1)

(N − |k|) = N + 2N−1∑k=1

(N − k)

= N +12(N − 1) [2(2N − 2) − 2(N − 2)] = N +N2 −N = N2,

compliant with the required total number of matrix elements! The quantity Asa = 14π(d/λ)2

represents the geometrical area of a single telescope element.Hence, we arrive at the following expression for the OTF of the array:

Hλ(�ζ ,N�s/λ) =(λ

R

)2

NAsa

N−1∑k=−(N−1)

(N − |k|)N

Ck(�ζ − k · �s/λ) with (3.74)

Ck(�ζ − k · �s/λ) =

=2π

⎡⎣arccos

(λ

d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)−(λ

d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)(

1 −(λ

d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)2) 1

2

⎤⎦Π

(λ

2d

∣∣∣∣ �ζ − k · �sλ

∣∣∣∣)

(3.75)

54

The geometrical term NAsa reflects the total geometrical collecting area of the telescope array(N times the area of a single aperture Asa) and the term (λ/R)2 accomodates the attenuationdue to the spherical expansion of the wave field.The OTF for the spatial frequency throughput of the aperture array is described by a lin-ear array of discrete spatial frequency domains in the uv-plane, characterised by Chinese-hatfunctions centered at zero frequency and at multiples of the baseline vector �s/λ, that specifiesthe equidistance (magnitude and direction) between adjacent apertures. The peak transferdeclines linearly with increasing mutual separation between aperture elements relative to thezero-frequency response, i.e. proportional to (N − |k|)/N for spatial frequencies centeredaround k · �s/λ. This can be easily explained by considering the monotonously decreasingnumber of aperture elements, and hence the associated array collecting power, involved atlarger baseline values. For the maximum available baseline (N − 1)�s/λ this throughput re-duction amounts to 1/N .

Intermezzo: Verify that Hλ(�ζ,N�s/λ) ⇔ PSF(N-apertures)!!

Proof: Application of the shift theorem yields for the Fourier transform of the single-aperture-weighted (Asa = 1

4π (d/λ)2) Chinese-hat function at baseline position k · �s/λ the followingexpression for its FT:

14π (d/λ)2Ck(�ζ − k · �s/λ) ⇔

[14π (d/λ)2

]2 [2J1(|�u |)|�u |

]2e−ik(2πθ·s/λ) (3.76)

Putting x ≡ 2π�θ · �s/λ, the Fourier transform of (3.74) can be expressed as:

FT[Hλ(�ζ,N�s/λ)

]=(λ

R

)2 [14π (d/λ)2

]2 [2J1(|�u |)|�u |

]2·

·[N

(1 +

N−1∑k=1

eikx +N−1∑k=1

e−ikx)

−(N−1∑k=0

keikx +N−1∑k=0

ke−ikx)]

(3.77)

The first two sums of (3.77)involve simple geometric series and can be easily calculated, i.e.:

N−1∑k=1

eikx =eix

(ei(N−1)x − 1

)eix − 1

=ei(N− 1

2)x − eix/2

2i sin (x/2)and:

N−1∑k=1

e−ikx =e−ix

(e−i(N−1)x − 1

)e−ix − 1

= −(e−i(N− 1

2)x − e−ix/2

2i sin (x/2)

)

Hence, after some rearrangement of the complex exponentials, this results in a real function:

N

(1 +

N−1∑k=1

eikx +N−1∑k=1

e−ikx)

= Nsin [(2N − 1)(x/2)]

sin (x/2)(3.78)

The third and the fourth sum in equation (3.77) involve a combination of an arithmetic and ageometric series (a so-called arithmetico-geometric series). These sums can be simply derivedby differentation of the expression for the sum of the geometric series, resulting in:

N−1∑k=0

keikx =Ne i(N−1)x − (N − 1) e iNx − 1

4 sin2(x/2)and:

N−1∑k=0

ke−ikx =Ne−i(N−1)x − (N − 1) e−iNx − 1

4 sin2(x/2)

55

Straightforward recombination of the complex exponentials again results in a (trigonometric)real function:

N−1∑k=0

keikx +N−1∑k=0

ke−ikx =N cos(N − 1)x− (N − 1) cosNx− 1

2 sin2(x/2)(3.79)

Combining expressions (3.78) and (3.79), we arrive at :[N

(1 +

N−1∑k=1

eikx +N−1∑k=1

e−ikx)

−(N−1∑k=0

keikx +N−1∑k=0

ke−ikx)]

=

=2N sin(x/2) sin(N − 1

2)x−N cos(N − 1)x+ (N − 1) cosNx+ 12 sin2(x/2)

=

=N sinx sinNx + N cos x cosNx − cosNx − N cos x cosNx − N sinx sinNx + 1

2 sin2(x/2)=

=1 − cosNx2 sin2(x/2)

=2 sin2(Nx/2)2 sin2(x/2)

(3.80)

Substituting x ≡ 2π�θ · �s/λ ≡ 2(�u · �s/d), yields:

FT[Hλ(�ζ,N�s/λ)

]=(λ

R

)2 [14π (d/λ)2

]2 [2J1(|�u |)|�u |

]2 sin2N(�u · �s/d)sin2(�u · �s/d) Q.E.D. (3.81)

Note: The modulation term sin2N(�u · �s/d)/ sin2(�u · �s/d) can be simplified for low N-values:

= 1 for N=1= 4 cos2(�u · �s/d) for N=2

= [2 cos 2(�u · �s/d) + 1]2 for N=3= 8 cos2 2(�u · �s/d) · [1 + cos 2(�u · �s/d)] for N=4 (3.82)

End Intermezzo

3.4.2 Earth Rotation Aperture Synthesis (ERAS)

By employing the rotation of the earth, the baseline vectors k · �s/λ of the linear N-elementinterferometer array, as defined in the previous section, will scan the YZ-plane displayed infigure (3.10) in case the X-axis is lined up with the North polar axis, i.e. in this particulargeometry the X-coordinate of the baseline vectors is zero. The principal maxima or ’gratinglobes’ in the PSF of a multiple aperture array, as computed in the preceding section, will nowmanifest themselves as concentric annuli around a central source peak at angular distancesk ·λ/|�s |. If the circular scans in the YZ-plane are too widely spaced, i.e. if |�s | is larger than thesingle dish diameter, the (2-dimensional) Nyquist criterion is not respected and undersamplingof the spatial frequency uv-plane (=YZ-plane) occurs. Consequently, the grating lobes willshow up within the field of view defined by the single-dish beam profile. This can be avoidedby decreasing the sampling distance |�s |. In the following section we shall now demonstratethese notions with the concrete configuration of the Westerbork Radio Synthesis Telescope.

56

Figure 3.15: Configuration of the Westerbork Radio Synthesis Telescope.

3.4.3 The Westerbork Radio Synthesis Telescope (WSRT)

The WSRT consists of 14 parabolic antenna’s, with single dish diameters D of 25 meter.They are accurately lined up along the East-West direction with an overall length of ≈ 2750meter. Ten antenna’s (numbers 0 thru 9) are fixed with a mutual distance of 144 meter. Theother four antenna’s (A thru D) can be moved collectively with respect to the fixed array,without altering their mutual distance, i.e. the variable distance between antenna’s 9 andA can be preselected as a suitable baseline increment |�s | = �L (Figure 3.15). These 14antenna’s comprise 40 simultaneously operating interferometers. Employing the rotation ofthe earth, the full antenna array is rotated in a plane containing Westerbork perpendicular tothe rotation axis of the earth. With reference to figure (3.10) this implies that in this geometrywe are limited to sources near the North polar axis, all single dishes are thus pointed in thatdirection. In practise, the standard distance between antenna’s 9 and A equals 72 meters.Consequently, after half a rotation of the earth the YZ-plane is covered with 38 concentricsemi-circles with radii ranging from Lmin = 72 meters to Lmax = 2736 meters, with incrementsof �L = 72 meters. The WSRT-correlators integrate and average over 10 seconds, this impliessampling of the concentric semi-circles every 1/24 degrees. After 12 hours, half the YZ-planehas been covered. The other half need not be covered, it can be found by mirroring the firsthalf since I(�Ω) is a real function.The brightness distribution I(�Ω) can now be reconstructed by Fourier inversion accordingto expression (3.44). Since we have only obtained samples of the spatial coherence functionΓ(�r), the integral of expression (3.44) is replaced by a sum. Moreover, one normally appliesin addition a weighting function to get a considerable reduction of the side lobes, this goesslightly at the expense of the ultimate angular resolution. This is expressed in terms of adegradation factor α ≥ 1. The simplest form of such a weighting function is a triangularshaped, circular symmetric, function (i.e. a cone), the attenuation effect on the side lobes iscalled apodisation. Leaving any constants aside for the moment , we obtain I(�Ω):

I(�Ω) =∑k

w(�rk)Γ(�rk)e−2πi�Ω.�rk

λ (3.83)

with w(�rk) the weighting (apodisation) function.Expression (3.83) yields a radio map on a discrete grid in �Ω-space. The cell size of this grid(pitch) is chosen in such a way that oversampling of the spatial resolution of the array isachieved, so that contour plots can be constructed.Note: The reconstructed I(�Ω) needs to be corrected for the single dish response functionH(�Ω), see equation (3.61).If we consider half an earth rotation, the sum in (3.83) involves ≈ 165.000 numbers (i.e.38 semi-circles, 12x60x6 correlator samples/semi-circle). This sum will have to be taken for

57

Figure 3.16: A reconstructed ”dirty” radio map showing central source peaks and grating lobesof increasing order.

roughly the same number of image points in �Ω-space. This task is accomplished by using theFast Fourier Transform algorithm.The Point Spread Function (PSF):

Taking a point source S as the quasi monochromatic radiation source, the spatial frequencyresponse function of the rotated interferometer array in the uv-plane can be obtained explicitlyfrom expression (3.74) for the OTF by implementing the geometry of concentric scans. In thatcase, owing to the circular symmetry, the vectorial expression Hλ(�ζ ,N�s/λ) can be replaced bya, radially symmetric, scalar expression Hλ(p ,N�L/λ), with the scalar p the (λ - normalised)spatial frequency variable (= | �ζ |) and �L the baseline increment of the array (= |�s |).Although an exact expression for the scalar function Hλ(p ,N�L/λ) would involve integrationalong one coordinate of the two-dimensional Chinese-hat function, a straightforward one-dimensional cross-section constitutes an adequate approximation. Hence we can write:

Hλ(p ,N�L/λ) =(λ

R

)2

NAsa

N−1∑k=−(N−1)

(N − |k|)N

Ck(p − k · �L/λ) (3.84)

58

The PSF derives from the Fourier transform of Hλ(p ,N�L/λ). Following the approachoutlined in the intermezzo above, we arrive at:

PSFERAS =(λ

R

)2 [14π (d/λ)2

]2 [2J1(u)u

]2 sin2N(u�L/D)sin2(u�L/D)

(3.85)

where we have again utilised the reduced angular variable u = πθD/λ, θ represents the,radially symmetric, diffraction angle.Evaluation of expression (3.85) reveals two main image components:A Central Peak:This distribution is rather similar to the Airy brightness pattern, with a typical breadth, i.e.spatial resolution:

�θ =λ

2Lmaxradians (3.86)

with 2Lmax the maximum diameter of the array in the YZ-plane. In the derivation of (3.85),no weighting function was applied. If a weighting function is applied for efficient apodisation,expression (3.86) needs to be multiplied by a degradation factor α = 1 − 1.5 to accomodatethe loss in ultimate angular resolution. Moreover, moving outward, the sidelobes in (3.85)will progressively be reduced depending on the particular choice of the weighting function.Concentric Grating Lobes:The angular distances of these annuli from the central peak follow from the location of theprincipal maxima given by the modulation term sin2N(u�L/D)/ sin2(u�L/D) in expression(3.85). For an N-element array with increment �L, these angular positions are given by:

θgrating =λ

�L, 2λ

�L, ....., (N − 1)λ

�L (3.87)

A typical source field containing these grating lobes is shown in figure (3.16).It is clear from this figure that severe undersampling of the YZ-plane has occurred since

the grating lobes are well within the field of view. For the WSRT, one way to remedy theseimperfections in the PSF is by decreasing the distance between antenna’s 9 and A during thesecond half rotation of the earth. Combined with the first half rotation, a 36 meter incrementcoverage is achieved at the expense of doubling the observation time from 12 to 24 hours.In the same way, combining four half rotations in 48 hours, we can increase the coverage to18 meter increments. Since the single dish diameter D equals 25 meter, no empty space isnow left in the YZ-plane, the undersampling is corrected and the grating lobes have beenmoved outside the field of view defined by the single dish beam profile. This is summarizedin the following tables that show exposure times, maximum baselines and increments, and atfour different radio wavelengths single dish fields of view, central peak angular resolutions forα = 1.5, and angular distances of the grating lobes.

Exposure(hours) a (meter) Lmin (meter) �L (meter) Lmax (meter)1x12 72 72 72 27362x12 36,72 36 36 27364x12 36,54,72,90 36 18 2754

59

Figure 3.17: Position of the concentric grating rings in the PSF for various sampling densitiesof the uv-plane. FOV: −λ/D ≤ θx, y ≤ λ/D (−π ≤ ux, y ≤ π). Upper left: �L/D = 144/25,upper right: �L/D = 72/25, lower left: �L/D = 36/25, lower right: �L/D = 18/25.The insert in the lower right panel shows a blow-up of the central peak image, displaying thecircular subsidiary maxima and minima.

λ (cm) 6 21 49 92f (MHz) 5000 1420 612 326Single dish pixel λ/D (arcsec) 500 1800 4200 7600Resolution αλ/(2Lmax) (arcsec) 3 11.5 27 50Grating lobes λ/(�L = 72 m) (arcsec) 172 7600 1405 2640Grating lobes λ/(�L = 36 m) (arcsec) 344 1205 2810 5270Grating lobes λ/(�L = 18 m) (arcsec) 688 2410 5620 10540

60

Figure 3.18: Applying the CLEAN method for improving dirty radio maps. This is a radiomap taken at λ = 50 cm, with a �L = 36 meter increment. Left panel before CLEAN, rightpanel after CLEAN.

As is clear from these tables, the grating lobes can only be moved outside the single dish pixelif the empty spaces between the samplings are filled in, the table shows that this is obviouslythe case for �L = 18 m: for example the first order grating lobe at 6 centimeters is situatedat an angular distance of 688 arcseconds from the centre, whereas a single dish pixel λ/Dequals 500 arcseconds.Figure (3.17) stipulates the outward movement of the grating lobes for a WSRT-type config-uration by taking D = 25 meter and by reducing the baseline increment values from �L=144to 72, 36 and 18 meters respectively.The problem of incomplete coverage of the YZ-plane is that the values of the coherence func-tion Γ(�r) are set to zero in the empty spaces, which will certainly give an erroneous result.This can be circumvented by employing interpolation between the sampling circles based oncertain assumptions regarding the complexity (or rather simplicity) of the brightness distribu-tion of the sky region observed. Requirement is that the reconstructed (contour) map alwaysremains consistent with the measurements at the grid points! Two mathematical techniquesare often applied to get rid of the grating lobes: the CLEAN and MEM (Maximum EntropyMethod) techniques. Obviously this is potentially much more efficient than elongating theobservation times by a factor two or four, as was done in the above table. We shall not discussthese particular algorithms in any detail here, however the MEM technique will, in a moregeneral fashion, be treated later on in this course. The improvement that can be achieved byapplying the CLEAN method to a dirty radio map is displayed in figure (3.18), the left panelshows a dirty radio map at a wavelength of 50 cm, the right panel shows the ”cleaned” mapafter applying the CLEAN process.

3.4.4 Maximum allowable bandwidth of a quasi-monochromatic source

As was already stated, the coherence length of the radiation source needs to be larger than themaximum path length difference at the longest baseline present in the interferometer array.

61

This imposes a maximum frequency bandwidth for the observation of the radiation source,which is disadvantageous since the noise in the image decreases with

√�νTobs. The largestangle of incidence equals half the field of view, i.e. λ/2D, with D the single dish diameter.At this angle, the coherence length compliant with the largest baseline needs to obey:

Lcoh λ

2DLmax (3.88)

This translates in the following condition for the allowable frequency bandwidth:

�νν0

� 2DLmax

(3.89)

In the case of the WRST the ratio 2D/Lmax ≈ 1/50. At a wavelength of 21 centimeters (≈1400 MHz), this yields �ν � 28 MHz. This corresponds to a coherence length of more than10 meters. In practise a value of �ν ≈ 10 MHz is chosen, which corresponds to a coherencelength of about 30 meters.If one wishes to increase the bandwidth in order to improve sensitivity, than division infrequency subbands is required. For instance, if one observes at 21 centimeters and �ν = 80MHz is required, this bandwidth is subdivided into eight 10 MHz subbands by filters. Theindividual subband maps are subsequently scaled (with λ) and added. This then yields therequired signal to noise ratio.

3.4.5 The general case: arbitrary baselines and a field of view in arbitrarydirection

In deriving the Van Cittert-Zernike relations, we took the baseline vector �r(Y,Z) in a planeperpendicular to the vector pointing at the centre of the radiation source. In the WSRT casewe were, therefore, limited to consider a field of view near the north polar direction.Consider now an extended source in a plane σ in an arbitrary direction, an observation plane Σparallel to σ, the centre of the radiation source is located on the X-axis, which is perpendicularto both planes. We designate the intersection point of the X-axis with the Σ-plane as theposition of antenna 1 (for convenience, as we did before) and position antenna 2 at arbitrarycoordinates (X,Y,Z). This defines an arbitrary baseline vector �r(X,Y,Z). During the earthrotation the antenna beams are kept pointed at the source direction �Ω, i.e. this vector does notmove, however the tip of the baseline vector �r(X,Y,Z) describes a trajectory X(t), Y (t), Z(t)in space. Consider again radiation incident on the Σ-plane parallel to the X-axis, like in thecase of the original Van Cittert-Zernike derivation. In that case the path difference at positions1 and 2 was zero, since the baseline was located in the Y,Z-plane, now the path difference isgiven by X(t). To restore maximum coherence between positions 1 and 2, this path differencecan be compensated by delaying the signal of antenna 2 by X(t)/c, in other words radiationfrom direction X arrives at the same time at both inputs of the correlator. Taking the centreof the source as the zero reference point for the source direction vector in the X,Y,Z referenceframe, we can select an infinitesimal source element in the direction �Ω0(ΩX0,ΩY 0,ΩZ0) (unitvector) =⇒ I0δ(�Ω − �Ω0). The path difference at the input of the correlator is the projectionof the baseline on that unit vector under subtraction of the compensation X(t):

�r.�Ω0 −X = [ΩY 0Y + ΩZ0Z + (ΩX0 − 1)X] (3.90)

If the extent of the radiation source (or the field of view defined by a single antenna beam)is sufficiently small, the value of the direction cosine ΩX0 ≈ 1, i.e. (ΩX0 − 1)X � λ. In

62

that case the difference in path length, like in the case of the original VanCittert-Zernikederivation, equals the scalar product of two vectors in the Y Z-plane: the path differenceequals ≈ �rp.�Ω′

0 = ΩY 0Y + ΩZ0Z, with �rp the projection of the baseline vector on the Y Z-plane and �Ω′

0 the projection of the unit direction vector on the Y Z-plane. If I(�Ω) representsthe brightness distribution over the sky, the VanCittert-Zernike relation for obtaining thecoherence function is now given by:

Γ(�rp) =∫ ∫

source

I0(�Ω)e2πiΩ.rp/λd�Ω (3.91)

with �rp the projected baseline!Conclusion: Arbitrary baselines do not change the aperture synthesis technique, except thatthe 3D baselines are projected on the observation plane Σ and become 2D.Note: the pathlength difference X(t) changes continually during the rotation of the earthand, hence, needs continues electronic compensation with an accuracy of a small fractionof the wavelength λ. In practise, for the WSRT, this is accomplished in two steps: coarsecompensation with a delay line and subsequent fine tuning with the aid of an electronic phaserotator.At the beginning of the description of earth rotation aperture synthesis, we considered theradiation source to be located in the direction of the rotation axis. Suppose now that theextended radiation source (or the single telescope field of view) is located along a directionvector that makes an angle φ0 with the rotation axis of the earth. This is then the directionof the X-axis, perpendicular to this axis is the Y Z-plane. The East-West oriented WSRTbaselines physically rotate in a plane perpendicular to the earth axis, producing concentriccircles as described earlier. These concentric circles now need to be projected on the Y Z-plane, i.e. the observation plane Σ perpendicular to the viewing direction of the centre of thesource (or the centre of the field of view). The circles change into ellipses and the coherencefunction is now sampled on ellipses, rather then on circles. The major axes of these ellipsesremain equal to the physical length of the WSRT baselines, the minor axes are shortened bycosφ0. This causes the point spread function (PSF) to become elliptical as well, the angularresolution therefore reduces with the lowering of the declination [(π/2) − φ0]. As a result,celestial directions along the declination are subject to a broadening of the central peak ofthe PSF according to:

PSF =αλ

2Lmax cosφ0(3.92)

Apart from the central peak, the grating lobes are scaled accordingly, this can be clearly seenin figures (3.16) and (3.18). For the WSRT the most extreme case of baseline shorteningwould occur with a source in the equatorial plane (declination 0), no resolution would be leftin one direction. To circumvent this problem, baselines need to contain always North-Southcomponents. This is for example the case with the US VLA (Very Large Array) aperturesynthesis telescope.

63

Chapter 4

Radiation sensing: a selection

4.1 General

In modern astrophysics a host of radiation sensors is employed to cover the full extent ofthe electromagnetic spectrum, the different species in cosmic-ray particle beams and theinteraction products of cosmic neutrino capture. In this course it is not possible, nor desirable,to strive for any degree of completeness on this subject. Rather, a few specific sensor types havebeen selected since they feature very prominently in state of the art telescope systems used inpresent day astrophysical research. No attempt is made to describe these sensor types in anytechnical depth, nor will the, often very complex, electronic processing and data managementchain be described. The focus will be entirely on a basic understanding of their workingprinciple and physical characteristics, so as to allow a proper insight in their applicability andlimitations. In the following sections, a treatment is given of three generically largely differentsensor types:

• heterodyne receivers for coherent detection in the radio, microwave and far-infrareddomain

• photoconductors for sensing primarily in the visible and infrared bands

• charge coupled devices (CCD) for sensing in the optical, UV and X-ray domain.

4.2 Heterodyne detection

In radio and microwave astronomy, the signal power is often a very small fraction of the systemnoise. Hence, large values of the product Tobs · Δν are required to obtain the proper value ofthe limiting sensitivity for the detection of a weak source (see OAF1, chapter 5). As a conse-quence, the receivers employed must be very stable and well calibrated. Stable amplifiers areavailable at relatively low radio frequencies (i.e. up to 10 GHz), but large amplification factorsand stability do become a problem at high frequencies. In order to overcome this problem,extensive use is made in radio and sub-millimeter astronomy of heterodyne techniques.

In this technique, the incoming radio signal is mixed with a reference signal at a frequencyrelatively close to the average signal frequency νs. This reference signal at frequency νl isproduced by a so-called local oscillator with high frequency stability, i.e. Δνl is almost a δ-function. In that case the reference signal has a very large coherence time and may thereforebe regarded as coherent in the mixing process. Both signals are combined and fed to a non-linear detection element, the mixer element, which has the effect to produce signal power at

64

Figure 4.1: Processing chain

both sum νs + νl and difference |νs − νl | frequencies (recall: νs = 12π

ddt(2πνst+ φ(t))). This

can be quantitatively described by considering the followingThe wave signal received at the focus of the telescope is first passed through a horn

and a resonance cavity, which select the polarisation direction and the frequency bandwidthΔν. Subsequently the signal of the (tunable) local oscillator, at frequency νl, is coupled(superimposed) to the high frequency source signal νs through a so-called diplexer (or coupler)and funneled onto the non-linear mixer element. As discussed previously, a thermal sourcesignal can be expressed as

Es(t) =|Es,0(t) | ·ei(2πνst+φ(t)) (4.1)

Similarly, the signal of the local oscillator can be expressed as

El(t) = El,0(t) · ei(2πνlt) (4.2)

in which El,0(t) represents the phasor of the local oscillator signal. Regarding the localoscillator as a coherent signal source, this phasor can be written as

El,0(t) =|El,0 | ·eiψl (4.3)

Hence:El(t) =|El,0 | ·ei(2πνlt+ψl) (4.4)

Since the source signal concerns one component of polarisation, the wave signal at the non-linear mixer element can be written as

EM (t) =|Es,0(t) | · cos(2πνst+ φ(t))+ |El,0 | · cos(2πνlt+ ψl) (4.5)

Detection by the mixer element causes a frequency conversion through the occurrence ofa product component arising from the non-linear response. Assuming a non-linear mixerresponse proportional to E2

M (t), a product component 2 | Es,0(t) | · | El,0 | · cos(2πνst+ φ(t)) ·cos(2πνlt + ψl) is formed. Applying the trigonometric relation cosα cos β = 1

2 [cos(α + β) +cos(α− β)] two frequency components can be distinguished:

• one centered at νs + νl: |Es,0(t) | · |El,0 | · cos(2π(νs + νl)t+ φ(t) + ψl)

• one centered at | νs − νl |: |Es,0(t) | · |El,0 | · cos(2π(νs − νl)t+ φ(t) + ψl)

The latter expression is linearly proportional to the amplitude of the source signal | Es,0(t) |and possesses all the characteristics of the original signal, but now centers at a much lowerdifference frequency | νs−νl | (except for the phase shift ψl). In the sub-millimeter applicationwith νs ≈ 1 THz, | νs−νl | is of the order 10 GHz. Remember that the bandwidth of the source

65

Figure 4.2: Various stages in the detection process

signal Δνs is contained in the time dependent phase factor φ(t) and, consequently, the mixerelement should have adequate wide frequency response to cover the complete bandwidth.

The converted signal at the difference or intermediate frequency (IF) can be selected byemploying an intermediate frequency filter (IFF) and can subsequently be amplified by a stablelow noise IF amplifier (A). Obviously the mixer element should have the lowest possible noisetemperature, near-quantum limited operation near 1000 GHz has been achieved (see further).

By employing a tunable local oscillator the average frequency νs of the source signal canbe shifted while maintaining a fixed νIF and associated filter circuitry, i.e.:

νs = νl (tunable) + νIF (fixed) (4.6)

The IF signal is finally fed into a non-linear detection element (D) which, after low-passfiltering (LPF), generates the frequency spectrum associated with the signal phasor for one

66

component of polarisation | Es,0(t) | cosφ(t), i.e. the envelope function of the high frequencysource signal Es(t). This frequency spectrum represents the spectral distribution of the radiosignal centered at νs over the bandwidth Δνs. This signal can be fed into a frequency analyser(spectrometer) for spectral analysis. Figure 4.2 shows various stages in the detection process.Panel A displays the high frequency thermal signal incident on the telescope (antenna). PanelB shows the output of the resonance cavity into the diplexer, limited to the bandwidth Δνs.Panel C shows the envelope function superimposed on the IF-carrier signal νIF =| νs − νl |,after frequency conversion. In panel D this signal is fed through the non-linear detectionelement (normally quadratic response) after which the low frequency phasor signal can befiltered out (panel E).

Heterodyne detection at sub-millimeter and infrared wavelengths is a powerful spectro-scopic technique. In this case the mixer element consists of a superconductive tunnel junction,comprising two superconducting electrodes separated by a thin oxide barrier. Because of anenergy gap 2Δ in the material of the superconducting electrodes, the current-voltage char-acteristics (so-called I-V diagram) of this Superconductor Insulator Superconductor (SIS)junction is highly non-linear. SIS-mixers are the most sensitive heterodyne mixers in the0.2–1.2 THz frequency range, where the upper frequency is determined by the energy gap 2Δof practical superconducting materials (e.g. Niobium (Nb) and Niobium-Titanium (NbTi)).The frequency down-conversion process can be achieved with near quantum-limited noise per-formance. Up to 700 GHz a noise temperature of a few times the quantum limit has beenobtained, the mixer operates at the quantum limit, but input losses and IF amplifier noisedecreases the overall performance. Above 1.2 THz hot-electron bolometer mixers (HEBM)are used which are not limited by the energy bandgap of the superconductor since they donot operate on the individual photon energy but on the total absorbed power. This will notbe elaborated any further in this course.

Note: The factor |El,0 | in the product component can be made large so that the signal tonoise ratio can be largely improved with respect to the internal noise sources in the receiverchain. This assumes of course that |El,0 | is ultra stable and, also, that the quantum noise ofthe local oscillator is still negligible.

4.3 Frequency limited noise in the thermal limit: signal tonoise ratio and limiting sensitivity

The average spectral noise power P (ν) in the thermal limit hν � kT , for one degree ofpolarization, equals kT . This is the situation which prevails in radio-astronomy and hencegives rise to a description of signals and noise with the aid of characteristic temperatures, e.g.source temperature Ts and noise temperature Tn. The power spectral density S(ν) ≡ P (ν) =kT Watt Hz−1 is constant as a function of frequency and is therefore called white noise. Inpractice, due to the finite frequency response of any receiver system, this will be frequencylimited, i.e.:

S(ν) = kTΠ(ν

2νc

)with νc � kT

h(4.7)

with νc the cut-off frequency of the receiver under consideration, i.e. the total energy containedin the power spectrum remains finite, as it should be for any physical system.

Intermezzo: thermal noise in a resistorA resistor in equilibrium with a surrounding thermal bath at temperature T will exhibit

random fluctuations in voltage between its terminals, from thermal excitations of free con-duction electrons (or electron-hole pairs in a semi-conductor). The power spectral density of

67

Figure 4.3: Illustration of the circuit re-placing a noisy resistor R: the circuitcomprises a noise generator in series witha noise-free resistor R. The load resistorRL for power extraction is also indicated.

this noise voltage, expressed in Volt2Hz−1, is related to S(ν) according to the expression:

SV (ν)R

= 2S(ν) = 2kTΠ(ν

2νc

)or SV (ν) = 2kTRΠ

(ν

2νc

)(4.8)

The factor 2 arises from the assumption of unpolarized radiation. The noise voltage associatedwith SV (ν) is a random variable with a normal distribution, average ηV = 0 and variance σ2

V .The value of σ2

V = ΔV 2 follows from integration of SV (ν) over all frequencies:

ΔV 2 =+∞∫

−∞2kTRΠ

(ν

2νc

)dν =

+νc∫−νc

2kTRdν = 4kTRνc (4.9)

This equation for ΔV 2 is commonly referred to as Nyquist’s formula for thermal noise, SV (ν)is a real and even function. In the real physical world we deal only with positive frequencies,it is then not unusual that SV (ν) is replaced by the so-called one-sided power spectral densityS∗V (ν) = 2SV (ν) = 4kTRWattHz−1. ΔV 2 is then of course determined from integration over

all physical frequencies, i.e. ΔV 2 =∫∞0 S∗

V (ν)dν.The influence of thermal voltage noise due to a noisy resistor is taken into account by

replacing the resistor by a noise-free resistor in series with a noise-voltage source (ΔV 2)1/2 =(4kTR)1/2 VoltHz−1/2. It is important to realize that in this case the actual available noisepower is obtained by optimal power adaptation: see figure 4.3. One can easily derive thatthe optimal power adaptation is obtained by selecting RL = R. The load resistor RL isalso noisy and delivers the same noise power back to R as long as the temperatures of bothresistors are equal. If the resistor temperatures are not equal, then the hotter resistor deliverspower to the coldest resistor until both resistors have the same temperature (thermodynamicalequilibrium). The available noise power follows from

PR(ν) =V 2o

RL=

[12(4kTR)1/2

]2R

= kT Watt Hz−1 (4.10)

independent of the specific value of the resistance R.end intermezzo

Continuing now with the detection of radio signals, we wish to consider the signal-to-noiseratio. As discussed previously, considering one degree of polarization, we can write:

S(t) = a(t) = Re(|Eo(t)| · ei(2πνt+φ(t))

)(4.11)

Detection of such a radio signal requires, like in the case of mixing, a non-linear operation likey(t) ≡ |a(t)| or y(t) = a2(t), to extract the power present in the signal. We shall consider here

68

the case of quadratic detection, i.e. y(t) = a2(t), since this is mathematically straightforwardin contrast to absolute-value transformations.

If a(t) can be described as a stationary random process with a normally distributed am-plitude around zero mean, the probability density function is given by:

f(a) =1

σa√

2πe−a

2/2σ2a (μa = 0) (4.12)

and the measuring process is schematically indicated by

a(t) → transformation → y(t) ≡ a2(t) (4.13)

For the transformation y(t) = a2(t), with σ2a = Ra(0), we can write the probability density

of y as

f(y) =1

2[2πRa(0)y]1/2exp

[ −y2Ra(0)

]H(y) (4.14)

where H(y) is the Heaviside step function: H(y) = 0, y < 0;H(y) = 1, y ≥ 0. (In understand-ing expression (4.14), don’t forget the transformation da→ dy!) Thus, the stochastic processy(t) is apparently not normally distributed, and of course also μy �= 0.

To derive the power spectral density of y(t) we need to find an expression for the auto-correlation function Ry(τ) of y(t). One can show that for a normally distributed a(t) theautocorrelation of y(t) ≡ a2(t) follows from:

Ry(τ) = E {y(t)y(t+ τ)} = E{a2(t)a2(t+ τ)

}= E

{a2(t)

}E{a2(t+ τ)

}+2E2 {a(t)a(t+ τ)}

(4.15)(For the derivation, which makes use of so-called moment-generating functions of a(t), seePapoulis, p. 160.) Hence:

Ry(τ) = R2a(0) + 2R2

a(τ) = μ2y + Cy(τ) (4.16)

The average μy of y(t) equals the variance of a(t), the autocovariance of y(t) equals twice thesquare of the autocovariance of a(t), and the variance of y(t) is σ2

y = 2σ4a. The power spectral

density of y(t) follows from the Wiener Khinchin theorem:

Sy(ν) = R2a(0)δ(ν) + 2Sa(ν) ∗ Sa(ν) (4.17)

IntermezzoIt is important to realize that in the case of quadratic detection a number of frequency

components is introduced in the case of an amplitude-modulated signal like a(t). We shalldemonstrate this, as an example, for a deterministic signal, i.e. an amplitude-modulated highfrequency carrier of the form:

x(t) = A(1 +m cos ηt) cosωt (4.18)

This represents a high frequency carrier (ω) the amplitude (A) of which is modulated bya much lower frequency (η). m is called the modulation index. The signal x(t) containsthree discrete frequencies, ω − η, ω, and ω + η. This is easily seen by using cosα cos β =12 [cos(α+ β) + cos(α− β)].

The instantaneous intensity of this signal can be expressed as I = x2(t), and for theaverage intensity we get

I = x2(t) =12A2(1 +

12m2) (4.19)

69

Figure 4.4: Schematic view of a radio telescope with the receiving horne, which selects thefrequency νs and bandwidth Δνs, the wave pipe, and the detector diode (left). By mixing theradio signal with that of a local oscillator (LO), the frequency can be shifted to much lowerfrequencies, suitable for electronic processing (right).

The equivalent of the term E{a2(t)

}E{a2(t+ τ)

}in expression (4.15) amounts in this case

to a DC-term representing the square of the average intensity:

I2 =

14A4(

1 +12m2)2

(4.20)

To arrive at the equivalent of the autocovariance term 2E2 {a(t)a(t+ τ)} in equation (4.15),we first have to compute the autocovariance Cx(τ) of x(t):

Cx(τ) = x(t)x(t+ τ) =12A2[cosωτ +

14m2 cos(ω + η)τ +

14m2 cos(ω − η)τ ] (4.21)

Subsequently:

x(t)x(t+ τ)2

=14A4(1 +

12m2 cos ητ)2 cos2 ωτ (4.22)

Using the relation cos2 α = 12(cos 2α+ 1) this can be disentangled in various components (try

it yourself!):

x(t)x(t+ τ)2

=A4

8(1 + cos 2ωτ) +

m2A4

8[cos ητ +

12

cos(2ω − η)τ +12

cos(2ω + η)τ ]

+m4A4

64[1 + cos 2ητ +

12

cos(2ω − 2η)τ + cos 2ωτ +12

cos(2ω + 2η)τ ] (4.23)

This expression shows that the original frequencies in the power spectral density of x(t), ωand ω ± η have been transformed in the squaring process to frequency components at DC, η,2η, 2ω, 2ω ± η and 2ω ± 2η. Displayed on a two-sided frequency diagram this shows twicethe original frequency bandwidth centered at DC and ±2ω (see figure 4.5). In the case of thestochastic signal a(t) we shall see the same characteristics in the power spectral density of theautocovariance Cy(τ).

end intermezzoConsider now the front-end of a receiver behind a radiotelescope (figure 4.4). This frontend

generally contains a resonance cavity tuned at a central frequency ν = νs with a bandwidthΔνs. If the noise entering the receiver can be characterized by a noise temperature Tn thetwo-sided power spectral density of a(t) is given by:

Sa(ν) = kTn

[Π(ν − νsΔνs

)+ Π

(ν + νsΔνs

)](4.24)

70

Figure 4.5: Two-sided frequency spectrum of the radiation incident on the non-linear detectionelement (upper panel) and the two-sided power spectral density at the output of the detectionelement (lower panel).

This signal is then fed to a non-linear detection element, a diode or an induction coil, whichintroduces the transformation y(t) = a2(t). Consequently, we have

R2a(0) = (σ2

a)2 =

⎛⎝kTn

+∞∫−∞

[Π(ν − νsΔνs

)+ Π

(ν + νsΔνs

)]dν

⎞⎠

2

= (2kTnΔνs)2 (4.25)

and

2 [Sa(ν) ∗ Sa(ν)] = 2(kTn)2[Π(ν − νsΔνs

)+ Π

(ν + νsΔνs

)]∗[Π(ν − νsΔνs

)+ Π

(ν + νsΔνs

)]

= 2(kTn)2Δνs[2Λ(

ν

Δνs

)+ Λ

(ν − 2νs

Δνs

)+ Λ

(ν + 2νs

Δνs

)](4.26)

Sy(ν) consists therefore of a component (2kTnΔνs)2 at zero frequency and three Λ-functions centered at zero frequency and at frequencies −2νs and +2νs with a basewidth of2Δνs, as illustrated in figure 4.5. Note the correspondence in frequency shift and bandwidthwith the example in the intermezzo above.

In practice one always has νs Δνs, in the centimeter range for example one typicallyhas νs 1010 Hz and Δνs 107 Hz. The detection of a potential radio source signal will thenhave to be assessed in the context of the autocovariance of the noise signal, i.e. we need anexpression for Cy(τ). This follows from the Fourier Transform of 2 [Sa(ν) ∗ Sa(ν)] as given inequation (4.26), and thus:

Cy(τ) = 2(kTn)2Δν2s

[e−2πi(2νsτ) + e2πi(2νsτ)

]sinc2τΔνs + (2kTn)2Δν2

s sinc2τΔνs

= (2kTnΔνs)2(1 + cos 2π(2νsτ))sinc2τΔνs (4.27)

71

Figure 4.6: The autocorrelation function (left) and the autocovariance (right) of a quadraticallydetected radio signal ( see equations 4.16, 4.25, and 4.27).

The cos 2π(2νsτ) term refers to the high frequency carrier which will be filtered off in any lowfrequency averaging process.

This averaging process can be taken over an arbitrary time interval ΔT . This is equivalentto filtering in the frequency domain with a filter Π(f/2fc) with fc = 1/ΔT . The averagedvalue of the autocovariance is then obtained in the τ domain by convolution of Cy(τ) with2fcsinc2fcτ . Assuming fc � Δνs � νs, the cos 2π(2νsτ) term in expression (4.27) averagesto zero. Consequently we have

[Cy(τ)]ΔT = (2kTnΔνs)2sinc2τΔνs ∗ 2fcsinc2fcτ

= (2kTnΔνs)22fc

+∞∫−∞

sinc2τ ′Δνssinc2fc(τ − τ ′)dτ ′ (4.28)

and with a change of variables u′ ≡ τ ′Δνs this becomes:

[Cy(u)]ΔT = 2(2kTn)2Δνsfc

+∞∫−∞

sinc2u′sinc2fcΔνs

(u− u′)du′ (4.29)

Since fc/Δνs � 1, sinc(2fc/Δνs)u′ varies very slowly compared to sinc2u′. We may thereforeregard sinc2u′ as a δ-function in comparison to sinc(2fc/Δνs)u′. Moreover we also have theproper normalization, since

∫∞−∞ sinc2u′du′ =

∫∞−∞ δ(u′)du′ = 1. Applying this approximation

we get:

[Cy(u)]ΔT = 2(2kTn)2Δνsfc

+∞∫−∞

δ(u′)sinc2fcΔνs

(u− u′)du′ = 2(2kTn)2Δνsfcsinc2fcΔνs

u (4.30)

Substituting τ = u/Δνs we arrive at the final expression for the ΔT -averaged value of thenoise autocovariance:

[Cy(τ)]ΔT = 2(2kTn)2Δνsfcsinc2fcτ (4.31)

The noise variance [Cy(0)]ΔT = 2(2kTn)2Δνsfc should be compared to the strength of aradio source signal characterized by a source temperature Ts. The average value of this sourcesignal after quadratic detection follows from

(μy)s = Ras(0) = σ2as

= 2kTsΔνs (4.32)

72

and the signal-to-noise ratio thus becomes:

S/N =(μy)s

[Cy(0)]1/2=TsTn

(Δνs2fc

)1/2

(4.33)

i.e. the signal to noise is proportional to the square root of the bandwidth Δνs and inverselyproportional to the bandwidth of the integrating (averaging) low-pass filter fc. IntroducingΔTav = 1/(2fc) we get

S/N =TsTn

(ΔνsΔTav)1/2 (4.34)

i.e. the S/N ratio improves with the square root of the product of the radiochannel bandwidthand the integration time ΔTav. The minimum detectable source power for S/N=1 is given by

(PS)min = 2kTn (ΔνsΔTav)−1/2 (4.35)

4.4 Photoconductive detection

4.4.1 Operation principle

A photoconductor exhibits a change in conductance (resistance) when radiant energy (pho-tons) is incident upon it. The radiant energy increases the conductance (σ0) by producingexcess charge carriers in the photoconductor, which is made from semiconductor materials.The charge carriers comprise electron-hole pairs in intrinsic semiconductors. In extrinsicsemiconductors the excess charge carriers comprise either electrons (n-type) or holes (p-type).The spectral responsivity of a particular photoconductor is determined by its energy gap, onlyphotons that have energies greater than the gap energy will be absorbed and cause an excesscurrent flow, i.e. a photoconductor is operated in a mode in which an applied electric field(through a voltage bias) produces a current that is modulated by the excess charge carriersproduced by photon-excitation.

4.4.2 Responsivity of a photoconductor

To put the above discussion of the operation principle on a proper physical basis, the responseto a radiation signal will be derived in this section.

A material with conductivity σ0 produces a current density �j given by

�j = σ0�E (4.36)

in which �E represents the electric field strength generated by a bias voltage VB accross thephotoconductor (| �E | in Volt·m−1 and σ0 in Ohm−1·m−1).

Microscopically the current density �j can also be expressed as

�j = Nq�v (4.37)

in which N represents the volume density of the free charge carriers, q the elementary chargeand �v the drift velocity of these charges in the applied electric field. This drift velocity �vcan also be written as �v = μc �E, with μ the so-called mobility of the charge carrier. For anintrinsic semiconductor a distinction needs to be made between electron conduction and holeconduction: the mobilities μn for electrons and μp for holes are quite different (μn ≈ 3μp):

�j = −nq �vn + pq �vp (4.38)

73

Figure 4.7: Sketch of detector geometry.

with n/p the electron/hole densities and �vn/�vp the electron/hole drift velocities (oppositedirections). Note that q is the elementary charge and has a positive sign.

Combining equations (4.36) and (4.38), an expression for the conductivity σ0 follows:

σ0 = q(nμn + pμp) (4.39)

which reduces to σ0 = qnμn and σ0 = qpμp in the case of a heavily doped n-type, respectivelyp-type extrinsic semiconductor.

Consider now a n-type semiconductor, which is irradiated by a beam of radiant energywith a spectral photon irradiance (= monochromatic photon flux density) F (λ0). In theequilibrium situation the number of excess conduction electrons follows from the fact that thegeneration rate should match the recombination rate:

dΔndt

= g − Δnτl

= 0 (4.40)

with Δn the equilibrium number of excess electrons per unit volume (= excess carrier con-centration), τl the life time of these electrons against recombination and g the generationrate:

g =ηλ0F (λ0)

d(4.41)

with ηλ0 the photon detection efficiency and d the thickness of the photoconductor material.The increase in conductivity Δσ = σ − σ0 follows from:

Δσ = qμnΔn =qμnηλ0F (λ0)τl

d=qμnηλ0τlAd

λ0

hcΦ(λ0) (4.42)

in which Φ(λ0) the monochromatic radiation flux in Watt, A the illuminated area of thephotoconductor and λ0 the wavelength under consideration.

Given a fixed bias voltage VB across the photoconductors, the relative change in con-ductivity Δσ/σ can be related to a relative change in current ΔI/I0 and resistance ΔR/R0

as:Δσ0

σ0= −ΔR

R0=

ΔII0

=IpcI0

(4.43)

74

Figure 4.8: Ideal responsivity of a photoconductor.

in which I0, R0 represent the photoconductor DC-current/resistance in the absence of radia-tion and ΔI = Ipc the photon-generated current (photo-current).

Introducing the detector widthW and length l (A = lW , see figure 4.7) I0 can be expressedas

I0 = Wd |�j | (4.44)

and hence:Ipc = I0

Δσ0

σ0=ηλ0qλ0

hc· τlμnVB

l2· Φ(λ0) (4.45)

The term τlμnVB/l2 is commonly referred to as the photoconductive gain G. Introducing

the transition time τtr of the free charge carriers accross the photoconductor length l, i.e.τtr = l2

μnVB, G gives the ratio between the carrier life time against recombination in the

photoconductor and its transition time, i.e. G = τl/τtr.The current responsivity RIpc(λ0) follows from:

RIpc(λ0) =Ipc

Φ(λ0)= Gηq

λ0

hc(4.46)

in Ampere/Watt. Figure 4.8 shows the ideal responsivity as a function of wavelength forconstant ηλ0 .

Problem: In practice the photocurrent Ipc is measured over a load resistance RL in serieswith the photoconductor resistance R0, translating Ipc in an equivalent voltage V0, see figure4.9. Show that in this case the spectral voltage responsivity is given by:

RVpc(λ0) =RLR0

RL +R0Gηλ0q

λ0

hc(4.47)

in Volt/Watt. �The above expressions are only valid for low-frequency operation of the photoconductor,

where dielectric relaxation or recombination life-time effects are not of concern. In the nextsection, the frequency response of photoconductors will be addressed.

In order to raise the responsivity of a photoconductor, one should

• enlarge the quantum efficiency ηλ0 by minimizing reflection effects at the entrance planethrough application of anti-reflection coatings and by creating a larger cross-section forthe internal photo-electric effect.

• increase the carrier life time, which raises the photoconductive gain G.

• enlarge the carrier mobility μc (recall that n-type charge carriers have substantiallyhigher mobility than p-type charge carriers).

75

Figure 4.9: Photoconductor: bias circuit

• increase the operational bias voltage VB, which lowers the value of the transition timeτtr in the expression for the photoconductive gain G.

4.4.3 Temporal frequency response of a photoconductor

Consider a step-function in charge carrier generation due to a switch-on of the photoconductorexposure to a radiation source. The response of the photoconductor is now obtained fromsolving the time dependent continuity equation:

dΔndt

= g − Δnτl

(4.48)

with boundary condition Δn = 0 at t = 0. The solution of this differential equation isstraightforward:

Δn = gτl(1 − e− t

τl ) (4.49)

The time dependent excess carrier concentration can also be expressed as the convolution ofa charge generation step-function, gU(t) with the photoconductor response function hpc(t):

Δn(t) = gU(t) ∗ hpc(t) (4.50)

Fourier (Laplace) transform for t ≥ 0 to the temporal frequency domain gives:

Δn(f) =g

2πif·Hpc(f) (4.51)

Substituting expression (4.49) in Δn(t) ⇔ Δn(f) yields the photoconductor response functionfor a dynamical input signal:

gτl2πif

11 + 2πifτl

=g

2πifHpc(f) =⇒ Hpc(f) =

τl1 + 2πifτl

(4.52)

76

Figure 4.10: For a block function x(t) of illuminating photons (top), the number of chargesn(t) in a photoconductor increases on a finite time scale towards the equilibrium value, anddrops exponentially once the illumination ceases (middle). The impulse response function h(t)is an exponential (below).

and hence the frequency dependent amplitude transfer:

|Hpc(f) | =τl√

1 + 4π2f2τ2l

(4.53)

This expression shows that the photoconductor acts as a frequency filter that takes theform of a first order system (see figure 4.10). At a frequency f = 1/(2πτl), the responsivityreduces with 3 dB (= 1/

√2). For 2πfτl 1, the responsivity rolls off with 6 dB per octave; i.e.

a reduction of a factor 2 by each doubling of the frequency. The cut-off frequency fc = 1/(2πτl)determines the characteristic bandwidth of a photoconductor with a charge carrier life timeτl. This breakpoint in the frequency domain is typically around 1 MHz, which of course variesfor various detector types. Typical response times of photoconductive radiation sensors are,therefore, intrinsically limited to microsecond time scales.

4.5 Frequency limited noise in the quantum limit

4.5.1 Frequency limited shot noise in the signal-photon limit

Let us first consider the general case of frequency limited noise in the quantum limit: so-called band limited shot noise. An observation in the quantum limit contains noise in boththe photon signal and in the (photon) background. Let us now assess a signal with nb photonsper second. In Chapter 1 we derived expressions for the expectation value and autocorrelationof an unfiltered Poisson process of photon arrival with an average flux of nb photons per second(see equations (1.86 and(1.87), repeated here:

E{Y (t)} = nb and RY (τ) = n2b + nbδ(τ) (4.54)

77

By applying the Wiener Khinchin theorem we compute the power spectral density:

RY (τ) ⇔ SY (s) =+∞∫

−∞RY (τ)e−2πisτdτ = n2

bδ(s) + nb (4.55)

which we cannot accept in this form, as it would lead to an infinitely high power signal. Inpractice there is always a frequency cut-off at, say, sc.

To describe this, we can represent the filtering process as:

Y (t) → h(t) → Z(t) (4.56)

where h(t) is the instrumental response, described by a convolution of Y (t) with a responsefunction h(t). We will consider a linear dynamic system:

Z(t) = h(t) ∗ Y (t) =+∞∫

−∞

∑k

δ(t′ − tk)h(t− t′)dt′ =∑k

h(t− tk) (4.57)

in which h(t) is the impulse response of the system. (In the literature you may also encounterdescriptions in terms of the integrated impulse response function U(t) ≡ ∫

h(t)dt.) For theexpectation value of Z we thus find

E{Z(t)} = E{+∞∫

−∞Y (t− t′)h(t′)dt′} =

+∞∫−∞

E{Y (t− t′)} · h(t′)dt′ = nb

+∞∫−∞

h(t′)dt′ = nbH(0)

(4.58)where for the last transition we have used:

H(s) ≡+∞∫

−∞h(t′)e−2πist′dt′ ⇒ H(0) =

+∞∫−∞

h(t′)dt′ (4.59)

(Note that h(t) is not a normalized function.) In the Fourier domain we write

SZ(s) = |H(s)|2 · SY (s) = n2b |H(s)|2 δ(s) + nb|H(s)|2 = n2

bH2(0)δ(s) + nb|H(s)|2 (4.60)

and now the power is finite, as it should be. We obtain the autocorrelation by taking theFourier transform of the spectral power density SZ(s):

RZ(τ) = n2bH

2(0) + nb [h(τ) ∗ h(τ)] (4.61)

where the first term on the right hand side gives the mean charge response of the dynamicsystem, and the second term the noise. Taking the covariance at τ = 0 we write

CZ(τ = 0) = nb [h(0) ∗ h(0)] = nb

+∞∫−∞

h2(t)dt = nb

+∞∫−∞

H2(s)ds (4.62)

where the last step is taken by applying the Parseval theorem.

78

4.5.2 Example: Shot noise in the signal-photon limit for a photoconductor

To illustrate the meaning of these equations we will now apply them to the case of a photo-conducting device. The average number of charge carriers generated by a radiation beam withan average monochromatic photon flux density F (λ0) ( average spectral photon irradiance) atwavelength λ0 equals Gηλ0F (λ0)Apc, in which Apc represents the area of the photoconductorand G the photoconductive gain. The average photocurrent E{i(t)} is therefore expressed as:

E{i(t)} = i(t) = qGηλ0F (λ0)Apc = qGnc (4.63)

with q the elementary charge. The frequency response can be expressed as (see equation(4.52)):

Hpc(f) =τl

1 + 2πifτl(4.64)

where τl is the charge carrier life time. Equation (4.60) now becomes

S(f) = q2G2n2cδ(f) +

q2G2nc1 + 4π2f2τ2

l

(4.65)

Note: The quantities q and G are deterministic multiplication constants and belong in thatsense to the transfer function term |H(s)|2 in equation (4.60), hence they appear squared inthe last term of equation (4.65) ! Taking the inverse Fourier transform of this last term inequation (4.65) we obtain the covariance of the photo-current i(t):

Ci(t)(τ) = q2G2 nc2τl

e−|τ |/τl (4.66)

and we see that there is no correlation for τ τl, but that the signals have a mutual effecton one another for τ ≤ τl. For τ = 0 we find the variance in the photocurrent:

σ2i(t) = Ci(t)(0) = q2G2 nc

2τl= q2G2πncfl (4.67)

by substituting fl = 1/(2πτl). The standard deviation of the photoconductor current distri-bution can thus be expressed as (substituting nc):

σi(t) = qG√πηλ0F (λ0)Apcfl (4.68)

In practise the fluctuations in the photo-current are higher since, besides the charge generationnoise, also recombination noise contributes to the fluctuation level. The standard deviationin the photo-curent due to the combined generation and recombination noise (so-called G-Rnoise) is a factor

√2 higher, since both the generation and recombination processes are subject

to shot noise:σG−R = qG

√ηλ0F (λ0)Apcωl (4.69)

with ωl = 2πfl. Hence, for the signal to noise ratio in the photon limit we obtain

SNR =

(i(t)σG−R

)=(ηλ0F (λ0)Apc

ωl

)1/2

(4.70)

This last equation tells us that a high value for the frequency cutoff fl leads to a lower signalto noise for the photo-current; the reason for this is that the intrinsic system noise is lessfiltered.

79

Figure 4.11: Typical detectivities for several photo-conductors.

4.5.3 Shot noise in the background-photon limit for a photoconductor: thenormalized detectivity D∗

The rms-noise in the photo-current, derived in the previous paragraph, dominates over thermalnoise if the photoconductor is sufficiently cooled.

From the expression for the standard deviation σG−R in the photo-current, given in equa-tion (4.69), we can derive the spectral Noise Equivalent Power (NEP (λ0, f)) if we assumethat this photo-current arises from an average monochromatic background-photon flux den-sity B(λ0) instead of an average signal-photon flux density F (λ0) considered above. ThisNEP (λ0, f) represents the so-called radiation Background LImited Performance (BLIP). Weuse the relation:

NEP(λ0, f) =σG−RRIpc(λ0)

=hc

λ0

(B(λ0)Apcωl

ηλ0

) 12

(4.71)

and the normalised detectivity D∗(λ0, f):

D∗(λ0, f) =√Apcfl

NEP(λ0, f)=

λ0

hc√

2π

(ηλ0

B(λ0)

) 12 ≈ λ0

2hc

(ηλ0

B(λ0)

) 12

(4.72)

Figure 4.11 shows the theoretical peak detectivity (η = 1) as a function of wavelength, assumingradiation-background limited (BLIP) performance arising from an omni-directional blackbodyradiation field at 300 K (i.e. integrated over an upper hemisphere of 180◦). Also included infigure 4.11 are some common extrinsic photo-conductors and their corresponding operatingtemperatures. Detectivities of two types of thermal detectors (bolometers) are displayed forcomparison. The thermal detectors lack the cut-off feature in wavelength due to the different

80

detection principle, they show a considerably lower value of D∗(λ0, f) but cover a widerspectral band than the photo-conductive devices.

4.6 Charge Coupled Devices

4.6.1 Operation principle

A charge-coupled device (CCD) in its simplest form constitutes a closely spaced array of metal-insulator-semiconductor (MIS) capacitors. The most important among the MIS elementsis the metal-oxide-semiconductor (MOS) capacitor, made from silicon (Si) and employingsilicon dioxide (SiO2) as the insulator. The charge position in the MOS-array of capacitors iselectrostatically controlled by voltage levels. Dynamical application of these voltage levels andtheir relative phases serves a dual purpose: injected charges (e.g. due to radiation generatedelectron-hole pairs) can be stored in the MOS capacitor and the built-up charge (i.e. chargepacket) can subsequently be transferred across the semiconductor substrate in a controlledmanner. Monolithic CCD arrays are available for imaging in the near-infrared, the visible andthe X-ray range, for the mid-infrared (up to 20 μm), hybrid fabrication is employed in whichthe infrared-sensitive detector material is sandwiched to a CCD array of cells for read-out.

In the following sections a short description is given of the storage and transfer mechanismsand of a few focal-plane CCD-architectures.

4.6.2 Charge storage in a CCD

Two basic types of charge coupled structures exist. In the one type, charge packets are storedvery close to the interface between the semi-conductor (Si) and the overlaying insulator (SiO2).Since the charges reside virtually at the surface of the semiconductor layer, these devices havebecome known as surface channel CCDs (SCCDs). In the other type, charge packets are storedat some distance away from the surface of the semiconductor, such devices have become knownas bulk or buried channel CCDs (BCCDs). From the user point of view both devices are verysimilar, therefore the discussion here is mostly limited to SCCDs since their concept is easierto understand.

Figure 4.12 shows a single CCD-electrode comprising a metal gate, separated by a thinoxide layer (few times 0.1 μm) from a p-type semiconductor (hole-conduction). Prior to theapplication of a voltage bias to the gate electrode, there will be a uniform distribution of holes(majority free charge carriers) in the p-type semiconductor. As the gate electrode is madepositive however, the holes are repelled immediately beneath the gate and a so-called depletionlayer (devoid of free charge) is created. Increasing the gate voltage causes the depletion regionto extend further into the semiconductor and the potential at the semiconductor/insulatorinterface (φS) becomes increasingly positive. Eventually, a gate voltage bias is reached atwhich the surface potential φS becomes so positive that electrons (i.e. the minority chargecarriers in the p-type material !) are attracted to the surface where they form an extremelythin(≈ 0.01 μm thick), but very dense (in terms of charge density) layer, a so-called inversionlayer. Basically the electrons reside in the deep potential well at the semiconductor surfaceand do not recombine with holes, since the latter are immediately repelled from the depletionlayer upon creation. Therefore, if radiation is incident on a single CCD electrode, the electronsfrom the produced electron-hole pairs will be stored in the inversion layer, the holes will berepelled from the depletion region. An analytical expression for the variation of the surfacepotential φS with the gate voltage VG and the surface-charge density QInv in the inversionlayer can be derived as follows. From figure 4.12 it can be seen, since the p-type substrate is

81

Figure 4.12: Single CCD-electrode. VG is positive, QInv and QD are negative.

grounded, thatVG = Vox + φS (4.73)

Charge neutrality prescribes:QG = −(QInv +QD) (4.74)

in which QD = −qNaxD represents the surface charge density in the depletion later (i.e.the volume charge density −qNa integrated over the thickness of the depletion layer xD).Na is the density of the acceptor doping in the p-type semiconductor, q is the elementarycharge. From the theory of depletion penetration, which basically involves integration of thePoisson equation in electrostatics, xD = 2εφs/qNa, with ε the dielectric constant (ε = ε0εr).Substituting this in QD and writing QG = VoxCox (Cox represents the oxide capacitance perunit area (Cox = ε/tox with tox the oxide thickness)), the expression for the surface potentialφS becomes after some manipulation:

φS = VG + V0 +QInvCox

−√

2(VG +

QInvCox

)V0 + V 2

0 (4.75)

V0 is a constant (qεNa/C2ox), with the dimension Volt. The value of φS as a function of VG

(with QInv = 0) and as a function of QInv (with VG = 10 and 15 V respectively) is displayedin figure 4.13 and figure 4.14 for a few thicknesses of the insulating oxide layer.

The surface potential φS can be interpreted as a potential well, the depth of which isdetermined by the magnitude of the inversion charge packet QInv (in analogy with a bucketpartially filled with fluid). From the figures it can be seen that φs is practically a linearfunction of QInv and VG, this arises from the fact that the constant V0 is relatively smallcompared to typical values for VG; i.e. V0 = 0.14 Volt for a 0.1 μm thick oxide layer. SinceVG = 10 - 15 Volt:

φS = VG +QInvCox

in good approximation (4.76)

Note: in the above computation the difference in workfunction between the metal and thesemiconductor has been neglected, i.e. the so-called flat-band condition has been assumed. In

82

Figure 4.13: Variations of the surface potential φS with the gate voltage VG for different valuesof the oxide thickness tox. The charge in the inversion layer QInv is zero in all cases. Figuretaken from Beynon & Lamb (1980).

practice this can always be achieved by adaptation of the potential VG. Moreover φS hasa minimum value related to the Fermi-level in the semi-conductor, this boundary conditionhas been omitted from the above formulae, since it is of no direct relevance to the storageprinciple.

As will be seen in the next section, storage of charge at the Si-SiO2 interface introducespotential losses of accumulated charge during charge transport, owing to charge trapping inthe atomic surface states. To prevent such losses the charge packet should preferably be keptat a potential minimum separate from the Si-SiO2 interface, i.e. a minimum in the bulk-siliconhas to be generated. This can be achieved by adding an n-type layer on top of the p-typesilicon. The ionised positive ions of the n-type top layer raise the positive gate voltage to anever higher channel potential inside the silicon. Figure 4.15 shows the comparison with theoriginal situation (electron-potential minimum at the interface). As illustrate, the potentialwell in the n-type Si has now a minimum (for electrons) deeper in the bulk Si. This is theprinciple of the BCCD, charge is now stored and transported in a channel embedded in thebulk silicon and is not anymore subject to charge loss at surface states. Figure 4.15 shows alsothe comparison between an empty and partly filled SCCD, (a) and (b), and an empty andpartly filled BCCD, (c) and (d). In the latter case the charge packet flattens the minimumof the potential well, this can be regarded as a “neutral layer” which increases in width if itbecomes filled with charge.

83

Figure 4.14: Variations of the surface potential φS as a function of the inversion charge QInvfor two values of the gate voltage VG and of the oxide thickness tox. Figure taken from Beynon& Lamb (1980).

4.6.3 Charge transport in a CCD

The way in which a charge packet is transported through a CCD structure (charge-coupling)is illustrated in figure 4.16. Assume the charge packet is stored initially in the potentialwell under the second electrode (from the left in the figure) which is biased to say 10 V. AllCCD electrodes need to be kept at a certain minimum bias (≈ 2 V) to ensure that each MOScapacitor operates in the inversion-mode. The potential well under the 10 V electrode is muchdeeper than those under the 2V electrodes and, provided it is not “overfilled” with charge, itis only in this well that charge will be stored. Suppose that the bias on the third electrode isnow increased to 10 V. If the second and third electrode are sufficiently close, both wells willmerge. The equations that govern the charge transfer process comprise the current densityand the continuity equation. For an n-type channel (electron charge) these equations are:

j(x, t) = qμnn(x, t)∂φ(x, t)∂x

+ qDn∂n(x, t)∂x

(4.77)

∂n(x, t)∂t

=1q

∂j(x, t)∂x

(4.78)

where charge propagation is in the x direction (no vector treatment), n(x, t) is the electrondensity, μn the electron mobility and Dn the electron diffusion coefficient. This diffusioncoefficient is related to the mobility of the charge carriers μn through

Dn

μn=kT

q(4.79)

84

Figure 4.15: Surface and channel potential for a SCCD ((a) and (b)) and a BCCD ((c) and(d)), with empty ((a) and (c)) and filled ((b) and (d)) wells. Figure taken from Theuwissen(1995).

(the so-called Einstein relation). The first right-hand term in equation 4.77 refers to drift ofthe charge packet under a gradient in electric potential, the second term refers to thermaldiffusion under the presence of an density gradient. The drift component arises from twocauses. The first cause derives from self-induced fields in the presence of a gradient in chargeconcentration, charges of the same type will mutually repel and reshuffle the concentrationof charge carriers in such a way that this gradient becomes zero. Secondly, due to so-calledfringing fields, in which charges are forced to move due to the existence of electric fieldsgenerated by the various voltage levels on the gates.

The equalizing of charge concentration under electrodes two and three described aboveis hence governed by the drift speed due to self-induced fields and by thermal diffusion. Inpractice these two processes are frequently described by introducing an effective diffusioncoefficient Dn,eff , since both arise from the existence of a gradient in carrier concentration.For large charge packets (large concentration difference), the process is mainly determined bythe self-induced drift, for small charge packets (e.g. < 1% of the well charge capacity) thermaldiffusion dominates and slows the transfer process down.

Returning now to figure 4.16, if the charge is evenly distributed under electrodes 2 and3 (situation c), reducing the potential on the second electrode to 2 V will introduce a fringefield and the remaining content of this well will drift into the third well owing to the fringefield. The final situation shown in (e) is a deep potential well and its associated charge packet

85

Figure 4.16: (a)-(e) Transport of a charge packet in a CCD. (f) clocking waveforms for athree phase CCD. Figure taken from Beynon & Lamb (1980).

moved one position to the right. By applying a succession of varying voltages to the gateelectrodes, the charge packet can be transported through the CCD structure in a controlledmanner. The gate electrodes can be grouped together with each group or phase connected toa different voltage clock. Various CCD structures and electrode systems are employed, theone described above requires a three-phase clocking system and is therefore referred to as athree phase CCD. Each image pixel requires in this case three gate electrodes to store andtransport the charge packet.

It is obvious that the transfer mechanisms described are not perfect, i.e. not all of thecharge is transferred and some signal charge is left behind. The charge transfer efficiency(CTE) is the ratio of charge transferred to the initial charge present. A complementary termoften used in the literature on these devices is the transfer inefficiency (CTI), i.e. the chargelost to the initial charge packet: CTI = 1 - CTE. Typical values of CTE are of the order0.99999 for a good device or CTI = 10−5. As already mentioned before the CTE of a SCCDis, without special measures not discussed here, considerably worse compared to the BCCDdue to the interface trapping states. Moreover, due to the relaxation of charges from thetraps, the so-called trapping noise in a SCCD is at least an order of magnitude larger than inBCCDs, which explains the preference for BCCDs in certain applications.

Till now, only charge transport along the gate electrodes in one direction had been dis-cussed. It is of course necessary to limit the extent of the potential well in the orthogonaldirection. This lateral confinement of the charge packet is achieved by a so-called channel-stop diffusion, which comprises a heavily doped region of the semiconductor relative to itsneighboring regions. This region exhibits a large conductivity σ relative to the surroundingmaterial and quenches the surfaces potential φS so that no depletion region can be formed.In this way one dimensional columns (or rows) are implemented in the CCD structure alongwhich the charge transfer occurs, isolated from its neighboring columns (rows). At the po-

86

Figure 4.17: Three dimensional display of a BCCD structure.

sitions of the channel stop diffusion (p+ or n+ material), the field oxide separating it fromthe gate structure is much thicker as to prevent break-down of the p+ (n+) contact to theconducting gate electrodes. A three dimensional display of a BCCD structure is shown infigure 4.17.

4.6.4 Charge capacity and transfer speed in CCD structures

The amount of charge that can be stored in a CCD pixel depends mainly on two variables:the clock voltages and the electrode area in combination with a SCCD or a BCCD structure.

Regarding a SCCD, the full-well storage capacity is in good approximation given by theoxide capacity under the gate electrode (AelecCox) and the gate voltage VG:

AelecCox =QInvVG

=⇒ QInv = AelecCoxVG (4.80)

(Remember: Cox is the oxide capacitance per unit area, Aelec is the electrode geometric area).As an example, take Aelec = 10 × 20 μm2, tox = 0.1 μm (Cox = εox/tox) and VG = 10 V: inthat case QInv = 0.6 pC ≈ 3.6·106 electrons.

In case of a BCCD, the expression is somewhat more complicated due to the finite depthof the charge storage and transport in the Si. However the ratio in charge handling capacitybetween SCCD and BCCD structures can be shown to approximate:

QSCCDQBCCD

= 1 +εoxxn3εSitox

(4.81)

in which xn represents the width of the neutral layer in the BCCD when the full storagecapacity is reached. For typical values xn = 2 μm and tox = 0.1 μm, this ratio amounts to

87

Figure 4.18: Fringing fields at the Si-SiO2 interface at several depths. Figure taken fromBeynon & Lamb (1980).

about 3. Consequently, creation of an embedded channel lowers the charge handling capacityby a substantial factor.

The intrinsic speed of charge transport in a CCD structure is governed by transportequation 4.38, depending on the time constants for self-induced drift, thermal diffusion andfringe field drift.

In a SCCD the time constant for self-induced drift is a function of charge density, Coxand the interelectrode spacing. For Cox = 1 pF, QInv = 1012 cm2 and spacing 25 μm, thetime constant τSi = 0.14 μs. The time constant for thermal diffusion of an electron packet(Dn ≈ 10 cm2·s−1) amounts to 0.25 μs. On this basis the high frequency limit would appearto be a few MHz, however the fringing field of the neighboring gate electrodes aid the transferconsiderably, especially when thermal diffusion is dominant and clocking frequencies up to 15MHz can be used for SCCDs

In the case of the BCCD the speed is potentially dominated by the fringing field of theneighboring gate due to the depth of the charge channel. The potential levels do not exhibitthe flat structures encountered at the Si-SiO2 interface as displayed in figure 4.16 but possessa continuous gradient along which charge can drift swiftly to a potential minimum. Thesituation is schematically shown in figure 4.18 for a few depths of the charge channel. BCCDs,with usually smaller charge packets, can therefor be read out at much higher frequencies, up to300 MHz, while still retaining acceptable values of the CTE for several imaging applications.

4.6.5 Focal plane architectures

The two-dimensional CCD imaging arrays can be subdivided into frame-transfer (FT) imagers,interline-transfer (IL) imagers and frame-interline-transfer (FIT) devices. The FT and ILarchitectures are briefly described below.

Figure 4.19 shows the device architecture of a frame-transfer image sensor, the operation is

88

Figure 4.19: Architecture of a frame-transfer image sensor. Figure taken from Theuwissen(1995).

shown in figure 4.20. The CCD comprises a photosensitive array and a memory array coupledto a linear output register. The light sensitive top part of the device, or image section,integrates the charges produced by the impinging photons over a predefined integration time.At the end of this period the charge packets are transferred along the columns into the memoryarray (figure 4.20(b)). This transfer needs to be done quickly to prevent disturbance by lightfalling on the image section during read-out. Subsequently each row in the memory sectionis clocked into the linear output register from which it is shifted to the output stage amplifier(video signal). During transport of all video information from the storage area to the outputstage, all CCD cells in the image array are again biased in the integration mode, allowing thebuild-up of the next image.

A simpler version of the frame-transfer imager is the full-frame device, which lacks the stor-age section. During read-out the incoming light should be interrupted to avoid disturbances.This can be done by using a shutter which cuts out the light path to the imager.

The typical architecture of the interline-transfer imager is displayed in figure 4.21. In thiscase the photo-sensitive pixels are located near the shielded CCD transport register. Theintegration of charge takes place in the photo-sensitive pixels during exposure to the radiationsource, at the end of the integration time the charge packets from the photo-sensitive columns

89

Figure 4.20: Working principle of a frame-transfer image sensor. Figure taken from Theuwis-sen (1995).

Figure 4.21: Architecture of an interline-transfer image sensor. Figure taken from Theuwissen(1995).

are transferred in parallel to the vertical registers alongside them, see figure 4.22(b). Thevertical shift registers are emptied into the horizontal-output register row by row which areread-out serially at the output. The advantage of this architecture is the very short transfertime to the parallel storage column as compared to the serial shift to the storage section in thecase of the frame-transfer CCD. On the other hand, the presence of periodic storage columnsin the image area may introduce severe aliasing effects close to the spatial Nyquist frequency

90

Figure 4.22: Working principle of an interline-transfer image sensor. Figure taken fromTheuwissen (1995).

of the CCD-array.If a pixel has a characteristic size Δx, the sampling of an image can be described with

an array of normalized window functions: 1ΔxΠ

(x

Δx

). The spatial frequencies (s) associated

with this window function is obtained from its Fourier transform: sincsΔx.If the pitch of the pixels in the image plane is given by x0, the Nyquist frequency follows

from sN = 12x0

. Normalizing the spatial frequency s on sN yields the geometrical MTF for alinear array of pixels Δx with pitch x0 (s0 = s/sN ):

MTFgeo = sincs0Δx2x0

(4.82)

For contiguous pixels, like in a frame-transfer CCD array, Δx ≈ x0 yielding:

MTFFT = sincs02

(4.83)

with MTFFT = 0 for s0 = 2, i.e. s = 2sN .For non-contiguous pixels, like an interline transfer CCD array,Δx ≈ x0

2 yielding:

MTFIL = sincs04

(4.84)

with MTFIL = 0 for s0 = 4, i.e. s = 4sN .This analysis indeed shows that the MTF of the interline-transfer CCD extends twice

as high above the Nyquist frequency as compared to the frame-transfer CCD, giving rise topotential Moire fringes due to aliasing. This effect can only be avoided by limiting the spatialfrequencies of the input-image signal to sN .

4.6.6 Wavelength response of CCDs

For visual light applications, two possibilities can be considered to irradiate the CCD, i.e.irradiation through the front surface and irradiation through the back-surface, so-called back-illumination. Illumination through the front surface, i.e. the side which contains the gate

91

Figure 4.23: CCD spectral response. Figure taken from Beynon & Lamb (1980).

structure, requires the usage of poly-silicon gate electrodes which transmit light. A problemencountered in this case entails the strongly wavelength dependent absorption and interferenceeffects occurring in the thin poly-silicon gate layer (≈ 0.5 μm) and the thin oxide layer (≈ 0.1– 0.2 μm). The current responsivity and the interference effects are shown in figure 4.23. Also,the blue-responsivity is strongly suppressed by absorption in the poly-silicon gate material.

Back-illumination requires thinning of the Si-substrate to ensure that the photon-generatedcharge reaches the potential wells. The charge transport can be aided by building an electricfield gradient into the semi-conductor, this electric field gradient can be provided by increasingthe substrate doping concentration in the regions close to the silicon surface. This increaseddoping has the effect of bending the energy bands at the back of the surface so as to acceleratethe photon-generated carriers towards the front surface and the potential wells. This tech-nique is particularly useful for increasing the blue-responsivity where the charge carriers aregenerated close to the rear silicon surface. This back-face response can be further improved byminimizing the reflection of light from the back surface employing a λ/4 thick layer of siliconmonoxide at the wavelength of interest. Figure 4.23 shows a quantum efficiency of about 50% with back-illumination, which can be raised to a peak efficiency of about 90% by using theproper antireflection coating. Figure 4.24 shows a complete family of CCDs for optical lightwith different architectures and array sizes.

Non-visible imaging with CCDs is normally divided into three wavelength ranges:Wavelengths longer than 1 μm, for which the photo-electric absorption coefficient in sil-

icon is too low (hνIR < bandgap) and all photons pass through the structure without beingabsorbed. For this reason infrared photons need to be converted first into electrons, e.g. bymeans of so-called Schottky-barrier structures in which pixels are used made out of platinum-silicide (PtSi). An array of these detectors is then coupled to a CCD read-out system.

The responsivity in the thermal IR can theoretically be extended in this way to approxi-mately 5.6 μm.

For wavelengths shorter that 0.4 μm and longer than 10 nm the opposite is the case ascompared to the IR range: the absorption is very high since not only silicon, but also the

92

Figure 4.24: Photo of several CCDs. Top left: back-side illuminated FT imager with 1260× 1152 pixels. Top right: a full frame CCD with 3k × 2k pixels, each of 9 μm × 9 μm.Middle: a low-cost FT sensor with 270,000 pixels, total diameter of the image section is 3.2mm. Bottom left: HDTV IT image sensor with 2M pixels, 11 mm diagonal. Bottom right:FT CCD for broadcast applications with 500,000 pixels.

silicon oxide layer has a very high absorption coefficient in this wavelength regime. In figure4.25 the penetration depth in silicon is displayed over a wide wavelength range, it is clearthat the UV photon will be stopped in the top layers above the CCD. To avoid this problemseveral solutions are possible:

Deposition of an UV-sensitive phosphor on top of the active area of the imager whichdown-converts the energy of the UV-photons to longer wavelengths. An example is Coronene,which fluoresces in the green portion of the visible spectrum under UV exposure.

Also, as in the visible regime, back-illumination is an option. However, due to the highabsorption, the substrate of the CCD has to be thinned to typically 10 μm, which is a very ex-pensive process and requires subsequently very delicate handling of the device. An alternativeis deep depletion of a lightly-doped, high-resistivity substrate, such that the depletion regionunder the CCD gates extends to the back of the siliconwafer. The charge carriers generatedby the UV illumination are swept to the front side into the potential wells by the electricfield of the deep depletion layer. This approach does not require the extreme thinning men-tioned above for conventional high-doped material, a thickness of 50 μm still allows adequatecollection of charge owing to the deep depletion field.

Application of CCDs in X-ray imaging has made a substantial development over the recent

93

Figure 4.25: Penetration depth of silicon as a function of wavelength. Figure taken fromTheuwissen (1995).

years, in particular for X-ray astronomy where the X-ray photon flux is sufficiently low toregister the (small) charge packet associated with a single X-ray photon. In contrast tothe optical application, exposures yielding no more than one X-ray photon per 100 pixelsis pursued before read-out, since in this case both spectral an spatial information can beobtained simultaneously, i.e. the magnitude of the charge packet represents in this case theenergy of the impinging X-ray photon. Moreover, apart from the simultaneous spatial andspectral resolution, deep depletion CCDs (30 – 50 μm) provide a high quantum efficiency (>90%) over a wide X-ray range (0.2 – 10 nm) and are therefore ideally suited as an imagingspectrometer behind a grazing incidence X-ray telescope. Back-illumination, which againavoids the problem of penetrating the gate structure and the oxide layer, gives quite superiorresponse to low-energy X-rays (2 – 10 nm). Moreover, as importantly, a deep depletion layerminimizes the effect of charge diffusion of the X-ray-generated charge cloud, since the electricfield causes this cloud to quickly drift into the potential well. In this way degradation ofspatial and spectral resolution is minimized. A second cause of degradation of spatial andspectral resolution is charge loss when X-ray photons are absorbed at the boundary of two ormore pixels, since charge splitting between these pixels will occur. This problem can be dealtwith later in the image processing, provided no charge loss occurs due to a recombinationprocess (e.g. near a highly-doped stop-diffusion).

X-ray CCDs require a very high CTE value (CTI = 10−5 – 10−6) to retain the intrinsic,quantum noise dominated, spectral resolution. Moreover, the CCD needs to be cooled to anoperating temperature of about 200 K to reduce the influence of several noise sources (thermalnoise, read-out noise).

94

Chapter 5

Fitting observed data

In this chapter we discuss how one can describe observations, and how one can comparemeasurements with model predictions.We first introduce discrimination between systematic and random errors, and show how themost common distribution functions for random errors are most effectively computed, followedby a discussion of error propagation. Next, we discuss the criteria that should be used to decidewhich fit of a data set is considered to be best, for the case where the errors are distributedaccording to a Gaussian distribution, and for the case where they are Poissonian. We describefitting procedures specific for the least squares criterion, both for the linear and the non-linearcase. Finally, we discuss fitting procedures which are more generic, that may be applied bothto least squares and to the case where the errors follow a Poissonian distribution.

5.1 Errors

5.1.1 Accuracy versus precision

We may discriminate between the accuracy and the precision of an experiment. The accuracyof an experiment is a measure of how close the result of an experiment comes to the true value.Therefore, it is a measure of the correctness of the result. The precision of an experimentis a measure of how exactly the result is determined, without reference to what that resultmeans. It is also a measure of how reproducible the result is. The absolute precision indicatesthe magnitude of the uncertainty in the result in the same units as the result. The relativeprecision indicates the uncertainty in terms of a fraction of the value of the result.It is obvious that accuracy and precision need to be considered simultaneously for any exper-iment. It would be a waste of time to determine a result with high precision if we knew theresult would be highly inaccurate. Conversely, a result cannot be considered to be accurateif the precision is low. The accuracy of an experiment as defined here generally depends onhow well one can control or compensate for systematic errors. These are the errors which willmake a result differ from the true value with reproducible discrepancy. The precision of anexperiment is dependent on how well one can overcome or analyse random errors. These arethe fluctuations in observations which yield results that differ from experiment to experimentand that require repeated trials to yield precise results. A given accuracy implies a precisionat least as good and, consequently, depends to some extent on the random errors also. So wediscriminate between:

• A systematic error : reproducible inaccuracy introduced by faulty equipment, calibra-tion, or technique.

95

• A random error : Indefiniteness of result due to the a priori finite precision of an exper-iment. Also: measure of the fluctuation in a result after repeated experimentation.

In Chapter 3, section 2, of the OAF1 lecture series we have introduced the subject of randomphenomena and variables that constitutes the mathematical basis for handling random errors.If an infinite number of measurements could be taken, the result would give the way in whichthe observed data points are distributed, i.e. the true probability distribution. This distri-bution is referred to as the parent distribution, any practical measurement can be perceivedas a sample of an infinite number of possible measurements. Consequently the experimentaldata only allow an estimate of the parameters that describe the parent distribution. For adescription of the most common probability distributions we refer here to OAF1-section 3.2.For convenience we only repeat here the definition of the sample parameters which can be de-rived from a limited number of available measurements to approximate the parameter valuesdescribing the underlying parent distribution.

For the average we have:

μ x ≡ 1N

N∑i=1

xi (5.1)

and for the variance:

σ2 s2 ≡ 1N − 1

N∑i=1

(xi − x)2 =1

N − 1

(N∑i=1

xi2 −Nx2

)(5.2)

In estimating the average with Eq. 5.1 we assume that all measurements xi are indepen-dent. In estimating the variance with Eq. 5.2 we note that we already have used the xi valuesto estimate the average, which implies that we have N − 1 independent measurements left.(If we know N − 1 values of xi and x, we can determine the one remaining xi.) This is thereason that the sum of (xi − x)2 is divided by N − 1 (instead of by N). Indeed, if we onlyhave one measurement, i.e. N = 1, then the best value for the average is given by this onemeasurement, but we have no measure for the variance.

Take x as a variable in Eq. 5.2; to find the value of x for which s2 is minimal, we set thederivative of s2 with respect to x to zero:

∂s2

∂x=

−2N − 1

N∑i=1

(xi − x) =−2

N − 1

(N∑i=1

xi −Nx

)= 0 (5.3)

From the last two terms we obtain once more Eq. 5.1. This shows that the definition Eq. 5.1for the sample average minimizes the sample variance given by Eq. 5.2.

When the measurements are binned, as is always the case if we have digital measurementsbut also often in other cases, we find the average and its variance from the binned data. Wewrite the number of bins as B, and denote the value of x in bin b as xb and the number of mea-surements in that bin as Nb, where b = 1, . . . , B. We normalize the number of measurementsin each bin on the total number of measurements, to find the probability that a measurementfrom the sample falls in bin b as Pb ≡ Nb/

∑Bb=1Nb, and rewrite Eqs. 5.1,5.2 as

x =B∑b=1

Pbxb (5.4)

96

and

s2 =N

N − 1

(B∑b=1

Pbxb2 − x2

)(5.5)

Higher order moments of distributions with points far from the average (’outliers’) arevery sensitive to outliers: a large xi−x value dominates much more in the distribution of the(xi − x)2’s than in the distribution of |xi − x|’s.

For an analogous reason, it is strongly recommended never to use even higher moments,such as the third order moment or skewness and the fourth order moment or kurtosis. Forcompleteness, here are the definitions of these moments:

Skewness ≡ 1Nσ3

∑(xi − x)3 (5.6)

andKurtosis ≡ 1

Nσ4

∑(xi − x)4 − 3 (5.7)

where σ is the standard deviation. The subtraction of 3 in the kurtosis is made so that thekurtosis of a Gauss distribution is zero.In this course we shall stick to the common practice of mainly using the variance.First of all, we shall show now how the various probability distributions can be computedmost effectively.

5.1.2 Computing the various distributions

In the bad old days, one was forced to use tables in thick, heavy books whenever one used thedistributions described above. Nowadays, fast computers allow us to compute the quantitieswe need, with relative ease. Chapter 6 of Numerical recipes (2nd edition) contains everythingwe need, and more. . .

We start with the factorials, necessary in calculations with binomial and Poisson function.We first define the gamma function

Γ(z) =∫ ∞

0tz−1e−tdt (5.8)

and then note that for integer z, the gamma function equals the factorial, with an offset ofone in the argument:

n! = Γ(n+ 1) (5.9)

For small values of n, the factorial can be computed directly as n× (n−1)× (n−2) . . .; forlarge values we can use the gamma-function. For large n there is the danger that the factorialis larger than the largest number allowed by the computer. In that case it is better to calculatethe logarithm of the factorial (or equivalently, the logarithm of the gamma function).

These considerations are taken into account in the following useful function routines fromNumerical Recipes:

• function gammln(x) returns ln Γ(x) for input x

• function factrl(n) returns real n! for input integer n

• function factln(n) returns real lnn! for input integer n

• function bico(n,k) returns real

(nk

)for integer inputs n, k

97

These routines enable us to compute binomial and Poisson probabilities.A useful quantity is the cumulative Poisson probability, the probability that a Poisson

process will lead to a result between 0 and k − 1 inclusive:

Px(< k) ≡k−1∑n=0

PP (k, x)

For its computation we use the incomplete gamma function:

P (a, x) ≡ 1Γ(a)

∫ x

0ta−1e−tdt (5.10)

Its complement is also called the incomplete gamma function (which is somewhat confus-ing. . . ):

Q(a, x) ≡ 1 − P (a, x) ≡ 1Γ(a)

∫ ∞

xta−1e−tdt (5.11)

It is easily shown that the cumulative Poisson probability is given by (the second type of) theincomplete gamma function:

Px(< k) = Q(k, x)

The corresponding routines in Numerical Recipes are:

• function gammp(a,x) returns P (a, x) for input a, x

• function gammq(a,x) returns Q(a, x) for input a, x

Finally, we look at the integral probability of the Gauss function (see Eq.??), which canbe computed from the error function:

erf(x) =2√π

∫ x

0e−t

2dt (5.12)

Equally useful is the complementary error function

erfc(x) =2√π

∫ ∞

xe−t

2dt (5.13)

The wonderfull fact is that we don’t have to write a new routine to compute the error functions,because they are given by the incomplete gamma functions:

erf(x) = P (1/2, x2) (x ≥ 0)

erfc(x) = Q(1/2, x2) (x ≥ 0)

The corresponding routines in Numerical Recipes are:

• function erf(x) returns erf(x) for input x, using gammp

• function erfc(x) returns erfc(x) for input x, using gammq

• function erfcc(x) which returns erfc(x) based on a direct series development ofEq. 5.13.

98

5.2 Error propagation

Suppose we have a function f of a number of variables u, v, . . .:

f ≡ f(u, v, . . .) (5.14)

We want to know how we can estimate the variance of f ,

σf2 ≡ lim

N→∞1N

N∑i=1

(fi − f)2 (5.15)

from the variances σu, σv, . . . of the variables u, v, . . .. To derive this we will make the assump-tion, which is usually only approximately correct, that the average of f is well approximatedby the value of f for the averages of the variables:

f = f(u, v, . . .) (5.16)

We make a Rayleigh-Taylor expansion of f around the average:

fi − f (ui − u)∂f

∂u+ (vi − v)

∂f

∂v+ . . . (5.17)

to write the variance in f as

σf2 lim

N→∞1N

N∑i=1

[(ui − u)

∂f

∂u+ (vi − v)

∂f

∂v+ . . .

]2

= limN→∞

1N

N∑i=1

[(ui − u)2

(∂f

∂u

)2

+ (vi − v)2(∂f

∂v

)2

+ 2(ui − u)(vi − v)∂f

∂u

∂f

∂v+ . . .

]

(5.18)We rewrite this by using the definitions for the variances of u and v,

σu2 ≡ lim

N→∞1N

N∑i=1

(ui − u)2; σv2 ≡ lim

N→∞1N

N∑i=1

(vi − v)2 (5.19)

and defining the covariance of u and v as

σuv2 ≡ lim

N→∞1N

N∑i=1

(ui − u)(vi − v) (5.20)

to obtain:

σf2 = σu

2(∂f

∂u

)2

+ σv2(∂f

∂v

)2

+ 2σuv2 ∂f

∂u

∂f

∂v+ . . . (5.21)

If the differences ui − u and vi − v are not correlated, the sign of their product is as oftenpositive as negative, and the sum in eq. 5.20 will be small as subsequent terms (almost) cancel.On the other hand, if the differences are correlated, most products (ui − u)(vi − v) will bepositive, and the cross-correlation term in Eq. 5.21 can be large as subsequent terms add up.

99

5.2.1 Examples of error propagation

weighted sum: f = au+ bv

For the partial derivatives we have

∂f

∂u= a,

∂f

∂v= b (5.22)

and substituting these in eq. 5.21 we obtain

σf2 = a2σu

2 + b2σv2 + 2abσuv2 (5.23)

Note that a and b can each have positive or negative sign. Their signs only have an effecton the cross-correlation term, which as a consequence can be negative, and make the variancesmaller! For example, if each ui is accompanied by a vi such that vi − v = −(b/a)(uu − u),then f = au+ bv for all ui, vi pairs, and σf 2 = 0.

product: f = auv

With∂f

∂u= av,

∂f

∂v= au (5.24)

we obtainσf

2

f2=σu

2

u2+σv

2

v2+

2σuv2

uv(5.25)

division: f = au/v

∂f

∂u=a

v,

∂f

∂v= −au

v2(5.26)

henceσf

2

f2=σu

2

u2+σv

2

v2− 2σuv2

uv(5.27)

exponent: f = aebu

∂f

∂u= bf (5.28)

henceσff

= bσu (5.29)

power: f = aub

∂f

∂u=bf

u(5.30)

henceσff

= bσuu

(5.31)

100

5.3 Errors distributed as a Gaussian and the least squares

method

Suppose we have a series of measurements yi with associated errors distributed according toa Gaussian with width σi, i.e. each measurement is drawn from a Gaussian with width σiaround a model value ym. For one measurement, the probability P (yi) ≡ Pi of obtaining ameasurement yi, in an interval Δy, is then given by

PiΔy =1√2πσi

e−(yi−ym)2

2σi2 Δy (5.32)

where we allow that different measurements have different associated errors σi. If we have aseries of N measurements the overall probability of obtaining such a series is:

P (Δy)N ≡N∏i=1

(PiΔy) =1

(2π)N/2∏i σi

exp

[−1

2

N∑i=1

(yi − ym)2

σi2

]ΔyN (5.33)

The highest probability P is that for which

χ2 ≡N∑i=1

χi2 ≡

N∑i=1

(yi − ym)2

σi2(5.34)

is smallest. If we wish to determine the most probable model value for ym for a series ofmeasurements yi, we thus must find the value(s) for ym for which the sum of the squares(yi − ym)2/σi2 is minimal. This is called the method of least squares and was described firstin 1804 by Gauss, following earlier work by Lagrange and Laplace.

According to the assumption that the errors are gaussian, as stipulated above, each χi isa random draw from a normal distribution; the sum of the χi2 is called chi-square. If we haveN measurements, and we fit M parameters, we have N −M independent measurements left.The probability distribution for χ2 is called the chi-square distribution for N −M degreesof freedom. This distribution is what we get if would draw N −M random samples from anormal distribution, square them, and add the squares, and repeat this many times, each timeproducing one χ2. The probability of a given χ2 can be computed with the gammq-function,as follows: the probability that an observed χ2

obs or greater is reached for N −M degrees offreedom is P (χ2

obs) = gammq(0.5(N −M), 0.5χ2obs). If the probability that we get is very small,

then something is wrong. The most obvious possibility is that the model is wrong. Anotherpossibility is that the errors have been under-estimated, so that we demand a better agreementwith the model than the measurement accuracy warrants. A third possibility is that the errorsare not distributed as Gaussians. This latter possibility tends to increase the observed χ2; andoften causes us to accept probabilities somewhat lower than one would superficially expect.Thus in many cases, a probability of 0.05 is considered quite acceptable. The reason onecan do this, is that really wrong models easily produce much smaller probabilities, less than0.000001, say. Also, even in a perfect statistical process, a probability of 5% arises on averageonce every 20 trials, so that finding such a probability due to chance is quite common. If onefinds in a certain experiment that one always produces low probabilities, one must look intoit and try to determine the cause.

It is worth noting that apparently significant results can arise from ignoring negativeresults. The most obvious case is the winning of a lottery. Suppose the winner is the personwho has guessed the correct six-digit number. The probability for one person to have the

101

winning number is one in a million, but if several million people participate we expect severalto have guessed the correct number. A less obvious case is an experiment which is repeatedmany times. A well-known case here is the stock-broker. Suppose that one starts with tenmillion people, who predict how stocks change in value. After a year, select the people whoare among the ten best predictors; repeat this five times (so that six rounds have been made),then there will be (on average) ten stock brokers who have predicted among the best 10%for six years in a row, if the prediction process is purely random! It is thus by no meansobvious that a stock-broker who has predicted correctly for many years in a row is betterthan random – one can only find out if one knows how many stock-brokers there are. . . .(Indeed, in experiments in which a gorilla selects the stocks, the gorilla often doesn’t do badat all.)

When we make a fit of a model to a dataset, the result should always consist of threeparts:

1. the best values of the parameters a1, a2, . . .

2. the errors on these parameter values

3. the probability that the measured χ2 is obtained by chance; i.e. the probability that themodel adequately describes the measurements

It is very important always to provide all these three desiderata.We will discuss three cases for ym, viz. i) a constant, in which ym is the same for all i; ii)

a straight line ym = a+ bx where ym depends on the variable x and the model parameters aand b; and iii) the general case in which ym depends on a variable x and a set of parametersa1, a2, . . ., which we write as ym = ym(x;�a).

5.3.1 Weighted averages

If the model for ym = a is the same for all measurements, then the constant a is the (bestestimate for the) average y of the yi’s. The least squares method tells us that we find themost probable value for a by minimizing eq. 5.34 with respect to a, i.e.

∂

∂a

[N∑i=1

(yi − a)2

σi2

]= 0 ⇒

N∑i=1

yi − a

σi2= 0 (5.35)

Thus the least squares are found for

y ≡ amin =

∑Ni=1

yi

σi2∑N

i=11σi

2

(5.36)

To obtain an estimate of the error in y, we note that y is a function of the variablesy1, y2, . . .. If the measurements yi are not correlated, we find the variance for y from (seeeq. 5.21):

σy2 =

N∑i=1

[σi

2(∂y

∂yi

)2]

=N∑i=1

⎡⎣σi2

(1/σi2∑N

k=1(1/σk2)

)2⎤⎦ (5.37)

which gives

σy2 =

1∑Ni=1(1/σi2)

(5.38)

102

In the case where all measurement errors are identical, σi ≡ σ, eqs. 5.36,5.38 can be furthersimplified to

y =(1/σ2)

∑Ni=1 yi

(1/σ2)∑Ni=1(1)

=1N

N∑i=1

yi (5.39)

and

σy2 =

σ2

N(5.40)

5.3.2 Fitting a straight line

We write the model of a straight line as

ym(x, a, b) = a+ bx⇒ ym(xi, a, b) = a+ bxi (5.41)

where the parameters to be fitted are a and b. We now wish to minimize the χ2, according toEq. 5.34, with respect to a and b, and do this by looking for the values for which the derivativeof χ2 is zero:

∂∑Ni=1[(yi − a− bxi)/σi]2

∂a= 0 ⇒

N∑i=1

(yi − a− bxi

σi2

)= 0 ⇒

N∑i=1

yiσi2

−aN∑i=1

1σi2

−bN∑i=1

xiσi2

= 0

(5.42)∂∑Ni=1[(yi − a− bxi)/σi]2

∂b= 0 ⇒

N∑i=1

xi(yi − a− bxi)σi2

= 0 ⇒N∑i=1

xiyiσi2

−aN∑i=1

xiσi2

−bN∑i=1

xi2

σi2= 0

(5.43)The important thing to note now is that all the sums can be evaluated without knowledge ofa or b. To make the equations easier to interpret we define the sums as

N∑i=1

1σi2

≡ S;N∑i=1

xiσi2

≡ Sx;N∑i=1

xi2

σi2≡ Sxx

N∑i=1

yiσi2

≡ Sy;N∑i=1

xiyiσi2

≡ Sxy; Δ ≡ SSxx − (Sx)2

and rewrite the above two equations as two equations for two unknowns a and b:

aS + bSx − Sy = 0

aSx + bSxx − Sxy = 0

with the solutionsa =

SxxSy − SxSxyΔ

(5.44)

b =SSxy − SxSy

Δ(5.45)

To find the errors in a and b, we note that a and b depend on the independent parameters yi,such that

∂a

∂yi=Sxx − xiSxσi2Δ

;∂b

∂yi=xiS − S

σi2Δ(5.46)

and use Eq. 5.21 to obtain (after some rewriting)

σa2 =

SxxΔ

; σb2 =

S

Δ(5.47)

103

Figure 5.1: Total number of 1st-year students in the Netherlands in physics and astronomy(above) and in astronomy only (below) between 1990 and 2000. Assumed poissonian errorsare indicated. Data taken from the Nederlands Tijdschrift voor Natuuurkunde, which stoppedpublishing these numbers in 2000. . .

As the third ingredient of our fit we determine the probability that a good fit wouldproduce the observed χ2

obs or bigger as

Q = gammq

(N − 2

2,χ2

obs

2

)(5.48)

As an illustration of some more subtleties we have a look at Figure 5.1, where the numberof new physics and astronomy students in the Netherlands are plotted for the last decadeof the 20st century. The first point concerns the errors in measured integer numbers. It isassumed in the figure that the error on the number of students is given by the square root ofthe number, and these errors are shown. The reader may be surprised now, and say: ’Whyis there an error? After all, we know exactly how many students there are in a given year!’.The answer is that for the purpose of fitting a model the numbers of students can vary dueto chance, and that the actual number in a year is drawn from a distribution (poissonian inthis case) around the expected value. Similarly, if an X-ray experiment detects N photons ina given time interval, in a given energy range, then the error that we assign to this numberwhen we fit it in a model is not zero – even though we know exactly how many photons weremeasured!

The second point concerns a good choice of a and b. If we fit the number of studentsas N(t) = a + bt where t is the year, then a gives the number of students for the year 0.

104

Apart from the fact that this number is meaningless in the current context, there are twoother problems associated with this choice. The first one is that the sums involving xi-valuesin Eqs. 5.44–5.47 are very large numbers, so that subtracting them from one another (as incomputing Δ) easily leads to roundoff errors, and thus to wrong results. The second oneis that in this choice of a and b there will be a very strong correlation between the errors:if we change b a little, a changes dramatically in one direction. Both these problems canbe circumvented by centering the time interval around the point of fitting, i.e. by effectivelyfitting N = a + b(t − 1994). Now, the sums involve smaller numbers, and the pivot pointaround which a and b vary is such that the correlation of their variations are minimized.

Also for non-linear fits, it is good practice in astronomy to define the time with respect tosome fiducial point somewhere near the middle of the measurements.

5.3.3 Non-linear models: Levenberg-Marquardt

When our model involves a parameter a which enters the ym, and through this the χi, non-linearly, we cannot execute a summation without having (an estimate for) the value of a. Asan example, consider the function y = sin(ax), which has (see Eq. 5.34)

∂χ2

∂a= −2

N∑i=1

[yi − sin(axi)]xi cos(axi)σi2

(5.49)

and we see that the summation cannot be executed without a value for a.In general, this means that a model y(x,�a) which is non-linear in �a can be fitted to a set

of data only iteratively, in the sense that one needs a first set of values for �a, and then findssuccessive improvements of these values.

To understand how this is done, we discuss first a one-dimensional case where there is oneparameter a that is fitted, so that χ2 = χ2(a). We are then looking for the value of a forwhich χ2(a) is minimized.

When one is far from the minimum, we can use the derivative ∂χ2/∂a to decide in whichdirection to look for the improved value:

an+1 = an −K∂χ2

∂a(5.50)

where K is a constant (the value of which we discuss below).Once we get close to the minimum of χ2, this way of stepping towards a better solution

becomes less efficient, since near the minimum the first derivative approaches zero. Close tothe minimum we approximate χ2 as a quadratic function of a: χ2(a) = p + q(a − amin)2,where p is the minimum value of χ2, reached at a = amin. We combine the derivatives∂χ2/∂a = 2q(a− amin) and ∂2χ2∂a2 = 2q, to write:

a− amin =∂χ2/∂a

∂2χ2/∂a2⇒ an+1 = an − ∂χ2/∂a

∂2χ2/∂a2(5.51)

Now make the step to a more-dimensional case, i.e. with more than one parameter, andwrite

χ2(�a) p− �q · �a+12�a · ��D · �a (5.52)

where for model ym = ym(x,�a) we have:

qk ≡ ∂χ2

∂ak= −2

N∑i=1

[yi − ym]σi2

∂ym∂ak

≡ −2βk (5.53)

105

and

Dkl ≡ ∂χ2

∂ak∂al= 2

N∑i=1

1σi2

[∂ym∂ak

∂ym∂al

− [yi − ym]∂2ym∂ak∂al

]≡ 2αkl (5.54)

The second term on the right hand side is expected to be small with respect to the firstone, because yi − ym will almost equally often be positive as negative, and subsequent termsin the summation will almost cancel. To save computing time, one can drop the second term.It often turns out that the iteration to the best solution becomes more stable by doing this.

For the more dimensional case we can write for Eqs. 5.50,5.51:

βk = λαkkδak (5.55)

βk =M∑l=1

αklδal (5.56)

Here we have chosen to scale the proportionality constant K in Eq. 5.50 with the secondderivative, where λ is a constant.

The Levenberg-Marquardt method now continues by adding the two last equations:

βk =M∑l=1

α′klδal where α′

kl = αkl (if k �= l); α′kl = αkl(1 + λ) (if k = l) (5.57)

For large λ Eq. 5.57 approaches Eq. 5.55, and for small λ it approaches Eq. 5.56.The solution of the equation then proceeds as follows. One picks an initial solution, and

a small value for λ (i.e. one hopes that the solution is already close enough for the quadraticapproximation). Compute the χ2 for the initial solution, and compute a new value for �a withEq. 5.57. If the χ2 for the new solution is smaller (larger) than for the old one, then thequadratic approach does (doesn’t) work, and one should decrease (increase) λ to get closerto the purely quadratic (linear) method of Eq. 5.56 (5.55). This process is iterated until theminimum χ2 is found.

We also need the errors on the best parameters �a. For this we use the general relation(near minimum)

δχ2 = δ�a · ��α · δ�a (5.58)

If the best values of the parameters are not correlated, then the matrix α is close to diagonal,i.e. the off-diagonal elements αkl are much smaller than the elements αkk on the diagonal. Inthat case the inverse matrix C is almost given by Ckk = 1/αkk, and the distance δak to thebest value ak has

δak2 =

Δχ2

αkk(5.59)

Thus to estimate a 1-sigma error in ak we enter Δχ2 = 1 in Eq. 5.59.When the errors are correlated, the situation is more complex, and the matrix C gives the

correlation between the deviations of the parameters from their best values. For details werefer to Numerical Recipes.

5.3.4 Optimal extraction of a spectrum

For this topic we refer to the article by K.D. Horne, 1985, An optimal extraction algorithmfor CCD spectroscopy, PASP 98, 609-617.

106

5.4 Errors distributed according to a Poisson distribution and

Maximum likelihood

In measurements in which a small number of events is counted, we are often dealing witha distribution of the measurements which is given by the Poisson distribution. In this case,the least-squares methods do not apply. We will discuss the maximum-likelihood method forerrors with a Poissonian distribution. In the literature, this method is often referred to asthe maximum-likelihood method, where the Poissonian error distribution is implied. We willfollow this practice, but only after noting that this abbreviated name is misleading, in thatthe least-squares method is also a maximum-likelihood method, viz. the maximum-likelihoodmethod for errors distributed as Gaussians!

To illustrate the maximum-likelihood method we consider the number of counts on aphoton-counting detector. We write the number of counts detected in location i as ni, andthe number of counts predicted for that location by the model as mi. The probability at onelocation to obtain ni photons when mi are predicted is

Pi =mi

nie−mi

ni!(5.60)

If the values of mi (and ni) are large, this probability may be approximated with a Gaus-sian, and one may use a least-squares method. Typically, a value of 20 is used. This valueis based on the assumption that the difference between the Poisson and Gaussian distribu-tions for large μ is less important than the uncertainties due to systematic effects in themeasurements. In principle, this assumption should be verified in each case.

For small values of mi (and ni) we proceed to use the Possion distribution. To maximizethe overall probability we thus have to maximize

L′ ≡∏i

Pi (5.61)

and for computational ease we maximize the logarithm of this quantity

lnL′ ≡∑i

lnPi =∑i

ni lnmi −∑i

mi −∑i

lnni! (5.62)

The last term in this equation doesn’t depend on the assumed model, and thus – in terms ofselecting the best model – may be considered as a constant. Thus maximizing L′ is equivalentto minimizing L, where

lnL ≡ −2

(∑i

ni lnmi −∑i

mi

)(5.63)

If one compares two models A and B, with number of fitted parameters nA and nB andwith likelihoods of lnLA and lnLB , respectively, the difference ΔL ≡ lnLA − lnLB is a χ2

distribution with nA − nB degrees of freedom, for a sufficient number of photons (Cash 1979ApJ 228, 939; Mattox et al. 1996 ApJ 461, 396).

5.4.1 A constant background

As a first example we model a constant background: mi = A. With Z pixels we thus have atotal number of photons in the model Nm = ZA; write the total observed number of photonsas No Inserting this in eq. 5.63 we have

−0.5 lnL =∑

ni lnA−∑

A = No lnA− ZA = No ln(Nm/Z) −Nm (5.64)

107

and we have to minimize this with respect to Nm. Thus

∂ lnL∂Nm

= 0 ⇒ No

Nm− 1 = 0 ⇒ No = Nm (5.65)

and we see that the best solution, as expected, is the one in which the number of photons inthe model equals that in the observation.

5.4.2 Constant background plus one source

Let us now assume that we have a source with strength B, and that a fraction fi of this landson pixel i. Eq. 5.63 now becomes

−0.5 lnL =∑i

ni ln(A+Bfi) −∑i

(A+Bfi) (5.66)

We search the minimum of L for variations in A and B:

∂ lnL∂A

= 0 ⇒∑i

niA+Bfi

−∑i

(1) =∑i

niA+Bfi

− Z = 0 (5.67)

∂ lnL∂B

= 0 ⇒∑i

nifiA+Bfi

−∑i

fi =∑i

nifiA+Bfi

− 1 = 0 (5.68)

We thus have two equations for the two unknowns A and B. Multiply the first equation withA, the second with B, and add the two equations to find∑

i

ni = AZ +B (5.69)

i.e. the total number of counts in the best model is equal to the total number of observedcounts. This condition may be used so that one only has to fit one, rather than two parameters.

5.5 General Methods

To produce the best fit for the case of Poissonian errors, several methods are available, ofwhich we discuss a few. The methods are distinct from the methods discussed above for theleast-squares cases, in that they do not use the derivative of the function, but only the functionvalue itself, together with the criterion of best fit. These methods are equally suited for least-squares problems, with Eq. 5.34 as the criterion, and for maximum-likelihood problems withPoisson statistics, where Eq. 5.63 must be minimized. For simple problems, these methodstend to be slower in converging on a good solution than the Levenberg-Marquardt method.For problems with many variables, where the matrix α (Eq. 5.54) becomes very big, and forproblems in which the χ2 distribution has many local minima, the methods discussed beloware actually more efficient than the Levenberg-Marquardt method. An example is the fit tothe lightcurves (in U ,B,V ) and radial velocities of a binary: this problem has many variables,and can have many local minima.

5.5.1 Amoebe

See Numerical Recipes (2nd ed.) chapter 10.4

108

5.5.2 Genetic algorithms

For an explanation of the principle of genetic algorithms we refer to P. Charbonneau, 1995,Genetic algorithms in astronomy and astrophysics, ApJ Supl.Ser. 101, 309-334 (in particularthe first 9 pages).

An interesting extension is the use of black sheep, i.e. bad descendents from good parents,in the genetic algorithm scheme. This is discussed by A. Bobinger, 2000, Genetic eclipsemapping and the advantage of Black Sheep, A&A 357, 1170-1180

5.6 Exercises

Problem 1. Calculate the average and the variance of the Gauss function.Problem 2. Show that erf(x) = P (1/2, x2).Problem 3. x = a ln(±bu). Give the relation between σx and σu.Computer Problem 1. Write a fortran program which computes

a. the binomial probability for given k, n, p.b. the Poisson probability for given k, μ.c. the Gauss function for given x, μ, σ.To check your program, verify the following statements:a. if one throws 10 dice, the probability that precisely 7 three’s are up is 2.48 × 10−4.b. if the expected number of arriving counts is 15, the probability of measuring 30 counts is2.21 × 10−4, the probability of measuring 0 counts is 3.06 × 10−7.c. if one estimates the probabilities calculated in b) by approximating the Poisson functionwith a Gaussian, one finds 5.70 × 10−5 for both.

Computer Problem 2. Compute erf(x) for x = 1, 3, 5, 10.Computer Problem 3. Reproduce (the dots in) Figure??.Reading task 1. Read chapters 6.1 and 6.2 of Numerical Recipes with care, and have a

look at the fortran functions described in there.Problem 4. Radiopulsar PSRJ 1713 + 0747 has an observed period P = 4.5701 × 10−3 s

and period derivative P = 8.52×10−21, both with (for the purpose of this problem) negligibleerrors. Two components of its proper motion μ have been measured as μα = 4.9± 0.3mas/yrand μδ = −4.1 ± 1.0mas/yr; its distance is estimated from the dispersion measure as d =1.1 ± 0.3 kpc.

As noted by Shklovskii, the ratio P /P must be corrected for the apparent accelerationdue to the proper motion, so that the intrinsic ratio is:(

P

P

)i

=P

P+μ2d

c(5.70)

The (intrinsic) spindown age of a radio pulsar is

τc,i =(P

2P

)i

(5.71)

Calculate the intrinsic spindown age and its error.Computer Problem 4. Via computer you obtain a list of dates and logarithms of bolo-

metric fluxes for supernova SN1987A. The errors in logLb due to uncertainty in photometryare 0.003 until day 455, 0.005 until day 526, 0.02 until day 635, and 0.03 thereafter. Writea computer code which reads the file, and fits a straight line to the data between a first anda last day. Calculate d logLbol/dt and convert the result to a half-life time, for all data after

109

Figure 5.2: Lightcurve and fit of the bolometric magnitude of SN 1987A.

day 150; and for data between day 150 and 300. Compute the differences between the fit andthe measurements, and plot these for both fits as a funcion of time. What do you learn fromthese plots?

Computer Problem 5. Via computer you obtain a list with the ratios of 87Rb(t)/86Srand 87Sr(t)/86Sr, and associated errors, for 26 meteorites. Here t is the time since solidifi-cation of the parent asteroid from which the meteorites originate. The relation between thetwo ratios is given by

87Sr(t)86Sr

=87Sr(0)

86Sr+(eλt − 1

) 87Rb(t)86Sr

(5.72)

From the obervations we can determine t by inference the age of the solar system. There aresome subtleties, however:a. Which of the two quantities has the smallest errors? Rewrite Eq. 5.72 in a form suitablefor fitting a straight line.b. Fit a straight line to all points, and make a plot of the fit, and of the differences betweenfit and observations, in terms of χ ≡ (yi − ym)/σi Where i refers to the observation number,and m to the model fit.c. Why are some points far from the fit? Remove the worst points one by one, until you getan acceptable probability.

110

Computer Problem 6. Via computer you obtain a list with fluxes as a function ofwavelength for the Mg II 2800 line in a cataclysmic variable.a. Read the file, make a plot of flux as a function of wavelength, and fit a straight line to allthe data.b. Now write a subroutine for use in MRQMIN which computes the function values and theirderivatives, for a straight line plus a Gaussian line.c. Estimate initial values for the best parameters, and apply the routine from b) to make afit of the data.d. Compare the χ2 from a) and c) to decide whether the addition of the line improves the fitsignificantly.

111

Chapter 6

Looking for variability andperiodicity

This chapter provides a brief first guide on how one decides whether a time series of Nmeasurements yi with errors σi at times ti shows significant variability, and how one searchesfor periodicity in the variation. This chapter is mainly descriptive, and many results are givenwithout proof (but with appropriate references).

A first indication may be found by testing the hypothesis that the source is constant. Thebest guess for this constant is (in the case of Gaussian errors) given by Eq. 5.36. Computethe chi-squared for this average with Eq. 5.34, and the probability that this chi-squared isobtained by chance, from P (χ2

obs = gammq((N − 1)/2, χ2obs/2). If this probability is high, it

depends on the quality and length of the time series whether one stops here. To understandthis, we do a thought experiment with a series of N measurements yi, all drawn from aGaussian distribution around a constant value, all with the same error σ. Thus, the observedchi-squared is due to chance. Now, order the measurements, so that yN ≥ yN−1 ≥ . . . y2 ≥ y1.The time series thus obtained has the same chi-squared, but clearly cannot be obtained bychance: however, its significant increase of yi with time is not uncovered by the chi-squaredtest. As a second example, take the same series yi for the case that the measurements aremade at equidistant time intervals: ti = i × Δt. Now order the yi so that the higher valuesare assigned to ti with even i and the lower values to ti with odd i. Again, the significantperiodicity present in the re-ordered data is not uncovered by the chi-squared test. If thetime series is long enough, we can uncover the significant variability from other tests. In ourthought experiments, on could for example try a Kolmogorov-Smirnov test for the first case(see Sect. 6.3), and a Fourier transform for the second case (see Sect. 6.4).

In general, the level of sophistication that is appropriate increases when the measurementseries is larger and of higher quality.

6.1 Fitting sine-functions: Lomb-Scargle

As a first example, we look at the data obtained by the Hipparcos satellite on the pulsationalvariable W UMa (Fig. 6.1). It is clear from the range of variation as compared to the size ofthe error bars that this source is significantly variable. Due to the observing method of thesatellite, the data are taken at irregular intervals. For such data, one method to search forperiodicity is to fit a (co)sine curve:

Vh = a cos(ωt− φo) = A cosωt+B sinωt (6.1)

112

Figure 6.1: Magnitudes of eclipsing binaries WUMa (above; 117 points) and TVCas (below,133 points) as measured with Hipparcos, in order of observation date (left) and after foldingon the orbital period (right; the zero-phase and orbital period are indicated in days).

where A and B are related to a and φo as

a2 = A2 +B2; tan φo =B

A(6.2)

One can look for the best values for a, φo and ω ≡ 2π/P by minimizing the sum of thechi-squares. In principle, one could use one of the methods we discussed in the previouschapter, but a more specialized method, developed for this specific problem, is more efficient.First suggested by Lomb (1967), it was successively improved by Scargle (1982), Horne &Baliunas (1986), and Press & Rybicki (1989), and is described in Numerical Recipes, Ch. 13.8,which also gives the references to the original articles, and a subroutine which implements themethod.

113

In the case of W UMa, the folded lightcurve is roughly sinusoidal, and the Lomb-Scarglemethod (as it is commonly called) may be expected to work well.

In some cases, the lightcurve is (close to) a sine function with two maxima and two minimaat each orbital period. In such a case, it is sometimes difficult to decide whether the real periodis the one found, or twice as long. For example, contact binaries (also called W UMa variables)often have two almost equal minima per orbital period (Fig. 6.1). Depending on the qualityof the data one may or may not find a significant difference between the two minima. Insuch a case, additional knowledge (‘W UMa systems have two minima and two maxima ineach orbit’) must decide whether the period coming out of the Lomb-Scargle test is the actualperiod, or half of it.

6.2 Period-folding: Stellingwerf

In the case of TVCas, the folded light curve is very different from a sine. One thereforeexpects that the Lomb-Scargle method may not be optimally efficient in finding the period.Stellingwerf (1978, ApJ 224, 953) has developed a method which works for lightcurves ofarbitrary forms. The basic idea is as follows. Fold the data on a trial period to produce afolded lightcurve. Divide the folded lightcurve, in M bins. If the period is (almost) correct,the variance sj

2 inside each bin j ∈ 1,M is small; if the period is wrong, the variance ineach bin is almost the same as the total variance. The best period has the lowest value for∑Mj=1 sj

2.Stellingwerf discusses how many periods one should take, in which interval, and how many

bins, etc., as derived from the data set itself. In other cases, one may have other informationwhich may help in selecting a period range, for example when one knows that the star is aδCepheid variable with a known approximate absolute magnitude.

In general, both with the Lomb-Scargle and with the Stellingwerf method, one must ask thequestion what the probability is that the result obtained is due to chance. Analytic derivationof this probability is fraught with difficulties, and often the safest estimate is obtained bysimulations. For example, if one has N measurements yi at ti, one can scramble the dataand apply the Lomb-Scargle or Stellingwerf method. The scrambled data should not have aperiodicity. Therefore, by perfoming many scrambles, one can find out what the distributionof significances is that arises due to chance, and thereby what the probability is that theperiod obtained from the actual data is due to chance. This probability is often referred toas the false-alarm probability.

6.3 Variability through Kolmogorov-Smirnov tests

Data may be variable without strict periodicity. Consider a detector which is open for Tseconds and in this period detects N photons. If the interval is divided in M bins of equallength T/(M − 1), then for a constant source, each bin should contain n = N/(M − 1)photons, and one can do chi-squared or maximum-likelihood test that the observed numbersnj, (j ∈ 1,M) differ significantly from this. The problem with this approach is that thenumber of bins that one loses information by binning, and that the result depends on thenumber of bins chosen.

The Kolmogorov-Smirnov test (shorthand: KS-test) test avoids these problems. The KS-test in general computes the probability that two distributions are the same; or phraseddifferently, the probability that two distributions have been drawn from the same parentdistribution. The one-sided KS-test compares a theoretical distribution (which has no errors

114

Figure 6.2: Illustration of the Kolmogorov-Smirnov test for variability of an assumed con-stant source. The largest difference d betweenthe theoretical straight line and the actually ob-served histogram is a measure of the probabilitythat the variability is due to chance.

associated to it) with the observed distribution. The two-sided KS-test compared two observeddistributions, each of which has errors.

In a search for variability, the assumption for a constant source is that the expectednumber of photons detected (or the integrated flux received) increases linearly with time.Normalize the total number N of detected photons to 1. The theoretical expectation is thatthe normalized number of photons N(< t) that arrive before time t increases linearly witht from 0 at t = 0 to 1 at t = T . The observed distribution is a histogram which starts at0 for t = 0, and increases with 1/N at each time ti, i ∈ 1, N that a photon arrives. This isillustrated in Figure 6.2. Now determine the largest difference d between the theoretical curveand the observed curve. The KS-test gives the probability that a difference d or larger arisesin a sample of N photons due to chance. It takes into account that for large N one expectsany d arising due to chance to be smaller than in a small sample.

The Kolmogorov-Smirnov test is described in Numerical Recipes, Chapter 14.3, and sub-routines are given that implement it.

6.4 Fourier transforms

The Fourier transform uses the fact that a periodic signal builds up with time to discover aperiodic signal in a long time series, even if the signal is small with respect to the noise level.It works best in data taken in an un-interrupted series of equidistant intervals, and this isthe case we will discuss here. Our discussion is based on a tutorial by Michiel van der Klis.Interruptions in the time series, i.e. data gaps, lead to spurious periodicities, and techniqueshave been developed to remove these successively, for example ‘cleaning’. This is not dicussedhere.

Mathematically, we can discriminate between the continuous and the discrete Fouriertransforms. In observations, only the discrete transform occurs. To understand some of theproperties of the discrete transform it is useful to compare with the continuous transform.With i ≡ √−1, the continuous transform a(ν) of signal x(t) is given by

a(ν) =∫ ∞

−∞x(t)ei2πνtdt for −∞ < ν <∞ (6.3)

115

and the reverse transform

x(t) =∫ ∞

−∞a(ν)e−i2πνtdν for −∞ < t <∞ (6.4)

From this it can be derived that ∫ ∞

−∞x(t)2dt =

∫ ∞

−∞a(ν)2dν (6.5)

The above equations are occasionally written with the cyclic frequency ω ≡ 2πν.If we write ei2πνt as cos(2πνt) + i sin(2πνt) in Eq. 6.3, we see that the Fourier transform

gives the correlation between the time series x(t) and a sine (or cosine, see Eq. 6.1) function,in terms of amplitude and phase at each frequency ν.

6.4.1 The discrete Fourier transform

Now consider a series of measurements x(tk) ≡ xk taken at times tk where tk ≡ kT/N with Tthe total time during which the N measurements are made. The time step for the xk is δt =T/N . The discrete Fourier transform is defined atN frequencies νj, for j = −N/2, . . . , N/2−1,with a frequency step for the aj of δν = 1/T .

The discrete equivalents of the above equations 6.3, 6.4 and 6.5 are

aj =N−1∑k=0

xkei2πjk/N j = −N

2,−N

2+ 1, . . .

N

2− 2,

N

2− 1 (6.6)

xk =1N

N/2−1∑j=−N/2

aje−i2πjk/N k = 0, 1, 2, . . . , N − 1 (6.7)

andN−1∑k=0

|xk|2 =1N

N/2−1∑j=−N/2

|aj |2 (6.8)

The last equation is called Parseval’s theorem. These equations are occasionally also writtenin terms of cyclic frequencies ωj ≡ 2πνj .

The 1/N -term is located here on the right-hand side of Eq. 6.7. This is a matter ofconvention, and other conventions can be used, and in fact are used in the literature, e.g.putting the 1/N -term on the right hand side of Eq. 6.6, or putting a 1/

√N -term in both

Eqs. 6.6 and 6.7. When reading an article, first find out which convention is used!In the general definition of the Fourier transforms, both x and a are complex numbers. In

the application we discuss here, the xj are the numbers of photons detected at times tk, andare therefore real. This implies that the amplitude at frequency −j is the complex conjugateof the amplitude at frequency j, i.e. a−j = aj

∗, ensuring that the xk found from the complexaj through Eq. 6.7 are real again. The highest frequency that occurs is νN/2 = 0.5N/T ; it iscalled the Nyquist frequency. . Since a−N/2 = aN/2:

a−N/2 =N−1∑k=0

xke−iπk =

N−1∑k=0

xk(−1)k = aN/2 (6.9)

116

we may list the amplitude at the Nyquist frequency either at the positive or negative end ofthe series of aj. Again, it is a matter of convention. The amplitude at zero frequency is thetotal number of photons:

ao =N−1∑k=0

xk ≡ Ntot (6.10)

Parseval’s theorem Eq. 6.8 allows us to express the variance of the signal in terms of theFourier amplitudes aj :

N−1∑k=0

(xk − x)2 =N−1∑k=0

xk2 − 1

N

(N−1∑k=0

xk

)2

=1N

N/2−1∑j=−N/2

|aj |2 − 1Nao

2 (6.11)

The discrete Fourier transform converts N measurements xk into N/2 complex Fourieramplitudes aj = a−j∗. Each Fourier amplitude has two numbers associated with it, anamplitude and a phase, aj = |aj |eiφj . If the N measurements are uncorrelated, the N numbers(amplitudes and phases) associated with the N/2 Fourier amplitudes are uncorrelated as well.Mathematically, this follows from the following Equations (the derivation of which can bechecked with Problem 6.1):

N−1∑k=0

sinωjk = 0,N−1∑k=0

cosωjk = 0 (j �= 0) (6.12)

N−1∑k=0

cosωjk cosωmk =

⎧⎪⎨⎪⎩N/2, j = m �= 0 or N/2N, j = m = 0 orN/20, j �= m

(6.13)

N−1∑k=0

cosωjk sinωmk = 0 (6.14)

N−1∑k=0

sinωjk sinωmk =

{N/2, j = m �= 0 or N/20, otherwise

(6.15)

In searching for a periodic variation, the phase of the variation often is less importantthan the period. For this reason, a period search is often based on the power of the Fourieramplitudes, defined as a series of N/2 numbers, each number being given by

Pj ≡ 2ao

|aj |2 =2Ntot

|aj |2 j = 0, 1, 2, . . . ,N

2(6.16)

where we have used Eq. 6.10. The series Pj is called the power spectrum. By using the powerspectrum, we do not use all available information, in particular we do not use the informationon phases. The normalization of the power spectrum is by convention, and different authorsmay use different conventions. In deciding whether a periodicity that one find from the powerspectrum is significant, one should know which convention is used. The convention of Eq. 6.16is called the Leahy normalization.

Whereas the Fourier amplitudes aj follow the super-position theorem, the Fourier powerspectrum Pj does not. Specifically, if aj is the Fourier amplitude of xk and bj the Fourieramplitude of yk, then the Fourier amplitude cj of zk = xk + yk is given by cj = aj + bj; butthe power spectrum of zk is |cj |2 = |aj + bj|2 �= |aj |2 + |bj |2, the difference being due to the

117

correlation term ajbj. Only if the two signals xk and yk are not correlated, then the power ofthe combined signal may be approximated with the sum of the powers of the separate signals.

The variance given by Eq. 6.11 may be expressed in terms of the powers as

N−1∑k=0

(xk − x)2 =Ntot

N

⎛⎝N/2−1∑

j=1

Pj +12PN/2

⎞⎠ (6.17)

In characterizing the variation of a signal one also uses the fractional root-mean-squarevariation,

r ≡√

1N

∑k(xk − x)2

x=

√√√√∑N/2−1j=1 Pj + 0.5PN/2

Ntot(6.18)

6.4.2 From continuous to discrete

Consider a series of measurements xk taken in the time interval between t = 0 and t = T , atequidistant times tk. Mathematically, this series can be described in terms of a continuoustime series x(t), multiplied first with a window function

w(t) =

{1, 0 ≤ t < T0, otherwise

(6.19)

and then with a sampling function (’Dirac comb’)

s(t) =∞∑

k=−∞δ(t − kT

N) (6.20)

as illustrated in Figure 6.3.Let a(ν) be the continuous Fourier transform of x(t), and let W (ν) and S(ν) be the Fourier

transforms of w(t) and s(t), respectively:

|W (ν)|2 ≡∣∣∣∣∫ ∞

−∞w(t)e−i2πνtdt

∣∣∣∣2 =∣∣∣∣sin(πνT )

πν

∣∣∣∣2

= |T sinc(πνT )|2 (6.21)

i.e. the Fourier transform of a window function is (the absolute value of) a sinc-function, and

S(ν) =∫ ∞

−∞s(t)e−i2πνtdt =

N

T

∞∑m=−∞

δ

(ν −m

N

T

)(6.22)

i.e. the Fourier Transform of the Dirac comb is also a Dirac comb. Al these transforms aresymmetric around ν = 0 by definition.

Now we note that ‘The Fourier Transform of a product is the convolution of the FourierTransforms’ to understand how the discrete transform derives from the continuous one. Re-member, the convolution of a(ν) and b(ν) is

a(ν) ∗ b(ν) ≡∫ ∞

−∞a(ν ′)b(ν − ν ′)dν ′ (6.23)

First consider the product x(t)w(t). The effect of the window function w(t) on the Fouriertransform a(ν) of x(t) is to convolve each component in it with a sinc-function. For example,a δ function is turned into a sinc-function (see Fig. 6.3, where the power Pj is plotted). Thewidening dν is inversely proportional to the length of the time series: dν = 1/T .

118

Figure 6.3: A limited series of measurements equidistant in time can be considered as themultiplication of the underlying time series x(t) with a window function w(t) and a Diraccomb s(t) (left). The effect on the Fourier transform is first a widening of each componentby (the absolute value of) a sinc-function, and then an infinite repeating of the result. Thedashed line indicates the Nyquist frequency.

Next consider the product [x(t)w(t)]s(t). The multiplication of the signal by a Dirac combcorresponds to a convolution of its transform with a Dirac comb, i.e. by an infinite repeat ofthe convolution. This again is illustrated in Fig. 6.3.

Mathematically, the conversion from the continuous a(ν) to the discontinuous ad(ν) is asfollows:

ad(ν) ≡ a(ν) ∗W (ν) ∗ S(ν) =∫∞−∞ x(t)w(t)s(t)dt

=∫∞−∞ x(t)

∑N−1k=0 δ

(t− kT

N

)ei2πνtdt =

∑N−1k=0 x

(kTN

)ei2πνkT/N (6.24)

Thus, the finite length of the time series (windowing) causes a broadening of the Fouriertransform with a widht dν = 1/T with sidelobes, and the discreteness of the sampling causesthe aliasing, a reflection of periods beyond the Nyquist frequency into the range 0, νN/2.

In the above we have assumed that each sample is obtained instantaneously. In practice,the smaple is often an integration over a finite exposure time. This can be described math-ematically by convolving the time series x(t) with a window function with the width of the

119

exposure time:

b(t) =

{N/T, − T

2N < t < T2N

0, otherwise(6.25)

Since ‘The Fourier transfrom of a convolution is the product of the Fourier transfroms’, thisimplies that the Fourier transform ad(ν) is multiplied with the Fourier transform of b(t):

B(ν) =sinπνT/NπνT/N

(6.26)

At frequency zero, B(0) = 1, at the Nyquist frequency B(νN/2 = T/(2N)) = 2/π, andat double the Nyquist frequency B(ν = N/T ) = 0. Therefore, the frequencies beyond theNyquist frequency that are aliased into the window (0, νN/2) have a reduced amplitude. Wecan understand this: the integration of the exposure time corresponds to an averaging over atime interval T/N , and this reduces the variations at frequencies near N/T .

6.4.3 The statistics of power spectra

Now that we have the power spectrum, we want to know how to extract a signal from it. Weassume in out discussion that the time series x(t) consists of noise and a signal, and that noiseand signal are not correlated. We can then write:

Pj = Pj,noise + Pj,signal (6.27)

For many types of noise, the power Pj,noise approximately follows the chi-squared distributionwith 2 degrees of freedom. The normalization of the powers given by Eq. 6.16 ensures thepower of Poissonian noise is exactly distributed as the chi-squares with two degrees of freedom.Thus, the probability of finding a power Pj,noise larger than an observed value Pj is given by

Q(Pj) = gammq(0.5 ∗ 2, 0.5Pj) (6.28)

(see the discussion following Eq. 5.34).The standard deviation of the noise powers is equal to their mean value: σP = Pj = 2. In

other words, fairly high values of Pj are possible du to chance.In order to reduce the noise of the power spectrum, one may choose to average it. The

are two ways to do this. The first is to bin the power spectrum, i.e. take the average of everyW consecutive powers. The second is to divide the time series into M subseries, determinethe power spectrum for each subseries, and then average the power spectra. In both cases,the price that one pays is a loss of frequency resolution. The gain is that the power spectrumis now much less noisy. The chi-squared distribution of a power spectrum of a time serieswhich has been divided into M interval, and in which W successive powers in each spectrumare averaged, is given by the chi-squared distribution with 2MW degrees of freedom, scaledby 1/(MW ). The average of the distribution is 2, its variance 4/(MW ) and its standarddeviation 2/

√MW . In other words, the probability that a binned, averaged power is higher

than the observed power Pj,b is given by

Q(Pj,b = gammq(0.5[2MW ], 0.5[MWPj,b]) (6.29)

For sufficiently large MW this approaches the Gauss function.

120

6.4.4 Detecting and quantifying a signal

From Eq. 6.28 and 6.29 we can decide whether at a given frequency the observed signal exceedsthe noise level significantly, for any significance level we want to set. Thus, if we wish 90%significance, we first compute the Pj for which Q = 0.1, i.e. the probability which is exceededby chance in only 10% of the cases. Then we check whether the observed power is bigger thanthis Pj .

The above reasoning is true is we decide on the frequency before we do the statistics, i.e.if we first select one single frequency νj . Such is the case if we wish to know whether a sourceis significantly variable on a known period, for example the orbital period of a binary, or thepulse period for a pulsar.

In general, we are searching for a period, i.e. we do not know which frequency is important,and thus we apply Eq. 6.28 and 6.29 many times, viz. once for each frequency. This correspondsto many trials, and thus our probability level has te be set accordingly. To see how, firstconsider one frequency. Suppose that the a value Pdetect has a probability 1− ε′ not to be dueto chance. Now try Ntrials frequencies. The probability that the value Pdetect is not due tochance at any of these frequencies is given by (1−ε′)Ntrials , which for small ε′ equals 1−Ntrialsε

′.Thus, the probability that the value Pdetect is due to chance at any of these frequencies isgiven by ε = Ntrialsε

′. In conclusion, if we wish to set an overall chance of ε, we must take thechance per trial as ε′ = ε/Ntrials, i.e.

ε′ =ε

Ntrials= gammq(0.5[2MW ], 0.5[MWPdetect ]) (6.30)

Suppose we have an observed power Pj,b which is higher than the detection power Pdetect

for given chance ε′. Because the observed power is the sum of the noise power and the signalpower (Eq. 6.27), this implies

Pj,signal > Pj,b − Pj,noise (1 − ε′) confidence (6.31)

Suppose on the other hand that we find no observed power at any frequency to exceed thedetection level. We may then be interested in setting an upper limit to the signal power. Todo this, we proceed as follows. First we determine the level Pexceed which is exceeded by thenoise alone with a high probability (1 − δ), from

1 − δ = gammq(0.5[2MW ], 0.5[MWPexceed ]) (6.32)

If the highest actually observed power in the range of interest is Pmax, then the upper limitPUL to the power is

PUL = Pmax − Pexceed (6.33)

in the sense that, if there were a signal power higher than PUL, the highest observed powerwould be higher than Pmax with a (1 − δ) probability.

6.5 Exercises

Problem 4. Derive Eq. 6.12 to 6.15 we use the following intermediate steps.a. Show that

N−1∑k=0

eiλk =eiλN − 1eiλ − 1

= eiλ(N−1)/2 eiλN/2 − e−iλN/2

eiλ/2 − eiλ/2(6.34)

121

b. Now use Euler’s equation eiλ = sinλ+ i cos λ and its inverse relations

cosλ =12

(eiλ + e−iλ

), sinλ =

12i

(eiλ + e−iλ

)(6.35)

to show

N−1∑k=0

cos(λk) = cos[λ(N − 1)/2]sin(λN/2)sin(λ/2)

,N−1∑k=0

sin(λk) = sin[λ(N − 1)/2]sin(λN/2)sin(λ/2)

(6.36)

c. From Euler’s equations we also have

sin(λ+ μ) = sinλ cosμ+ cos λ sinμ cos(λ+ μ) = cos λ cosμ− sinλ sinμ (6.37)

and the inverse relations

cos λ cosμ =12

[cos(λ+ μ) + cos(λ− μ)] (6.38)

cos λ sinμ =12

[sin(λ+ μ) − sin(λ− μ)] (6.39)

sinλ sinμ =12

[cos(λ− μ) − cos(λ+ μ)] (6.40)

Problem 5. Consider a symmetric window function w(t) which has values w(t) = 1 for−T/2 < t < T/2 and w(t) = 0 otherwise, and show that its Fouriertransform is a sincfunction.Computer problem 6. Via computer you obtain a list of 65536 points (16 per line) withsimulated X-ray data from a low-mass X-ray binary. Each number gives the number of photonsdetected during one millisecond. There are no gaps in the data.a. write a little computer program which reads the first 8 numbers from this list, and calculatesthe Fourier transform.b. now use the routine realft(data, n, isign) from Numerical Recipes to compute the Fouriertransform of the same 8 points, and show that you obtain the same result. As input, theroutine realft requires an array data with the data points, the integer n which gives thenumber of data points, and the integer isign which tells it whether to perform a forward or abackward Fourier transformation. As output it gives data(1) as the total number of photons,which equals a0, data(2) is an/2, and for the remaining components of the transform we havedata(2j + 1) = Re{aj} and data(2j + 2) = Im{ai}. Show that you obtain the same result aswith your own code of a.c. Now compute the Fourier transform of the whole data set, and compute and plot the powerspectrum.d. We want to see whether the highest peak observed in the power spectrum is significant.To do this, determine the detection level of 90% and 99% probability. Take account of thenumber of trials!e. Reduce the noise in the power spectrum by averaging in bins of 16 powers. Do you notenew features?

122

Observational Astrophysics II - SRONkuiper/hea_inst11/handout_4nov2011/OAF2… · 1.1 Stochastic...

Documents

Transcript of Observational Astrophysics II - SRONkuiper/hea_inst11/handout_4nov2011/OAF2… · 1.1 Stochastic...