
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics

Spring 2015 http://www.astro.cornell.edu/~cordes/A6523

Lecture 15

–  Red noise, leakage, missing information
–  Ad hoc approaches
–  New paradigms:
   •  Entropy and information
   •  Autoregressive processes and maximum entropy spectral estimation

Red Noise: Challenges for Spectral Estimation

arXiv:0910.2706v1 [astro-ph.HE] 14 Oct 2009

Mon. Not. R. Astron. Soc. 000, 1–15 (2009) Printed 14 October 2009 (MN LATEX style file v2.2)

A Bayesian test for periodic signals in red noise

S. Vaughan

X-Ray and Observational Astronomy Group, University of Leicester, Leicester, LE1 7RH, U.K.

Accepted 2009 October 14. Received 2009 October 14; in original form 2009 September 11

ABSTRACT

Many astrophysical sources, especially compact accreting sources, show strong, random brightness fluctuations with broad power spectra in addition to periodic or quasi-periodic oscillations (QPOs) that have narrower spectra. The random nature of the dominant source of variance greatly complicates the process of searching for possible weak periodic signals. We have addressed this problem using the tools of Bayesian statistics; in particular using Markov chain Monte Carlo techniques to approximate the posterior distribution of model parameters, and posterior predictive model checking to assess model fits and search for periodogram outliers that may represent periodic signals. The methods developed are applied to two example datasets, both long XMM-Newton observations of highly variable Seyfert 1 galaxies: RE J1034+396 and Mrk 766. In both cases a bend (or break) in the power spectrum is evident. In the case of RE J1034+396 the previously reported QPO is found but with somewhat weaker statistical significance than reported in previous analyses. The difference is due partly to the improved continuum modelling, better treatment of nuisance parameters, and partly to different data selection methods.

Key words: Methods: statistical – Methods: data analysis – X-rays: general – Galaxies: Seyfert

1 INTRODUCTION

A perennial problem in observational astrophysics is detecting periodic or almost-periodic signals in noisy time series. The standard analysis tool is the periodogram (see e.g. Jenkins & Watts 1969; Priestley 1981; Press et al. 1992; Bloomfield 2000; Chatfield 2003), and the problem of period detection amounts to assessing whether or not some particular peak in the periodogram is due to a periodic component or a random fluctuation in the noise spectrum (see Fisher 1929; Priestley 1981; Leahy et al. 1983; van der Klis 1989; Percival & Walden 1993; Bloomfield 2000).

If the time series is the sum of a random (stochastic) component and a periodic one we may write $y(t) = y_R(t) + y_P(t)$ and, due to the independence of $y_R(t)$ and $y_P(t)$, the power spectrum of $y(t)$ is the sum of the two power spectra of the random and stochastic processes: $S_Y(f) = S_R(f) + S_P(f)$. This is a mixed spectrum (Percival & Walden 1993, section 4.4) formed from the sum of $S_P(f)$, which comprises only narrow features, and $S_R(f)$, which is a continuous, broad spectral function. Likewise, we may consider an evenly sampled, finite time series $y(t_i)$ ($i = 1, 2, \ldots, N$) as the sum of two finite time series: one is a realisation of the periodic process, the other a random realisation of the stochastic process. We may compute the periodogram (which is an estimator of the true power spectrum) from the squared modulus of the Discrete Fourier Transform (DFT) of the time series, and, as with the power spectra, the periodograms of the two processes add linearly:


$I(f_j) = I_R(f_j) + I_P(f_j)$. The periodogram of the periodic time series will contain only narrow "lines" with all the power concentrated in only a few frequencies, whereas the periodogram of the stochastic time series will show power spread over many frequencies. Unfortunately the periodogram of stochastic processes fluctuates wildly around the true power spectrum, making it difficult to distinguish random fluctuations in the noise spectrum from truly spectral periodic components. See van der Klis (1989) for a thorough review of these issues in the context of X-ray astronomy.

Particular attention has been given to the special case that the spectrum of the stochastic process is flat (a white noise spectrum $S(f) = \mathrm{const}$), which is the case when the time series data $y_R(t_i)$ are independently and identically distributed (IID) random variables. Reasonably well-established statistical procedures have been developed to help identify spurious spectral peaks and reduce the chance of false detections (e.g. Fisher 1929; Priestley 1981; Leahy et al. 1983; van der Klis 1989; Percival & Walden 1993). In contrast there is no comparably well-established procedure in the general case that the spectrum of the stochastic process is not flat.

In a previous paper, Vaughan (2005) (henceforth V05), we proposed what is essentially a generalisation of Fisher's method to the case where the noise spectrum is a power law: $S_R(f) = \beta f^{-\alpha}$ (where $\alpha$ and $\beta$ are the power law index and normalisation parameters). Processes with power spectra that show a power law dependence on frequency with $\alpha > 0$ (i.e. increasing power to lower frequencies) are called red noise and are extremely common in astronomy and elsewhere (see Press 1978). In this paper we expand upon the ideas in V05 and, in particular, address the problem from a

Presents a nice summary of Bayesian methodology + comparison with periodograms. Applies to time series with red noise + periodicity but does not deal with spectral leakage for steep spectra.

arXiv:1209.3154v2 [astro-ph.EP] 22 Nov 2012

Mon. Not. R. Astron. Soc. 000, 1–17 (2012) Printed 26 November 2012 (MN LATEX style file v2.2)

The impact of red noise in radial velocity planet searches: Only three planets orbiting GJ581?

Roman V. Baluev

Central (Pulkovo) Astronomical Observatory of Russian Academy of Sciences, Pulkovskoje sh. 65/1, St Petersburg 196140, Russia

Sobolev Astronomical Institute, St Petersburg State University, Universitetskij prospekt 28, Petrodvorets, St Petersburg 198504, Russia

Received 2012 November 21; in original form 2012 September 14

ABSTRACT

We perform a detailed analysis of the latest HARPS and Keck radial velocity data for the planet-hosting red dwarf GJ581, which attracted a lot of attention in recent time. We show that these data contain important correlated noise component ("red noise") with the correlation timescale of the order of 10 days. This red noise imposes a lot of misleading effects while we work in the traditional white-noise model. To eliminate these misleading effects, we propose a maximum-likelihood algorithm equipped by an extended model of the noise structure. We treat the red noise as a Gaussian random process with exponentially decaying correlation function.

Using this method we prove that: (i) planets b and c do exist in this system, since they can be independently detected in the HARPS and Keck data, and regardless of the assumed noise models; (ii) planet e can also be confirmed independently by the both datasets, although to reveal it in the Keck data it is mandatory to take the red noise into account; (iii) the recently announced putative planets f and g are likely just illusions of the red noise; (iv) the reality of the planet candidate GJ581 d is questionable, because it cannot be detected from the Keck data, and its statistical significance in the HARPS data (as well as in the combined dataset) drops to a marginal level of $\sim 2\sigma$, when the red noise is taken into account.

Therefore, the current data for GJ581 really support existence of no more than four (or maybe even only three) orbiting exoplanets. The planet candidate GJ581 d requests serious observational verification.

Key words: planetary systems - stars: individual: GJ581 - techniques: radial velocities - methods: data analysis - methods: statistical - surveys

1 INTRODUCTION

The multi-planet extrasolar system hosted by the red dwarf GJ581 has attracted a lot of interest in the past few years. The concise history of planet detections for this system is as follows. The first planet b, having orbital period of 5.37 d and minimum mass of $\sim 15\,M_\oplus$, was reported by Bonfils et al. (2005). The two subsequent super-Earths c (with the orbital period of 12.9 d and the minimum mass of $\sim 5\,M_\oplus$) and d (with the originally reported orbital period of 82 d later corrected to 67 d and current minimum mass estimate of $6\,M_\oplus$) were discovered by Udry et al. (2007). Further, Mayor et al. (2009) reported the detection of the smallest exoplanet known so far, GJ581 e, orbiting the host star each 3.15 d and having the minimum mass of only approximately $2\,M_\oplus$. All these discoveries were done on the basis of the


radial velocity data obtained with the famous HARPS spectrograph.

Later, the Keck planet-search team got involved. Vogt et al. (2010) performed an analysis of the combined HARPS and Keck measurements and claimed the detection of two more planets in the system, f and g, orbiting the host star each 433 d and 36.6 d (and having minimum masses of $\sim 7$ and $\sim 3\,M_\oplus$). The last planet g is remarkable because it appears to reside in the middle of the predicted habitable zone for this star. However, the reality of these two planets represents a subject of serious debates in the recent time. Gregory (2011) remained uncertain about the existence of these planets, based on his very detailed Bayesian analysis of the joint (Mayor et al. 2009) and (Vogt et al. 2010) datasets. Tadeu dos Santos et al. (2012) basically agreed with this conclusion, finding from the same combined dataset that the detection confidence probabilities for these two planets are 96% for planet g and 98% for planet f. These values are too high to be just neglected, but they are simultaneously too

Detection of periodic signals in radial velocity time series (exoplanet signatures). His use of 'red noise' is a bit of a misnomer: it is correlated noise, but not noise with a power-law spectrum ~ $f^{-n}$.

Simulating power-law noise

•  Applications: many phenomena in nature are processes with spectra that are power-law in form (temporally or spatially):

$$S(f) \propto f^{-\alpha}, \qquad f_0 \le f \le f_1$$

•  Can use a linear filter with impulse response $h(t)$:

$$x(t) \longrightarrow h(t) \longrightarrow y(t)$$

$$y(t) = x(t) * h(t)$$

$$Y(f) = X(f)\, H(f)$$

•  Let $x(t)$ = realization of white noise and $H(f)$ = sqrt(shape) of spectrum that is wanted:

$$\langle |Y(f)|^2 \rangle = \langle |X(f)|^2 \rangle\, |H(f)|^2$$

Procedure

•  Generate spectrum in frequency domain
•  Generate white noise realizations for real and imaginary parts of $X(f)$
   –  random number generator: gaussian, uniform
•  Multiply by $f^{-\alpha/2}$ for $f_0 \le f \le f_1$
•  Fill vector to force Hermiticity
•  Do inverse DFT to get real time-domain realization of the noise
•  Can be applied to any shape of spectrum of course
•  What will the statistics be in the time domain?
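The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the lecture figures: the function name `power_law_noise`, the choice of Gaussian random numbers, and the decision to zero the mean (k = 0) are all assumptions made here. `np.fft.irfft` takes only the non-negative frequencies and supplies the Hermitian-symmetric half itself, which covers the "fill vector to force Hermiticity" step.

```python
import numpy as np

def power_law_noise(n, alpha, rng=None):
    """Return a real series of length n whose spectrum falls as f**(-alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(n // 2 + 1, dtype=float)     # non-negative frequency indices
    # White-noise spectrum: independent Gaussian real and imaginary parts.
    spec = rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size)
    # Shape the amplitude by f**(-alpha/2) so that the power goes as f**(-alpha).
    shape = np.zeros(k.size)
    shape[1:] = k[1:] ** (-alpha / 2.0)        # k = 0 stays zero (zero-mean series)
    spec *= shape
    if n % 2 == 0:
        spec[-1] = spec[-1].real               # the Nyquist bin must be real
    # irfft assumes Hermitian symmetry for the negative frequencies,
    # so the inverse transform is a real time series.
    return np.fft.irfft(spec, n=n)

y = power_law_noise(1024, alpha=3.0, rng=np.random.default_rng(0))
```

A series built this way is exactly periodic over its n samples, so its own periodogram shows no leakage; leakage appears once a shorter stretch of it is analysed. With Gaussian inputs every time-domain sample is a sum of Gaussians and is therefore Gaussian.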

$f^{-3}$ spectrum

[Figure: simulated time series with spectral index = 3.0 (Data vs. sample number n, 0 to 1000) and their spectra (|FFT|^2 vs. frequency index k, log-log axes). Panels: uniform sampling; gaps; gaps + zerofill; before end matching; with slope lines (Slope = -3 and Slope = -2).]

Underlying Issue

The DFT implicitly treats the time series as periodic → discontinuities where the copies join

[Figure: the time series (Data vs. n) and its spectrum (|FFT|^2 vs. k) repeated end to end (… …) to show the periodic extension.]

Discontinuities
↓
Gibbs phenomenon
↓
Leakage
↓
Bias in the spectral estimate

Ad hoc fix = “End Matching”

[Figure: periodically extended time series (spectral index = 3.0, with slope lines) and its spectrum (|FFT|^2 vs. k).]

Subtract the line that connects the end points. This makes the periodically extended time series continuous.

[Figure: spectral index = 3.0, Data vs. n and |FFT|^2 vs. k. Panels: end-match line; after end matching.]
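End matching is simple enough to try numerically. The sketch below is an illustration under stated assumptions, not the lecture's own code: the helper names (`end_match`, `mean_periodogram`, `log_slope`), the test data (one long $f^{-3}$ realization cut into 16 segments of 1024 samples) and the fitting range in k are all choices made here. Periodograms are averaged over the segments only to steady the slope estimate.

```python
import numpy as np

def end_match(y):
    """Subtract the straight line that joins the first and last samples."""
    n = np.arange(y.size)
    return y - (y[0] + (y[-1] - y[0]) * n / (y.size - 1))

def mean_periodogram(segments):
    """Average |FFT|^2 over the rows of a 2-D array of segments."""
    return np.mean(np.abs(np.fft.rfft(segments, axis=1)) ** 2, axis=0)

def log_slope(power, kmin=30, kmax=200):
    """Least-squares slope of log(power) against log(k) for kmin <= k < kmax."""
    k = np.arange(kmin, kmax)
    return np.polyfit(np.log(k), np.log(power[kmin:kmax]), 1)[0]

# Assumed test data: a long f^-3 realization cut into segments, so that the
# segment end points do not join up when the DFT extends them periodically.
rng = np.random.default_rng(1)
n_long, n_seg = 16384, 1024
k = np.arange(1, n_long // 2 + 1, dtype=float)
spec = np.zeros(n_long // 2 + 1, dtype=complex)
spec[1:] = (rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size)) * k ** -1.5
segments = np.fft.irfft(spec, n=n_long).reshape(-1, n_seg)

slope_raw = log_slope(mean_periodogram(segments))
matched = np.array([end_match(s) for s in segments])
slope_matched = log_slope(mean_periodogram(matched))
print(f"raw slope {slope_raw:.2f}, end-matched slope {slope_matched:.2f}")
```

If the argument on these slides holds, the raw slope should sit near -2 (leakage from the end-point jump) and the end-matched slope near the true -3.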

$f^{-5}$ spectrum

[Figure: simulated time series with an $f^{-5}$ spectrum (Data vs. n, 0 to 1000) and their spectra (|FFT|^2 vs. k, log-log axes), before and after end matching, with slope lines Slope = -5 and Slope = -2.]

→ End matching not working so well

$f^{-6}$ spectrum

[Figure: simulated time series with an $f^{-6}$ spectrum (Data vs. n, 0 to 1000) and their spectra (|FFT|^2 vs. k, log-log axes), before and after end matching, with slope lines Slope = -2 and Slope = -6.]

End matching is failing!

Why is end-matching failing?

•  Clue: end-matching is progressively worse for steeper spectra


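•  Answer: Though end matching makes the periodically extended time series continuous, it does not prevent discontinuities in the derivatives of the time series. A jump in the series itself leaks power as $f^{-2}$, a jump in the first derivative as $f^{-4}$, and so on, so end matching alone cannot recover spectra steeper than about $f^{-4}$. As the spectrum gets steeper, the higher-order derivatives become more important.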

Another ad hoc fix

•  Tapering of the time series
•  E.g. multiplying by a Gaussian function centered on the middle of the time series will make the periodically extended time series and its derivatives continuous (to the extent that the Gaussian has fallen off at the edges of [0, T])

(Sketch on board)
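As a rough numerical counterpart to the board sketch, the snippet below applies a Gaussian taper to a linear ramp, which is about the worst case for end-point mismatch. The function name `gaussian_taper` and the width (sigma equal to one eighth of the series length, so the edges sit near 4 sigma) are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_taper(n, width_fraction=0.125):
    """Gaussian window centred on the middle of an n-sample series."""
    t = np.arange(n) - (n - 1) / 2.0         # time measured from the midpoint
    sigma = width_fraction * n               # assumed width; a tunable choice
    return np.exp(-0.5 * (t / sigma) ** 2)

n = 1024
ramp = np.arange(n) / n                      # end points differ by almost 1
power_raw = np.abs(np.fft.rfft(ramp)) ** 2
power_tapered = np.abs(np.fft.rfft(ramp * gaussian_taper(n))) ** 2

# Ratio of leaked power at k = 100 with and without the taper.
leak_ratio = power_tapered[100] / power_raw[100]
print(f"leakage at k=100 reduced by a factor {1.0 / leak_ratio:.3g}")
```

The price of the taper is that the ends of the data are heavily down-weighted, so the effective data span is shorter and the frequency resolution is coarser than 1/T.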

Cyclical Data Other workarounds

•  Convolution of a short filter function with a long time series (too long to use a single FFT)

•  Cross correlation of two long time series

Cyclical vs Non-cyclical Convolution & Correlation

From Bergland, "A Guided Tour of the FFT", IEEE Spectrum, 1969, 6 (Bell Labs)

Noncyclical Procedures
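A minimal sketch of the distinction, assuming NumPy: multiplying DFTs gives a cyclical convolution in which the tail wraps around onto the start, while zero padding both inputs to at least the full output length gives the noncyclical result. For a series too long for a single FFT, one common sectioning scheme is overlap-add, shown here as an assumed example rather than as the specific procedure in Bergland's figure. All function names are hypothetical.

```python
import numpy as np

def cyclic_convolve_fft(x, h):
    """Cyclical convolution: h is padded only to len(x), so the tail wraps."""
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, x.size), x.size)

def linear_convolve_fft(x, h):
    """Noncyclical convolution of x with h using zero-padded FFTs."""
    n_out = x.size + h.size - 1              # length of the full linear result
    n_fft = 1 << (n_out - 1).bit_length()    # next power of two >= n_out
    spec = np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft)
    return np.fft.irfft(spec, n_fft)[:n_out]

def overlap_add(x, h, block=256):
    """Convolve a long series with a short filter one block at a time."""
    out = np.zeros(x.size + h.size - 1)
    for start in range(0, x.size, block):
        piece = linear_convolve_fft(x[start:start + block], h)
        out[start:start + piece.size] += piece   # overlapping tails add up
    return out

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)                # the "long" series
h = np.array([1.0, 2.0, 3.0])                # the short filter
y_linear = overlap_add(x, h)
y_cyclic = cyclic_convolve_fft(x, h)
```

Cross correlation of two long series can be handled the same way by conjugating one of the transforms, with the same zero padding to avoid wrap-around.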

Aspects of Spectral Analysis

The outcome of any attempt to estimate the power spectrum should be assessed through consideration of the following:

(1) confidence intervals
(2) resolution
(3) handling of gaps (missing data)
(4) bias (particularly for processes with steep power laws)

Numbers (2) and (3) are related because resolution is determined by the length of the data set T and one can view data after the interval [0, T] as missing data.


Linear/fixed window/Finite F.T. methods vs. Data Adaptive Methods

Spectral estimators based on the Fourier transform have the properties:

no smoothing

1. frequency resolution: $\delta f = T^{-1}$, $T$ = data span
2. estimation error: $\varepsilon = [\mathrm{Var}\, S]^{1/2} / \langle S(\omega) \rangle = 1$

with smoothing

3. tradeoff between $\varepsilon$ and $\delta f$: spectral window $W$ ⇒ $\varepsilon$ decreases, $\delta f$ increases, sidelobes altered; $\varepsilon = (2/N_{\rm d.o.f.})^{1/2}$
4. Since $R_w(\tau) \Leftrightarrow S_w(\omega)$, the estimator is linear in the correlation function estimates, $R_w$.
5. Also, once chosen by the analyst, the window function is fixed with respect to the data.

Therefore, even though the spectral estimation may be iterative whereby the analyst chooses by trial an optimum window, this optimization is performed according to intuitive/aesthetic/prior knowledge prejudice. The form of the window is not an integral part of the estimation process. Hence, the characterization of finite F.T. estimators as linear, fixed window, empirical estimators.
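The tradeoff in items 2 and 3 can be checked with a small experiment. The sketch below is an illustration under assumptions made here: it uses unit-variance white noise, takes segment averaging as the smoothing method (one simple choice of spectral window), and uses the scatter across frequency bins as a stand-in for the ensemble variance, which is reasonable only because the true spectrum is flat. The function name `fractional_error` is hypothetical.

```python
import numpy as np

def fractional_error(n_total, n_segments, rng):
    """Std/mean of a segment-averaged periodogram of unit-variance white noise."""
    x = rng.standard_normal(n_total).reshape(n_segments, -1)
    power = np.abs(np.fft.rfft(x, axis=1)) ** 2
    averaged = power.mean(axis=0)[1:-1]      # drop DC and Nyquist (1 d.o.f. each)
    return averaged.std() / averaged.mean()

rng = np.random.default_rng(2)
eps_raw = fractional_error(16384, 1, rng)    # no smoothing: N_dof = 2
eps_avg = fractional_error(16384, 16, rng)   # 16 segments: N_dof = 32
print(f"eps (no smoothing) = {eps_raw:.2f}, eps (16 segments) = {eps_avg:.2f}")
```

With $N_{\rm d.o.f.} = 2$ the formula gives $\varepsilon = 1$, and with 16 averaged segments it gives $(2/32)^{1/2} = 0.25$; the cost is that each segment spans only T/16, so $\delta f$ is 16 times coarser.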



Different Views on Missing Data in a Time Series

1. Transform (in the general sense) data using fixed window analyses; then correct the data for the effects of gaps in data, etc. (e.g. can apply a CLEAN algorithm to the spectrum).

vs.

2. Transform the data without making assumptions about missing data (maximum entropy techniques are "maximally noncommittal" about missing data).

'Empirical methods such as Blackman-Tukey, which do not invoke even a likelihood function, are useful in the preliminary, exploratory phase of a problem where our knowledge is sufficient to permit intuitive judgments about how to organize a calculation (smoothing, decimation, windows, prewhitening, padding with zeroes, etc.) but insufficient to set up a quantitative model which would do the proper things for us automatically and optimally.' [In abstract of "On the Rationale of Maximum Entropy Methods", E.T. Jaynes, 1982, IEEE Proc., 70, 939.]

