Use of ambiguous detections to improve ... - Olivier Gimenez
Signal Modeling, Statistical Inference and Data Mining in...
Transcript of Signal Modeling, Statistical Inference and Data Mining in...
A6523 Signal Modeling, Statistical Inference and
Data Mining in Astrophysics
Spring 2015 http://www.astro.cornell.edu/~cordes/A6523
Lecture 15 – Red noise, leakage, missing information – Ad hoc approaches – New paradigms:
• Entropy and information • Autoregressive processes and maximum entropy
spectral estimation
Red Noise: Challenges for Spectral Estimation
arX
iv:0
910.
2706
v1 [
astro
-ph.
HE]
14
Oct
200
9Mon. Not. R. Astron. Soc. 000, 1–15 (2009) Printed 14 October 2009 (MN LATEX style file v2.2)
A Bayesian test for periodic signals in red noise
S. Vaughan!
X-Ray and Observational Astronomy Group, University of Leicester, Leicester, LE1 7RH, U.K.
Accepted 2009 October 14. Received 2009 October 14; in original form 2009 September 11
ABSTRACTMany astrophysical sources, especially compact accreting sources, show strong, randombrightness fluctuations with broad power spectra in addition to periodic or quasi-periodicoscillations (QPOs) that have narrower spectra. The random nature of the dominant source ofvariance greatly complicates the process of searching for possible weak periodic signals. Wehave addressed this problem using the tools of Bayesian statistics; in particular using Markovchain Monte Carlo techniques to approximate the posterior distribution of model parame-ters, and posterior predictive model checking to assess model fits and search for periodogramoutliers that may represent periodic signals. The methods developed are applied to two ex-ample datasets, both long XMM-Newton observations of highly variable Seyfert 1 galaxies:RE J1034 + 396 and Mrk 766. In both cases a bend (or break) in the power spectrum is evi-dent. In the case of RE J1034 + 396 the previously reported QPO is found but with somewhatweaker statistical significance than reported in previous analyses. The difference is due partlyto the improved continuum modelling, better treatment of nuisance parameters, and partly todifferent data selection methods.Key words: Methods: statistical – Methods: data analysis – X-rays: general – Galaxies:Seyfert
1 INTRODUCTION
A perennial problem in observational astrophysics is detecting pe-riodic or almost-periodic signals in noisy time series. The stan-dard analysis tool is the periodogram (see e.g. Jenkins & Watts1969; Priestley 1981; Press et al. 1992; Bloomfield 2000; Chatfield2003), and the problem of period detection amounts to assessingwhether or not some particular peak in the periodogram is due to aperiodic component or a random fluctuation in the noise spectrum(see Fisher 1929; Priestley 1981; Leahy et al. 1983; van der Klis1989; Percival & Walden 1993; Bloomfield 2000).
If the time series is the sum of a random (stochastic) compo-nent and a periodic one we may write y(t) = yR(t) + yP (t) and,due to the independence of yR(t) and yP (t), the power spectrum ofy(t) is the sum of the two power spectra of the random and stochas-tic processes: SY (f) = SR(f) +SP (f). This is a mixed spectrum(Percival & Walden 1993, section 4.4) formed from the sum ofSP (f), which comprises only narrow features, and SR(f), whichis a continuous, broad spectral function. Likewise, we may con-sider an evenly sampled, finite time series y(ti) (i = 1, 2, . . . , N )as the sum of two finite time series: one is a realisation of the pe-riodic process, the other a random realisation of the stochastic pro-cess. We may compute the periodogram (which is an estimator ofthe true power spectrum) from the squared modulus of the Dis-crete Fourier Transform (DFT) of the time series, and, as with thepower spectra, the periodograms of the two processes add linearly:
! E-mail: [email protected]
I(fj) = IR(fj) + IP (fj). The periodogram of the periodic timeseries will contain only narrow “lines” with all the power concen-trated in only a few frequencies, whereas the periodogram of thestochastic time series will show power spread over many frequen-cies. Unfortunately the periodogram of stochastic processes fluc-tuates wildly around the true power spectrum, making it difficultto distinguish random fluctuations in the noise spectrum from trulyspectral periodic components. See van der Klis (1989) for a thor-ough review of these issues in the context of X-ray astronomy.
Particular attention has been given to the special case thatthe spectrum of the stochastic process is flat (a white noise spec-trum S(f) = const), which is the case when the time series datayR(ti) are independently and identically distributed (IID) randomvariables. Reasonably well-established statistical procedures havebeen developed to help identify spurious spectral peaks and reducethe chance of false detections (e.g. Fisher 1929; Priestley 1981;Leahy et al. 1983; van der Klis 1989; Percival & Walden 1993). Incontrast there is no comparably well-established procedure in thegeneral case that the spectrum of the stochastic process is not flat.
In a previous paper, Vaughan (2005) (henceforth V05), weproposed what is essentially a generalisation of Fisher’s method tothe case where the noise spectrum is a power law: SR(f) = !f!"
(where " and ! are the power law index and normalisation param-eters). Processes with power spectra that show a power law depen-dence on frequency with " > 0 (i.e. increasing power to lowerfrequencies) are called red noise and are extremely common in as-tronomy and elsewhere (see Press 1978). In this paper we expandupon the ideas in V05 and, in particular, address the problem from a
Presents a nice summary of Bayesian methodology + comparison with periodograms. Applies to :me series with red noise + periodicity but does not deal with spectral leakage for steep spectra.
arX
iv:1
209.
3154
v2 [
astro
-ph.
EP]
22 N
ov 2
012
Mon. Not. R. Astron. Soc. 000, 1–17 (2012) Printed 26 November 2012 (MN LATEX style file v2.2)
The impact of red noise in radial velocity planet searches:Only three planets orbiting GJ581?
Roman V. Baluev!
Central (Pulkovo) Astronomical Observatory of Russian Academy of Sciences, Pulkovskoje sh. 65/1, St Petersburg 196140, RussiaSobolev Astronomical Institute, St Petersburg State University, Universitetskij prospekt 28, Petrodvorets, St Petersburg 198504, Russia
Received 2012 November 21; in original form 2012 September 14
ABSTRACT
We perform a detailed analysis of the latest HARPS and Keck radial velocity data forthe planet-hosting red dwarf GJ581, which attracted a lot of attention in recent time.We show that these data contain important correlated noise component (“red noise”)with the correlation timescale of the order of 10 days. This red noise imposes a lotof misleading e!ects while we work in the traditional white-noise model. To eliminatethese misleading e!ects, we propose a maximum-likelihood algorithm equipped by anextended model of the noise structure. We treat the red noise as a Gaussian randomprocess with exponentially decaying correlation function.
Using this method we prove that: (i) planets b and c do exist in this system, sincethey can be independently detected in the HARPS and Keck data, and regardless ofthe assumed noise models; (ii) planet e can also be confirmed independently by theboth datasets, although to reveal it in the Keck data it is mandatory to take the rednoise into account; (iii) the recently announced putative planets f and g are likely justillusions of the red noise; (iv) the reality of the planet candidate GJ581 d is question-able, because it cannot be detected from the Keck data, and its statistical significancein the HARPS data (as well as in the combined dataset) drops to a marginal level of! 2", when the red noise is taken into account.
Therefore, the current data for GJ581 really support existence of no more thanfour (or maybe even only three) orbiting exoplanets. The planet candidate GJ581 drequests serious observational verification.
Key words: planetary systems - stars: individual: GJ581 - techniques: radial veloc-ities - methods: data analysis - methods: statistical - surveys
1 INTRODUCTION
The multi-planet extrasolar system hosted by the red dwarfGJ581 has attracted a lot of interest in the past few years.The concise history of planet detections for this system is asfollows. The first planet b, having orbital period of 5.37 d andminimum mass of ! 15M!, was reported by Bonfils et al.(2005). The two subsequent super-Earths c (with the or-bital period of 12.9 d and the minimum mass of ! 5M!)and d (with the originally reported orbital period of 82 dlater corrected to 67 d and current minimum mass estimateof 6M!) were discovered by Udry et al. (2007). Further,Mayor et al. (2009) reported the detection of the smallestexoplanet known so far, GJ581 e, orbiting the host star each3.15 d and having the minimum mass of only appoximately2M!. All these discoveries were done on the basis of the
! E-mail: [email protected]
radial velocity data obtained with the famous HARPS spec-trograph.
Later, the Keck planet-search team got involved.Vogt et al. (2010) performed an analysis of the combinedHARPS and Keck measurements and claimed the detectionof two more planets in the system, f and g, orbiting the hoststar each 433 d and 36.6 d (and having minimum masses of! 7 and ! 3M!). The last planet g is remarkable becauseit appears to reside in the middle of the predicted habitablezone for this star. However, the reality of these two planetsrepresents a subject of serious debates in the recent time.Gregory (2011) remained uncertain about the existence ofthese planets, based on his very detailed Bayesian analysis ofthe joint (Mayor et al. 2009) and (Vogt et al. 2010) datasets.Tadeu dos Santos et al. (2012) basically agreed with thisconclusion, finding from the same combined dataset that thedetection confidence probabilties for these two planets are96% for planet g and 98% for planet f. These values are toohigh to be just neglected, but they are simultaneously too
Detec:on of periodic signals in radial velocity :me series (explanet signatures) His use of ‘red noise’ is a bit of a misnomer: it is correlated noise but not noise with a power-‐law spectrum ~ f-‐n
Simulating power-law noise • Applications: many phenomena in nature are
processes with spectra that are power-law in form (temporally or spatially)
• Can use a linear filter with impulse response
h(t):
• Let x(t) = realization of white noise and H(f) = sqrt(shape) of spectrum that is wanted
S(f) ∝ f−α f0 ≤ f ≤ f1
x(t) −→ h(t) −→ y(t)
y(t) = x(t) ∗ h(t)
Y (f) = X(f)× H(f)
�|Y (f)|2� = �|X(f)|2�|H(f)|2
Procedure • Generate spectrum in frequency domain • Generate white noise realizations for real and
imaginary parts of H(f) – random number generator: gaussian, uniform
• Multiply by f-α/2 for f0 ≤ f ≤ f1
• Fill vector to force Hermiticity • Do inverse DFT to get real time-domain
realization of the noise • Can be applied to any shape of spectrum of
course • What will the statistics be in the time domain?
f-3 spectrum
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 uniform sampling
100 101 102k
10−3
10−2
10−1
100101102
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 gaps
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 gaps + zerofill
100 101 102k
10−410−310−210−1100101102
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 before end matching
100 101 102k
10−3
10−2
10−1
100101102
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 with slope lines
100 101 102k
10−710−610−510−410−310−210−1100101102
Spe
ctru
m=|FFT|2
Slope = -‐3
Slope = -‐2
Underlying Issue The time series is periodic à discontinuities
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 with slope lines
100 101 102k
10−710−610−510−410−310−210−1100101102
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 with slope lines
100 101 102k
10−710−610−510−410−310−210−1100101102
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 with slope lines
100 101 102k
10−710−610−510−410−310−210−1100101102
Spe
ctru
m=|FFT|2
… …
Discon:nui:es ê
Gibbs phenomenon ê
Leakage ê
Bias in the spectral es:mate
Ad hoc fix = “End Matching”
0
200400
600800
1000
n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0with slope lines
10 0
10 1
10 2k
10−710−610−510−410−310−210−110 010 110 2
Spec
trum
=|FFT|2
… … 0
200400
600800
1000
n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0with slope lines
10 0
10 1
10 2k
10−710−610−510−410−310−210−110 010 110 2
Spec
trum
=|FFT|2
0
200400
600800
1000
n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0with slope lines
10 0
10 1
10 2k
10−710−610−510−410−310−210−110 010 110 2
Spec
trum
=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 with slope lines
100 101 102k
10−710−610−510−410−310−210−1100101102
Spe
ctru
m=|FFT|2
Subtract line that connects the end points. This makes the periodically extended 4me series con4nuous
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 end-match line
100 101 102k
10−810−710−610−510−410−310−210−1100101102
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
0.00.20.40.60.81.01.21.4
Dat
a
Spectral index = 3.0 after end matching
100 101 102k
10−810−710−610−510−410−310−210−1100
Spe
ctru
m=|FFT|2
f-5 spectrum
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 uniform sampling
100 101 102k
10−3
10−2
10−1
100
101
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 gaps
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 gaps + zerofill
100 101 102k
10−710−610−510−410−310−210−1100101
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 before end matching
100 101 102k
10−3
10−2
10−1
100
101
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 with slope lines
100 101 102k
10−1310−1110−910−710−510−310−1101
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 end-match line
100 101 102k
10−3
10−2
10−1
100
101
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.3−0.2−0.10.00.10.20.3
Dat
a
Spectral index = 5.0 after end matching
100 101 102k
10−1010−910−810−710−610−510−410−310−210−1100
Spe
ctru
m=|FFT|2
Slope = -‐5 Slope = -‐2
à End matching not working so well
f-6 spectrum
0 200 400 600 800 1000n
−0.02
0.00
0.02
0.04
0.06
Dat
a
Spectral index = 6.0 uniform sampling
100 101 102k
10−6
10−5
10−4
10−3
10−2
10−1
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.02
0.00
0.02
0.04
0.06
Dat
a
Spectral index = 6.0 gaps
0 200 400 600 800 1000n
−0.02
0.00
0.02
0.04
0.06
Dat
a
Spectral index = 6.0 gaps + zerofill
100 101 102k
10−810−710−610−510−410−310−210−1
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.02
0.00
0.02
0.04
0.06
Dat
a
Spectral index = 6.0 before end matching
100 101 102k
10−6
10−5
10−4
10−3
10−2
10−1
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.02
0.00
0.02
0.04
0.06D
ata
Spectral index = 6.0 with slope lines
100 101 102k
10−1710−1510−1310−1110−910−710−510−310−1
Spe
ctru
m=|FFT|2
0 200 400 600 800 1000n
−0.08−0.06−0.04−0.020.000.020.040.06
Dat
a
Spectral index = 6.0 end-match line
100 101 102k
10−6
10−5
10−4
10−3
10−2
10−1
Spe
ctru
m=|FFT|2
Slope = -‐2
Slope = -‐6
0 200 400 600 800 1000n
−0.08−0.06−0.04−0.020.000.020.040.06
Dat
a
Spectral index = 6.0 after end matching
100 101 102k
10−1210−1110−1010−910−810−710−610−510−410−310−210−1100
Spe
ctru
m=|FFT|2
Slope = -‐2
Slope = -‐6
End matching is failing!
Why is end-matching failing?
• Clue: end-matching is progressively worse for steeper spectra
Why is end-matching failing?
• Clue: end-matching is progressively worse for steeper spectra
• Answer: Though end matching makes the time series continuous (no discontinuities) it does not prevent discontinuities in the derivatives of the time series. As the spectrum gets steeper, the higher order derivatives become more important
Another ad hoc fix
• Tapering of the time series • E.g. a Gaussian function centered on the
middle of the time series will make the time series and its derivatives continuous (to the extent that the Gaussian has fallen off at the edges of [0, T]
(Sketch on board)
Cyclical Data Other workarounds
• Convolution of a short filter function with a long time series (too long to use a single FFT)
• Cross correlation of two long time series
Cyclical vs Non-cyclical Convolution & Correlation
From Bergland, ”A Guided Tour of the FFT”, IEEE Spectrum, 1969, 6 (Bell Labs)
Noncyclical Procedures
Aspects of Spectral Analysis
The outcome of any attempt to estimate the power spectrum should be assessed through con-sideration of the following:
(1) confidence intervals(2) resolution(3) handling of gaps (missing data)(4) bias (particularly for processes with steep power laws)
Numbers (2) and (3) are related because resolution is determined by the length of the data setT and one can view data after the interval [0, T ] as missing data.
14
Linear/fixed window/Finite F.T. methods vs. Data Adaptive Methods
Spectral estimators based on the Fourier transform have the properties:
no smoothing
1. frequency resolution : δf = T−1 T = data span2. estimation error : ε = [Var S]1/2/�S(ω)� = 1
with smoothing
3.tradeoff between ε and δf :spectral window W ⇒ ε decreases, δf increases,sidelobes altered ε = (2/Nd.o.f.)1/2
4. Since Rw(τ ) ⇔ Sw(ω), the estimator is linear in the correlation function estimates, Rw.
5. Also, once chosen by the analyst, the window function is fixed with respect to the data.
Therefore, even though the spectral estimation may be iterative whereby the analyst chooses by
trial an optimum window, this optimization is performed according to intuitive/aesthetic/prior
knowledge prejudice. The form of the window is not an integral part of the estimation process.
Hence, the characterization of finite F.T. estimators as linear, fixed window, empirical estima-
tors.
15
Linear/fixed window FT methods vs. Data adaptive methods
Different Views on Missing Data in a Time Series
1. Transform (in the general sense) data using fixed window analyses; then correct the data for
the effects of gaps in data, etc. (e.g. can apply a CLEAN algorithm to the spectrum).
vs.
2. Transform the data without making assumptions about missing data (maximum entropy
techniques are “maximally noncommittal” about missing data).
‘Empirical methods such as Blackman-Tukey, which do not invoke even a likelihood function,
are useful in the preliminary, exploratory phase of a problem where our knowledge is sufficient
to permit intuitive judgments about how to organize a calculation (smoothing, decimation, win-
dows, prewhitening, padding with zeroes, etc.) but insufficient to set up a quantitative model
which would do the proper things for us automatically and optimally.’ [In abstract of ”On the
Rationale of Maximum Entropy Methods”, E.T. Jaynes, 1982, IEEE Proc., 70, 939.]
16