Exploratory Analysis of Light Curves
G. Jogesh Babu, Ashish Mahabal, SaeNa Park
• For centuries, astronomers have created taxonomies for all types of celestial populations including comets, asteroids, stars, galaxies, active galactic nuclei, and supernovae.
• Until recently, nearly all classifications were based on heuristic and subjective procedures.
• Most stellar classifications were based on colors and spectral properties.
• Supernovae have a complicated classification scheme: Type I with subtypes Ia, Ib, Ic, Ib/c pec and Type II with subtypes IIb, IIL, IIP and IIn.
• Rarely have statistical or algorithmic procedures been used to define the classes.
Overview
• What we want to do
  – Classify lightcurves of mostly non-variable sources
  – Look for interesting aspects in the outliers
• What are the steps we are taking
  – Clustering
  – Exploring feature space
  – Functional Data Analysis
  – Machine Learning
Clustering versus Classification
• Classification:
  – There are M known populations
  – There is a sample from each population
  – These samples are the training data, so this is supervised learning
  – The training data is used to develop a classifier for objects whose population membership is unknown.
• Clustering:
  – The goal is to partition the data into M groups
  – Groups are not defined a priori
  – No training data, so unsupervised learning
  – There is no “best” M
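The contrast between the two settings can be sketched in a few lines of Python. This is only an illustration on a single made-up feature (for example an amplitude), not the method used for the CRTS sample: `nearest_centroid_fit`, `nearest_centroid_predict` and `kmeans_1d` are hypothetical helper names, and the numbers are synthetic.

```python
def nearest_centroid_fit(training):
    """Supervised: training maps a known population label to its sample values."""
    return {label: sum(vals) / len(vals) for label, vals in training.items()}

def nearest_centroid_predict(centroids, x):
    """Assign x to the population whose training centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def kmeans_1d(values, m, iters=50):
    """Unsupervised: partition values into m groups; no labels are given."""
    s = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [s[(i * (len(s) - 1)) // max(m - 1, 1)] for i in range(m)]
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for v in values:
            j = min(range(m), key=lambda k: abs(v - centers[k]))
            groups[j].append(v)
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    labels = [min(range(m), key=lambda k: abs(v - centers[k])) for v in values]
    return centers, labels

# Synthetic feature values, for illustration only.
centroids = nearest_centroid_fit({"constant": [0.10, 0.12], "variable": [1.9, 2.1]})
predicted = nearest_centroid_predict(centroids, 1.5)

centers, labels = kmeans_1d([0.10, 0.12, 0.09, 1.9, 2.1, 2.0], m=2)
```

The classifier needs labelled samples from each population; the clustering call needs only the data and a choice of M, which the user must supply.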
Most groups are after transients (low-hanging fruit)
• For every transient, there are 10^6 non-transients
• But there is variability at all levels
• We are trying to make the most of the non-transients
Variability Tree
[Figure: Variability Tree. Credit: L. Eyer & N. Mowlavi (03/2009), updated 04/2013. Branch labels include Extrinsic and Intrinsic; Asteroids, Stars, and AGN; Rotation, Eclipse, Microlensing, Eruptive, Cataclysmic, Pulsation, and Secular. Leaf labels include LBV, novae, dwarf novae, supernovae, eclipsing binaries, planetary transits, RR Lyrae, δ Cepheids, δ Scuti, Miras, Seyferts, and blazars.]
Characterize/classify as much as possible all types of objects
We concentrate here on lightcurves (time series)
Current sample is from the Catalina Real-time Transient Survey (CRTS)
500 M lightcurves available for analysis. We chose a few hundred for the exploratory work.
CRTS lightcurves
• Regions with a radius of 3′ have been chosen with centers at RA = (100, 200, 300), Dec = (-30, -20, ..., 50)
• File naming: crtslc_200_p10.csv etc. for the region centered on RA = 200 deg, Dec = 10 deg
• Included fields: MasterID, Mag, Magerr, RA, Dec, MJD (in days), Blend (flag indicating possible confusion).
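A minimal sketch of reading one of these region files, using only the Python standard library. Several things here are assumptions rather than facts from the slides: that each CSV has a header row with exactly the field names listed above, that a negative declination is written with `m` (only the `p` form is shown), and the helper names `parse_region_filename` and `read_light_curves`. The sample rows are synthetic.

```python
import csv
import io
import re
from collections import defaultdict

def parse_region_filename(name):
    """Return (ra_deg, dec_deg) from a name like 'crtslc_200_p10.csv'.

    'p' marks a positive declination, as in the example file name.
    ASSUMPTION: 'm' marks a negative declination; no such name is shown.
    """
    match = re.fullmatch(r"crtslc_(\d+)_([pm])(\d+)\.csv", name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    ra, sign, dec = match.groups()
    dec_deg = int(dec) if sign == "p" else -int(dec)
    return int(ra), dec_deg

def read_light_curves(handle):
    """Group rows by MasterID into time-sorted lists of (MJD, Mag, Magerr)."""
    curves = defaultdict(list)
    for row in csv.DictReader(handle):  # ASSUMPTION: header row is present
        curves[row["MasterID"]].append(
            (float(row["MJD"]), float(row["Mag"]), float(row["Magerr"]))
        )
    for points in curves.values():
        points.sort()
    return dict(curves)

# Synthetic rows for illustration only (not real CRTS measurements).
sample = io.StringIO(
    "MasterID,Mag,Magerr,RA,Dec,MJD,Blend\n"
    "A,17.2,0.1,200.0,10.0,54001.5,0\n"
    "A,17.3,0.1,200.0,10.0,54000.5,0\n"
    "B,19.0,0.2,200.01,10.01,54000.5,1\n"
)
curves = read_light_curves(sample)
region = parse_region_filename("crtslc_200_p10.csv")
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` object.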
Set of objects around random locations (mostly non-variable)
Look at the random regions
• Next few slides show how these random regions look
• Though we use CRTS data, we deliberately chose SDSS cutouts so that we can also highlight how the same region can look very different in different surveys, something that will be crucial when federating varied datasets.
• The RA/Dec can be seen on the left panels.
• Circles indicate photometric objects, squares indicate objects with spectra.
• The information will be useful in characterizing outliers independently.
Multiple epochs from CRTS (8 years of data)
• The large symbols somewhat exaggerate any possible motions here
• An advantage of picking sets of objects near each other is that they have roughly the same epochs of observation, thus allowing for registration when needed
Registration of lightcurves is possible. Gaps indicate missing data (upper limits can be assumed)
A zoom-in to indicate that each column in the previous plot is actually 4 columns just 10 minutes apart (x-axis, MJD, is in days)
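The cadence described here (sets of four exposures about 10 minutes apart, separated by much longer gaps) suggests grouping observations into epochs before registering lightcurves. A sketch of that grouping follows; the 0.5-day gap threshold and the function name `group_into_epochs` are illustrative assumptions, and the times are synthetic.

```python
def group_into_epochs(mjd, max_gap_days=0.5):
    """Split observation times into runs separated by gaps > max_gap_days."""
    groups = []
    for t in sorted(mjd):
        if groups and t - groups[-1][-1] <= max_gap_days:
            groups[-1].append(t)   # still within the current epoch
        else:
            groups.append([t])     # a long gap starts a new epoch
    return groups

ten_min = 10.0 / (24 * 60)  # 10 minutes expressed in days
# Two synthetic visits of four exposures each, ten days apart.
mjd = [54000.0 + k * ten_min for k in range(4)] + \
      [54010.0 + k * ten_min for k in range(4)]
epochs = group_into_epochs(mjd)
```

Objects in the same region share roughly the same epochs, so the epoch index can serve as a common time grid across their lightcurves.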
Derived statistics
• Statistics for each object in each region are available. Note: many calculations are done using fluxes (linear) instead of magnitudes (log).
• A few discriminating stats are: amplitude, linear trend, median buffer range, standard deviation, beyond1std.
• Lightcurves with fewer than 5 points were ignored
• Stats based on Richards et al. 2011 and calculated using the Caltech Time Series Characterization Service: http://nirgun.caltech.edu:8000/scripts/description.html
• Amplitude: Half the difference between the maximum and minimum magnitudes
• Beyond 1 std: Percentage of points beyond one standard deviation from the weighted mean
• Flux percentile ratio (90 - 10 : 95 - 5) [mid80]: Ratio of flux percentiles (90th - 10th) over (95th - 5th)
• Linear Trend: Slope of a linear fit to the light curve
• Maximum slope: Maximum absolute flux slope between two consecutive observations
• Median buffer range percentage: Percentage of fluxes within 10% of the amplitude from the median
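The definitions above can be sketched in plain Python as follows. This is not the Caltech service's implementation. Assumptions made for illustration: fluxes are relative, 10^(-0.4 Mag), with an arbitrary zero point; weights are 1/Magerr^2; the linear trend is fitted to magnitudes; the "amplitude" in the median buffer range is half the flux range; percentiles are linearly interpolated; and fractions (0 to 1) are returned rather than percentages.

```python
import math

def percentile(values, q):
    """Linearly interpolated q-th percentile, 0 <= q <= 100."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def light_curve_stats(mjd, mag, magerr):
    n = len(mag)
    if n < 5:
        return None  # lightcurves with fewer than 5 points are ignored
    order = sorted(range(n), key=lambda i: mjd[i])
    t = [mjd[i] for i in order]
    m = [mag[i] for i in order]
    e = [magerr[i] for i in order]
    flux = [10 ** (-0.4 * x) for x in m]  # ASSUMPTION: relative flux

    amplitude = 0.5 * (max(m) - min(m))

    weights = [1.0 / err ** 2 for err in e]
    wmean = sum(w * x for w, x in zip(weights, m)) / sum(weights)
    mean = sum(m) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in m) / (n - 1))
    beyond1std = sum(abs(x - wmean) > std for x in m) / n

    wide = percentile(flux, 95) - percentile(flux, 5)
    narrow = percentile(flux, 90) - percentile(flux, 10)
    fpr_mid80 = narrow / wide if wide > 0 else float("nan")

    tbar = sum(t) / n
    sxx = sum((x - tbar) ** 2 for x in t)
    sxy = sum((x - tbar) * (y - mean) for x, y in zip(t, m))
    linear_trend = sxy / sxx if sxx > 0 else float("nan")

    slopes = [abs((flux[i + 1] - flux[i]) / (t[i + 1] - t[i]))
              for i in range(n - 1) if t[i + 1] > t[i]]
    max_slope = max(slopes) if slopes else float("nan")

    fmed = percentile(flux, 50)
    famp = 0.5 * (max(flux) - min(flux))  # ASSUMPTION: flux amplitude
    median_buffer_range = sum(abs(f - fmed) < 0.1 * famp for f in flux) / n

    return {"amplitude": amplitude, "beyond1std": beyond1std,
            "fpr_mid80": fpr_mid80, "linear_trend": linear_trend,
            "max_slope": max_slope, "median_buffer_range": median_buffer_range}

# Synthetic five-point lightcurve: constant except for one faint point.
stats = light_curve_stats([0, 1, 2, 3, 4],
                          [10.0, 10.0, 10.0, 10.0, 12.0],
                          [0.1] * 5)
```

Other percentile ratios (for example a mid20 or mid35 variant) would follow the same pattern with different inner percentiles.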
Lightcurves
• Mostly constant
• Some variables
  – Some rapid
  – Some slow
  – Some periodic
• Errors can vary as a function of time
Examples from CRTS
[Figures: example CRTS lightcurves, each plotted as Mag against MJD, for objects ID=1021068061186 and ID=1021068061378 (MJD axis 54000 to 56000) and ID=2020262030178, ID=2020262030217 and ID=2020262030567 (MJD axis 53500 to 55500).]
Clustering and outliers using 6 most significant variables
Movie using 6 significant variables
Outliers
[Figure: boxplots comparing outliers and non-outliers for amplitude, beyond1std, fpr_mid20 and fpr_mid35.]
Distribution (boxplots) for outliers (and non-variables)
Functional Data Analysis: Staicu et al. test
• Suppose we want to test H0: µ(t) ≡ µ, that is, that the mean function is constant
• For light curves, we are testing whether or not a star is variable
• Staicu, Li, Crainiceanu, and Ruppert (2012) develop likelihood ratio tests about the mean of functional data
• The challenge is to take account of the correlation by estimating the correlation function
• The test can be applied to dense or sparsely observed functional data
• More general null hypotheses
  – µ(t) is a polynomial of degree p
  – The means of two samples of functional data are equal
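To make the null hypothesis concrete, here is a much simpler stand-in for a constant-mean check: the reduced chi-square of the magnitudes about their weighted mean. This is not the Staicu et al. likelihood ratio test. It assumes independent Gaussian errors and therefore ignores exactly the correlation that their test estimates; the function name `reduced_chi_square` is hypothetical and the data are synthetic.

```python
def reduced_chi_square(mag, magerr):
    """Chi-square per degree of freedom about the weighted mean magnitude.

    Values near 1 are consistent with a constant source under the
    (strong) assumption of independent errors; values much larger
    than 1 suggest variability or underestimated errors.
    """
    weights = [1.0 / e ** 2 for e in magerr]
    wmean = sum(w * m for w, m in zip(weights, mag)) / sum(weights)
    chi2 = sum(((m - wmean) / e) ** 2 for m, e in zip(mag, magerr))
    return chi2 / (len(mag) - 1)

# Synthetic lightcurves, for illustration only.
flat = reduced_chi_square([10.0] * 5, [0.1] * 5)
bumpy = reduced_chi_square([10.0, 10.0, 10.0, 10.0, 12.0], [0.1] * 5)
```

Because survey errors can vary with time and neighbouring points can be correlated, a statistic like this is best treated as a quick screen before applying a test that models the correlation.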
Conclusions
• CRTS provides a rich data set of 500 M lightcurves for exploration.
• So far mostly transients have been looked at
• We are devising ways to explore all lightcurves with additional information available due to proximity (e.g. ability to register)
• Existing parameters have redundancies which we plan to eliminate through clustering as well as dimensionality reduction techniques
• Future plans: connect CRTS to brighter surveys like DASCH (overlap for a small fraction of sources, but large time sample), and with fainter surveys like LSST (starting with simulations to get ready for the actual survey). [DASCH - Digital Access to a Sky Century @ Harvard]