Post on 17-Dec-2014
Using Principal Component Analysis to Remove Correlated
Signal from Astronomical Images
Kim Scott, National Radio Astronomy Observatory
Data Science Meet-up, February 18, 2014
Galaxy Evolution in One Slide...
?
Galaxy Surveys – What Are We Missing?
Optical surveys are biased
Optical surveys miss ~50% of star formation in galaxies
Dust reemits stellar radiation at infrared to millimeter wavelengths (λ ~ 20 – 2000 μm)
Galaxy Surveys at (Sub)mm Wavelengths
Extragalactic emission: Transmitted / Absorbed
Atmospheric emission: 1000× stronger than signal from galaxies
Removing the Atmosphere by Modulating the Signal in Time
Galaxy
Detector array
$x_{ij}$: power measured for time sample i on detector j
[Figure labels: Galaxy; Detector array; i = 1, i = 2, i = 3]
Surveys at λ=1.1mm with AzTEC
ASTE Telescope
AzTEC Dewar
AzTEC Array (117 detectors)
Raw Time-stream Data
Sample rate = 1∕(15.625 ms) (20 s = 1280 samples)
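The sampling numbers on this slide can be checked with a few lines of arithmetic. This is only a sketch; the function names are illustrative and not part of any AzTEC software.

```python
SAMPLE_INTERVAL_S = 15.625e-3  # seconds between samples, as quoted on the slide


def sample_rate_hz(interval_s: float = SAMPLE_INTERVAL_S) -> float:
    """Sampling frequency implied by the sample interval."""
    return 1.0 / interval_s


def n_samples(duration_s: float, interval_s: float = SAMPLE_INTERVAL_S) -> int:
    """Number of samples recorded per detector over duration_s seconds."""
    return round(duration_s / interval_s)


print(sample_rate_hz())  # 64.0 (Hz)
print(n_samples(20.0))   # 1280
```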
Principal Component Analysis (PCA)
[Often used as a preprocessing step in supervised learning to compress data: fit to a smaller number of features]
• $x_{ij}$: power measured for time sample i on detector j
• n = number of detectors; m = number of time samples
• $X = [\, x_1 \; x_2 \; \dots \; x_m \,]$ → n × m matrix
*Only input needed for PCA*
Principal Component Analysis (PCA)
Step 1: Mean normalization (and feature scaling)
• Compute $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}$ for each detector
• Compute $\sigma_j^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_{ij} - \mu_j)^2$ for each detector
• Set $x_{ij} \Rightarrow (x_{ij} - \mu_j) / \sigma_j$
• $X = [\, x_1 \; x_2 \; \dots \; x_m \,]$ → n × m matrix
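A minimal NumPy sketch of Step 1, assuming the data are stored as an n × m array with one row per detector. The function name and the synthetic data are illustrative assumptions, not real AzTEC time streams.

```python
import numpy as np


def normalize_timestreams(X: np.ndarray) -> np.ndarray:
    """Mean-normalize and scale each detector's time stream.

    X has shape (n, m): one row per detector, one column per time sample.
    """
    mu = X.mean(axis=1, keepdims=True)            # mu_j, one value per detector
    sigma = X.std(axis=1, ddof=1, keepdims=True)  # sigma_j, using the 1/(m-1) form
    return (X - mu) / sigma


# Synthetic stand-in data (assumption): 117 detectors, 1280 samples each.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(117, 1280))
Xn = normalize_timestreams(X)
```

After this step every row of `Xn` has zero mean and unit sample variance, so no single detector dominates the covariance matrix simply because of its gain.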
*PCA can identify lower-level correlations among subsets of the detectors*
1mV
Principal Component Analysis (PCA)
Step 2: Calculate covariance matrix
• $C = \frac{1}{m} X X^T$ (recall m = # time samples)
• C → n × n symmetric matrix (recall n = 117 detectors)
Step 3: Eigen decomposition
• $C = Q \Lambda Q^{-1}$ (*solve using SVD*)
• $Q = [\, q_1 \; q_2 \; \dots \; q_n \,]$ → n × n matrix containing eigenvectors $q_i$
• $\Lambda$ → n × n diagonal matrix containing eigenvalues $\lambda_i = \Lambda_{ii}$
• Principal components = uncorrelated variables
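Steps 2 and 3 might be sketched as follows. This assumes X is already mean-normalized and applies the SVD directly to the covariance matrix; for a symmetric positive semi-definite C the left singular vectors are the eigenvectors and the singular values are the eigenvalues. The function name and the synthetic data are illustrative.

```python
import numpy as np


def principal_components(X: np.ndarray):
    """Return (eigenvalues, Q) for C = X X^T / m, largest eigenvalue first."""
    n, m = X.shape
    C = (X @ X.T) / m                # n x n symmetric covariance matrix
    Q, lam, _ = np.linalg.svd(C)     # singular values are returned in descending order
    return lam, Q


# Synthetic data (assumption): a strong signal common to 8 detectors plus noise.
rng = np.random.default_rng(1)
common = rng.normal(size=1280)
X = 10.0 * common + rng.normal(size=(8, 1280))
X = X - X.mean(axis=1, keepdims=True)
lam, Q = principal_components(X)
```

In this toy case the first eigenvalue is far larger than the rest, which is the signature of a signal shared by all detectors.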
Principal Component Analysis (PCA)
Step 4: Choose number of components to remove
• Goal: choose the smallest number of components (k) to REMOVE that accounts for most of the observed variance in the data
• $Q_R = [\, q_{k+1} \; q_{k+2} \; \dots \; q_n \,]$ → n × (n − k) matrix, k < n
• $Z = [\, z_1 \; z_2 \; \dots \; z_m \,] = Q_R^T X$ → (n − k) × m matrix
• To derive model of galaxy intensities on sky, use Z instead of X (but...)
Choosing k:
$\frac{\text{Variance after PCA (given } k\text{)}}{\text{Variance with average subtraction only}} < 0.05$
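One way to apply this criterion is to increase k until the variance ratio drops below the threshold. The sketch below assumes that "average subtraction" means removing the array-wide mean at each time sample; the slides do not spell this out. The function name and test data are hypothetical.

```python
import numpy as np


def choose_k(X: np.ndarray, threshold: float = 0.05) -> int:
    """Smallest k for which var(cleaned) / var(average-subtracted) < threshold."""
    n, m = X.shape
    # Baseline (assumed definition): subtract the mean over detectors
    # at every time sample.
    baseline_var = np.var(X - X.mean(axis=0, keepdims=True))
    Q, _, _ = np.linalg.svd(X @ X.T / m)   # eigenvectors, largest eigenvalue first
    for k in range(n):
        QR = Q[:, k:]                      # keep q_{k+1} ... q_n
        XR = QR @ (QR.T @ X)               # data with the first k components removed
        if np.var(XR) / baseline_var < threshold:
            return k
    return n


# Synthetic data (assumption): one mode common to all detectors, one mode
# with alternating sign across detectors, and weak noise.
rng = np.random.default_rng(3)
m = 2000
c1 = 10.0 * rng.normal(size=m)
c2 = rng.normal(size=m)
g2 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X = c1 + g2[:, None] * c2 + 0.01 * rng.normal(size=(6, m))
X = X - X.mean(axis=1, keepdims=True)
k = choose_k(X)
```

Here average subtraction removes only the common mode, while PCA also removes the alternating mode, so the criterion is first met once both components are dropped.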
Principal Component Analysis (PCA)
Step 5: Reconstruct data without correlated signal
• Know RA/Dec for each detector: need to reconstruct approximation for data to make image
• $X_R = Q_R Z$ → n × m matrix with correlated signal removed!
1mV
20μV
*Variance reduced by factor of 50*
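The projection and reconstruction in Steps 4 and 5 can be sketched in NumPy as below. The shapes in the comments follow from keeping the n − k eigenvectors that remain after the k largest are removed. The function name and the toy "atmosphere" signal are assumptions for illustration only.

```python
import numpy as np


def pca_clean(X: np.ndarray, k: int) -> np.ndarray:
    """Remove the k largest principal components from X (shape n x m)."""
    n, m = X.shape
    Q, _, _ = np.linalg.svd(X @ X.T / m)  # eigenvectors, largest eigenvalue first
    QR = Q[:, k:]                         # n x (n - k): retained eigenvectors
    Z = QR.T @ X                          # (n - k) x m: data in the reduced basis
    return QR @ Z                         # n x m: back in detector space


# Toy data (assumption): a strong common "atmosphere" plus detector noise.
rng = np.random.default_rng(2)
n, m = 8, 1280
atmosphere = 10.0 * rng.normal(size=m)
X = atmosphere + rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)
XR = pca_clean(X, k=1)
print(np.var(X) / np.var(XR))  # factor by which the variance dropped
```

Returning to the n × m detector basis matters because the pointing (RA/Dec) is known per detector, not per principal component.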
Image of PKS J1127-1857
Make the map:
• Use information on sky position for each detector at each time sample ($\mathrm{RA}_{ij}$, $\mathrm{Dec}_{ij}$) and bin data onto image grid
• Set the intensity of each image pixel to the average of the $x^R_{ij}$ values that fall into that bin
• Smooth image by telescope point-spread response function (Gaussian with FWHM = 30″)
[Figure panels: PCA Cleaned; Average Subtraction]
• raw data = 30 MB
• $t_{tot}$ = 4 min
• 16640 samples/detector
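The binning step of the map-making recipe could look roughly like this in NumPy. It is a sketch under simple assumptions: flat-sky coordinates, a regular pixel grid given by bin edges, and no smoothing (the Gaussian point-spread convolution is left out). The function name and the four-sample example are hypothetical.

```python
import numpy as np


def bin_to_map(ra, dec, values, ra_edges, dec_edges):
    """Average the samples that fall in each pixel; empty pixels become NaN.

    ra, dec, values are arrays of identical shape (e.g. n detectors x m samples).
    The returned map is indexed as [dec_pixel, ra_pixel].
    """
    ra, dec, values = (np.asarray(a, dtype=float).ravel() for a in (ra, dec, values))
    bins = [dec_edges, ra_edges]
    total, _, _ = np.histogram2d(dec, ra, bins=bins, weights=values)  # sum per pixel
    hits, _, _ = np.histogram2d(dec, ra, bins=bins)                   # count per pixel
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(hits > 0, total / hits, np.nan)


# Four samples on a 2 x 2 pixel grid (illustrative numbers only).
ra = np.array([[0.5, 1.5], [0.5, 1.5]])
dec = np.array([[0.5, 0.5], [0.5, 1.5]])
values = np.array([[1.0, 2.0], [3.0, 4.0]])
edges = np.array([0.0, 1.0, 2.0])
sky = bin_to_map(ra, dec, values, edges, edges)
```

Two samples land in pixel `[0, 0]` and are averaged, while pixel `[1, 0]` receives no samples and stays NaN rather than being reported as zero intensity.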
An Extragalactic Survey at λ=1.1 mm
• Most galaxies are 100× fainter than PKS J1127-1857
• raw data ~ 25 GB
• $t_{tot}$ ~ 80 hrs
• ~ $2 \times 10^7$ samples/detector
• AzTEC/COSMOS survey
• 0.7 deg²
• 500× area of HUDF
• 160 hrs versus 11 days (~270 hrs) for HUDF
• 130 mm-bright galaxies
Aretxaga et al. 2011
Capak et al. 2011
• AzTEC-3
• Observed 1 Gyr after Big Bang
• Starburst galaxy (SFR ~ 1000 $M_\odot$/yr)