Using Principal Component Analysis to Remove Correlated Signal from Astronomical Images

Kim Scott, National Radio Astronomy Observatory
Data Science Meet-up, February 18, 2014


Galaxy Evolution in One Slide...

[Figure: galaxy evolution overview, ending with an open question ("?")]

Galaxy Surveys – What Are We Missing?

• Optical surveys are biased against dusty galaxies
• Optical surveys miss ~50% of star formation in galaxies, because dust absorbs much of the starlight
• Dust re-emits the absorbed stellar radiation at infrared to millimeter wavelengths (λ ~ 20 – 2000 μm)

Galaxy Surveys at (Sub)mm Wavelengths

[Figure: extragalactic emission, partly transmitted and partly absorbed by the atmosphere]

• Atmospheric emission is 1000× stronger than the signal from galaxies

Removing the Atmosphere by Modulating the Signal in Time

[Figure: detector array and galaxy, shown at time samples i = 1, 2, 3]

• $x_{ij}$: power measured for time sample $i$ on detector $j$
• As the telescope scans, the galaxy's signal moves from detector to detector, while all detectors look through nearly the same atmosphere at once, so the atmospheric signal is correlated across the array
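To see why the atmosphere shows up as correlated signal, here is a minimal toy simulation in Python/NumPy. Everything in it is an assumption made for illustration (8 detectors, a random-walk "atmosphere" scaled by a per-detector gain, white noise, and a faint source that reaches one detector after another); it is not AzTEC's actual data model. Rows of `X` are detectors $j$ and columns are time samples $i$, so `X[j, i]` holds $x_{ij}$.

```python
import numpy as np

# Toy model (assumption, not AzTEC data): n detectors, m time samples.
rng = np.random.default_rng(0)
n, m = 8, 1000
t = np.arange(m)

# Atmosphere: one slowly drifting signal seen by every detector,
# scaled by a hypothetical per-detector gain.
atmosphere = np.cumsum(rng.normal(size=m))
gains = rng.uniform(0.8, 1.2, size=n)

# Galaxy: a faint Gaussian blip that reaches detector j at time sample 100*(j+1),
# mimicking the array scanning across the source.
source = np.zeros((n, m))
for j in range(n):
    source[j] = 0.5 * np.exp(-0.5 * ((t - 100 * (j + 1)) / 5.0) ** 2)

noise = 0.1 * rng.normal(size=(n, m))

# X[j, i] = x_ij: power on detector j at time sample i (n x m)
X = gains[:, None] * atmosphere[None, :] + source + noise

# Detector-detector correlation coefficients (n x n)
corr = np.corrcoef(X)
print("smallest pairwise correlation:", corr.min())
```

All detector pairs come out almost perfectly correlated because the shared drift dominates, while the source appears in each detector only briefly. That difference is what PCA exploits below.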

Surveys at λ = 1.1 mm with AzTEC

[Figure panels: ASTE Telescope; AzTEC Dewar; AzTEC Array (117 detectors)]

Raw Time-stream Data

• Sample rate = 1/(15.625 ms) = 64 Hz (20 s = 1280 samples)
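A quick arithmetic check of the sampling numbers, as a small Python sketch (`n_samples` is a hypothetical helper, not part of any AzTEC software):

```python
# Sampling interval quoted on the slide: 15.625 ms per sample
dt_s = 15.625e-3
rate_hz = 1.0 / dt_s  # 64.0 Hz


def n_samples(duration_s: float) -> int:
    """Hypothetical helper: samples recorded per detector in duration_s seconds."""
    return round(duration_s * rate_hz)


print(rate_hz)               # 64.0
print(n_samples(20))         # 1280
print(n_samples(80 * 3600))  # 18432000
print(16640 / rate_hz)       # 260.0 (seconds)
```

At 64 Hz, the 16640 samples/detector quoted later for PKS J1127-1857 correspond to 260 s (≈ 4.3 min), and 80 hrs corresponds to about 1.8×10⁷ samples, consistent with the ~2×10⁷ samples/detector quoted for the survey.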

Principal Component Analysis (PCA)

[PCA is an unsupervised method, often used to compress data before supervised learning: fit to fewer features]

• $x_{ij}$: power measured for time sample $i$ on detector $j$
• $n$ = number of detectors; $m$ = number of time samples
• $X = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_m]$ → $n \times m$ matrix, where column $\mathbf{x}_i$ holds all $n$ detector readings at time sample $i$

*Only input needed for PCA*

Principal Component Analysis (PCA)

Step 1: Mean normalization (and feature scaling)

• Compute $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}$ for each detector
• Compute $\sigma_j^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_{ij} - \mu_j)^2$ for each detector
• Set $x_{ij} \leftarrow (x_{ij} - \mu_j) / \sigma_j$
• $X = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_m]$ → $n \times m$ matrix

[Figure: detector time-streams; scale bar 1 mV]

*PCA can identify lower-level correlations among subsets of the detectors*
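Step 1 as a NumPy sketch, assuming `X` is stored as an $n \times m$ array with one row per detector (the function name `normalize` and the toy offsets and scales are ours):

```python
import numpy as np


def normalize(X: np.ndarray):
    """Step 1: mean normalization and feature scaling, per detector.

    X is n x m: row j holds detector j, column i holds time sample i.
    Returns the scaled matrix and the per-detector mu_j and sigma_j.
    """
    mu = X.mean(axis=1, keepdims=True)            # mu_j = (1/m) * sum_i x_ij
    sigma = X.std(axis=1, ddof=1, keepdims=True)  # ddof=1 -> 1/(m-1) normalization
    return (X - mu) / sigma, mu, sigma


# Toy input: 4 detectors with different offsets and scales (assumed values)
rng = np.random.default_rng(1)
loc = np.array([[5.0], [1.0], [-2.0], [0.0]])
scale = np.array([[2.0], [0.5], [1.0], [3.0]])
X = rng.normal(loc=loc, scale=scale, size=(4, 500))
Xn, mu, sigma = normalize(X)
print(Xn.mean(axis=1), Xn.std(axis=1, ddof=1))
```

Keeping `mu` and `sigma` around makes it possible to undo the scaling later, should the cleaned data be wanted back in the original units.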

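Principal Component Analysis (PCA)

Step 2: Calculate covariance matrix

• $C = \frac{1}{m} X X^T$ (recall $m$ = # time samples)
• $C$ → $n \times n$ symmetric matrix (recall $n$ = 117 detectors)

Step 3: Eigen decomposition

• $C = Q \Lambda Q^{-1} = Q \Lambda Q^T$ (*solve using SVD*; $C$ is symmetric, so $Q$ is orthogonal)
• $Q = [\mathbf{q}_1\ \mathbf{q}_2\ \cdots\ \mathbf{q}_n]$ → $n \times n$ matrix containing eigenvectors $\mathbf{q}_i$
• $\Lambda$ → $n \times n$ diagonal matrix containing eigenvalues $\lambda_i = \Lambda_{ii}$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$
• Principal components = uncorrelated variables: the projections $\mathbf{q}_i^T X$ are mutually uncorrelated, and $\lambda_i$ is the variance along $\mathbf{q}_i$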
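Steps 2 and 3 in NumPy, as a sketch on toy normalized data (one shared signal plus noise across 6 hypothetical detectors). Because $C$ is symmetric and positive semi-definite, `np.linalg.svd(C)` returns its eigenvectors as the columns of `U` and its eigenvalues as the singular values, already sorted largest first; `np.linalg.eigh` would work equally well. The function name `covariance_and_eigen` is ours.

```python
import numpy as np


def covariance_and_eigen(X: np.ndarray):
    """Steps 2-3 for an already-normalized n x m matrix X.

    Returns C, the eigenvector matrix Q (columns q_i) and eigenvalues lam,
    with lam[0] >= lam[1] >= ... (largest-variance component first).
    """
    m = X.shape[1]
    C = (X @ X.T) / m  # n x n, symmetric
    # For a symmetric positive semi-definite matrix the SVD C = U S V^T
    # coincides with the eigendecomposition: Q = U, Lambda = diag(S),
    # and Q^{-1} = Q^T. NumPy returns S sorted in descending order.
    U, S, _ = np.linalg.svd(C)
    return C, U, S


# Toy data (assumption): 6 detectors sharing one signal, plus noise
rng = np.random.default_rng(2)
common = rng.normal(size=2000)
X = np.outer(rng.uniform(0.5, 1.5, size=6), common) + 0.2 * rng.normal(size=(6, 2000))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

C, Q, lam = covariance_and_eigen(X)
print("fraction of variance along q_1:", lam[0] / lam.sum())
```

With one dominant shared signal, almost all of the variance lands in $\lambda_1$, which is the situation the atmosphere creates in the real time-streams.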

Principal Component Analysis (PCA)

Step 4: Choose number of components to remove

• Goal: choose the fewest components ($k$) that REMOVE most of the observed variance in the data
• $Q_R = [\mathbf{q}_{k+1}\ \mathbf{q}_{k+2}\ \cdots\ \mathbf{q}_n]$ → $n \times (n-k)$ matrix, $k < n$
• $Z = [\mathbf{z}_1\ \mathbf{z}_2\ \cdots\ \mathbf{z}_m] = Q_R^T X$ → $(n-k) \times m$ matrix
• To derive a model of galaxy intensities on the sky, use $Z$ instead of $X$ (but the rows of $Z$ no longer correspond to individual detectors)

Choosing $k$:

$$\frac{\text{variance after PCA (given } k)}{\text{variance with average subtraction only}} < 0.05$$
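A sketch of Step 4 in NumPy. The denominator "variance with average subtraction only" is not defined further in this transcript; purely as an assumption, the code takes it to be the total variance of the mean-normalized data, so the criterion becomes $\sum_{i>k} \lambda_i / \sum_i \lambda_i < 0.05$. The toy data (two shared signals across 10 hypothetical detectors) and the helper names `sorted_eigen` and `choose_k` are illustrative only.

```python
import numpy as np


def sorted_eigen(C: np.ndarray):
    """Eigenvectors (columns) and eigenvalues of symmetric C, largest first."""
    w, V = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    return V[:, ::-1], w[::-1]


def choose_k(lam: np.ndarray, threshold: float = 0.05) -> int:
    """Smallest k whose remaining variance ratio falls below threshold.

    ASSUMPTION: the baseline ("variance with average subtraction only") is
    taken as the total variance of the mean-normalized data, sum(lam).
    """
    total = lam.sum()
    for k in range(1, len(lam)):
        if lam[k:].sum() / total < threshold:
            return k
    raise ValueError("no k < n satisfies the threshold")


# Toy data (assumption): 10 detectors seeing two shared signals plus noise
rng = np.random.default_rng(3)
n, m = 10, 3000
signals = rng.normal(size=(2, m)) * np.array([[1.0], [0.5]])
X = rng.normal(size=(n, 2)) @ signals + 0.05 * rng.normal(size=(n, m))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

Q, lam = sorted_eigen((X @ X.T) / m)
k = choose_k(lam)
QR = Q[:, k:]  # n x (n - k): eigenvectors q_{k+1} ... q_n
Z = QR.T @ X   # (n - k) x m
print("k =", k, "QR:", QR.shape, "Z:", Z.shape)
```

Note that $Q_R$ keeps the $n-k$ trailing eigenvectors, so it is $n \times (n-k)$ and $Z$ is $(n-k) \times m$: the $k$ removed directions carry the correlated signal.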

Principal Component Analysis (PCA)

Step 5: Reconstruct data without correlated signal

• RA/Dec is known for each detector, not for each component: to make an image, we need to reconstruct an approximation to the data in detector space
• $X_R = Q_R Z = Q_R Q_R^T X$ → $n \times m$ matrix with correlated signal removed!

[Figure: detector time-streams; scale bars 1 mV and 20 μV]

*Variance reduced by factor of 50*
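Step 5 sketched in NumPy on toy data with a single shared drift standing in for the atmosphere (all numbers are assumptions, and $k = 1$ is fixed by hand). Projecting onto $Q_R$ and back gives $X_R = Q_R Q_R^T X$, which by construction has no component along the removed eigenvector:

```python
import numpy as np

# Toy data (assumption): 12 detectors sharing one drifting "atmosphere" plus noise
rng = np.random.default_rng(4)
n, m = 12, 4000
atmosphere = np.cumsum(rng.normal(size=m))
X = rng.uniform(0.8, 1.2, size=(n, 1)) * atmosphere + 0.1 * rng.normal(size=(n, m))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# Eigenvectors of C, largest eigenvalue first
_, V = np.linalg.eigh((X @ X.T) / m)
Q = V[:, ::-1]

k = 1           # components removed (fixed by hand for this sketch)
QR = Q[:, k:]   # n x (n - k)
Z = QR.T @ X    # (n - k) x m, component space
XR = QR @ Z     # n x m, back in detector space: X_R = Q_R Q_R^T X

print("variance reduction factor:", X.var() / XR.var())
```

The printed reduction factor reflects this toy's low noise level and says nothing about real AzTEC data, where the residual also contains detector noise and the astronomical signal.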

Image of PKS J1127-1857

Make the map:

• Use the sky position of each detector at each time sample, $(\mathrm{RA}_{ij}, \mathrm{Dec}_{ij})$, and bin the data onto an image grid
• Set the intensity of each image pixel to the average of the $x^R_{ij}$ values that fall into that bin
• Smooth the image by the telescope point-spread response function (Gaussian with FWHM = 30″)

[Figure panels: PCA Cleaned; Average Subtraction]

• Raw data = 30 MB; $t_{\mathrm{tot}}$ ≈ 4 min; 16640 samples/detector
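A minimal map-making sketch of the three steps above. It assumes flat-sky RA/Dec offsets in arcseconds (a real pipeline would project the coordinates properly), a hypothetical 3″ pixel size, empty pixels set to zero, and a simulated point source in place of PKS J1127-1857. The Gaussian smoothing kernel uses $\sigma = \mathrm{FWHM} / (2\sqrt{2 \ln 2})$; `make_map` is our name, not a function from the AzTEC pipeline.

```python
import numpy as np


def make_map(ra, dec, xr, edges, fwhm, pixel):
    """Bin samples onto a square grid, average per pixel, smooth by a Gaussian PSF.

    ra, dec, xr: 1-D arrays with one entry per (time sample, detector), i.e.
    the flattened RA_ij, Dec_ij and x^R_ij. Angles share units with fwhm and
    pixel (arcsec here). Pixels with no samples are set to 0 (an assumption).
    """
    sums, _, _ = np.histogram2d(ra, dec, bins=[edges, edges], weights=xr)
    counts, _, _ = np.histogram2d(ra, dec, bins=[edges, edges])
    image = np.zeros_like(sums)
    hit = counts > 0
    image[hit] = sums[hit] / counts[hit]  # average of x^R_ij in each pixel

    # Gaussian kernel: sigma = FWHM / (2 sqrt(2 ln 2)), converted to pixels
    sigma_pix = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel
    half = int(np.ceil(4 * sigma_pix))
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_pix) ** 2)
    kernel /= kernel.sum()

    # The 2-D Gaussian is separable: convolve along each axis in turn
    image = np.apply_along_axis(np.convolve, 0, image, kernel, mode="same")
    image = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return image


# Simulated cleaned samples (assumption): point source at offset (0, 0) arcsec
rng = np.random.default_rng(5)
N = 200_000
ra = rng.uniform(-90, 90, size=N)
dec = rng.uniform(-90, 90, size=N)
beam_sigma = 30.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
xr = np.exp(-0.5 * (ra**2 + dec**2) / beam_sigma**2) + 0.1 * rng.normal(size=N)

edges = np.arange(-90, 91, 3.0)  # 3-arcsec pixels -> 60 x 60 grid
image = make_map(ra, dec, xr, edges, fwhm=30.0, pixel=3.0)
peak = np.unravel_index(np.argmax(image), image.shape)
print("peak pixel:", peak)
```

Separable 1-D convolutions keep the sketch dependency-free; a library Gaussian filter would do the same job.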

An Extragalactic Survey at λ = 1.1 mm

• Most galaxies are 100× fainter than PKS J1127-1857
• Raw data ~ 25 GB; $t_{\mathrm{tot}}$ ~ 80 hrs; ~ 2×10⁷ samples/detector
• AzTEC/COSMOS survey: 0.7 deg²
• 500× the area of the HUDF (Hubble Ultra Deep Field)
• 160 hrs versus 11 days (~270 hrs) for the HUDF
• 130 mm-bright galaxies

Aretxaga et al. 2011

• AzTEC-3 (Capak et al. 2011)
• Observed 1 Gyr after the Big Bang
• Starburst galaxy (SFR ~ 1000 M☉/yr)