Post on 20-Jan-2016
The Research on Algorithms of Estimating Photometric Redshifts Using
SDSS Galaxy Data
Wang Dan
China-VO group
Chinese Virtual Observatory
11/29-12/03 China-VO 2006, Guilin
Outline
• Background
• Various algorithms
• Comparison
• Summary
Background
• The redshift of a galaxy is measured spectroscopically
• For large sets of faint galaxies, spectra are not quick or easy to obtain
• The photometric redshift technique concentrates on medium- or broadband color features
• Photometric redshifts have been regarded as an efficient and effective measure for studying the statistical properties of galaxies and their evolution.
Methods
• Template fitting approach
Real observation (CWW)
Population synthesis models (Bruzual & Charlot)
• Training set approach
Artificial Neural Networks (ANNs)
Support Vector Machines ( SVMs)
Multivariable Polynomial Regression (MPR)
Color-Magnitude-Redshift Relation (CMR)
Nonparametric Regression
Hyperz
$$\chi^2(z) = \sum_{i=1}^{N_{\mathrm{filters}}} \left[ \frac{F_{\mathrm{obs},i} - b \times F_{\mathrm{temp},i}(z)}{\sigma_i} \right]^2$$
where F_obs,i, F_temp,i and σ_i are the observed and template fluxes and their uncertainty in filter i, respectively, and b is a normalization constant.
Does not rely on having any spectroscopic redshifts; needs only a few templates.
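The χ² minimization above can be sketched in a few lines. This is a minimal illustration, not the Hyperz code itself: the function names, the `grid` dictionary of precomputed template fluxes per trial redshift, and the toy flux values are all hypothetical. The normalization b is set to its analytic least-squares value, which follows from setting the derivative of χ² with respect to b to zero.

```python
def best_scale(f_obs, f_temp, sigma):
    """Least-squares normalization b for one template at one trial redshift."""
    num = sum(fo * ft / s ** 2 for fo, ft, s in zip(f_obs, f_temp, sigma))
    den = sum(ft * ft / s ** 2 for ft, s in zip(f_temp, sigma))
    return num / den


def chi2(f_obs, f_temp, sigma):
    """chi^2 between observed and scaled template fluxes, summed over filters."""
    b = best_scale(f_obs, f_temp, sigma)
    return sum(((fo - b * ft) / s) ** 2 for fo, ft, s in zip(f_obs, f_temp, sigma))


def photo_z(f_obs, sigma, template_grid):
    """Return the trial redshift whose template fluxes give the smallest chi^2."""
    return min(template_grid, key=lambda z: chi2(f_obs, template_grid[z], sigma))


# Hypothetical toy grid: trial redshift -> template fluxes in five filters.
grid = {
    0.1: [1.0, 2.0, 3.0, 4.0, 5.0],
    0.2: [1.0, 1.5, 3.5, 4.0, 6.0],
    0.3: [2.0, 2.0, 2.0, 5.0, 5.0],
}
observed = [2.0 * f for f in grid[0.2]]  # the z = 0.2 template, scaled by b = 2
errors = [0.1] * 5
z_best = photo_z(observed, errors, grid)
```

A real application would also loop over several templates (for example the CWW set) at each trial redshift and keep the overall minimum.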
ANNs
ANN topology (figure): Input Layer, Hidden Layer, Output Layer
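The three-layer topology can be illustrated with a forward pass through one hidden layer. This is only a sketch: the slides do not give the number of units, the activation function, or the input features, so the tanh activation, the layer sizes, and all weight values below are assumptions, and training (fitting the weights to galaxies with known redshifts) is not shown.

```python
import math


def ann_forward(features, w_hidden, b_hidden, w_out, b_out):
    """Feed the inputs through one tanh hidden layer to a single linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical, untrained weights: 3 inputs, 2 hidden units, 1 output.
features = [0.5, -0.2, 0.1]
w_hidden = [[0.4, 0.1, -0.3], [-0.2, 0.5, 0.2]]
b_hidden = [0.0, 0.1]
w_out = [0.3, -0.1]
b_out = 0.2
z_est = ann_forward(features, w_hidden, b_hidden, w_out, b_out)
```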
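SVMs
Principle: SVMs apply structural risk minimization, i.e. they minimize an upper bound on the expected risk. The optimal hyperplane is obtained by maximizing the minimum distance between the hyperplane and the training samples of any class, that is, by maximizing the margin of the classification boundary.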
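The maximum-margin principle can be written as an optimization problem. The form below is the standard textbook hard-margin formulation, given here as an illustration rather than as the exact variant used in this work; the symbols are assumptions not defined on the slides: x_i are the training samples, y_i ∈ {−1, +1} their class labels, and w and b the normal vector and offset of the hyperplane.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The margin between the two classes equals 2/‖w‖, so minimizing ‖w‖² maximizes the margin.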
MPR
• Generate logical relationships between several independent variables and a dependent variable
• Training set containing the values of the independent and dependent variables
• MPR performs the regression and presents the result as a mathematical expression
• The more complete and representative training data we provide, the more accurate the estimate of redshifts will be
• Easy to communicate with astronomers.
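The regression step can be sketched as an ordinary least-squares fit of a polynomial in several variables. This is a minimal sketch under stated assumptions: the slides do not give the polynomial degree or the input variables, so the second-degree form in two variables `a` and `b`, the function names, and the synthetic training data are all hypothetical.

```python
def quad_terms(a, b):
    """Second-degree polynomial terms in two independent variables."""
    return [1.0, a, b, a * a, a * b, b * b]


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_mpr(samples, redshifts):
    """Least-squares coefficients from the normal equations (X^T X) c = X^T z."""
    X = [quad_terms(a, b) for a, b in samples]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xtz = [sum(row[i] * z for row, z in zip(X, redshifts)) for i in range(p)]
    return solve(XtX, Xtz)


def predict_mpr(coeffs, a, b):
    """Evaluate the fitted polynomial expression at (a, b)."""
    return sum(c * t for c, t in zip(coeffs, quad_terms(a, b)))


# Synthetic training set generated from a known polynomial (illustration only).
train = [(a, b) for a in (0.0, 0.5, 1.0, 1.5) for b in (0.0, 0.5, 1.0)]
train_z = [0.1 + 0.2 * a + 0.05 * b * b for a, b in train]
coeffs = fit_mpr(train, train_z)
z_pred = predict_mpr(coeffs, 0.75, 0.25)
```

The fitted `coeffs` are the "mathematical expression" the slide refers to: six numbers that can be reported and reused directly.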
CMR
• The r-magnitude range has been divided into 7 subsections
• Build CMRI and CMRII for each sub-sample; CMRI is for matrices of u-g-r, and CMRII is for matrices of g-r-i
• CMRI and CMRII have been separated into 400 × 400 bins
• Compute the median redshift of a bin if its number of galaxies exceeds 25
• Obtain a color-redshift matrix, and compute the redshifts from the matrices
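The binning steps above can be sketched for a single magnitude sub-sample as follows. This is an illustrative sketch, not the original implementation: the function names, the color range `lo` to `hi`, and the toy data are assumptions. Only the 400 × 400 grid and the "more than 25 galaxies" threshold come from the slide.

```python
import math
from collections import defaultdict
from statistics import median


def cell_index(value, lo, hi, n_bins):
    """Map a color value to its bin index, or None if outside [lo, hi)."""
    idx = math.floor((value - lo) / (hi - lo) * n_bins)
    return idx if 0 <= idx < n_bins else None


def build_matrix(colors, redshifts, lo, hi, n_bins=400, min_count=25):
    """Median redshift per color-color cell, kept only for well-populated cells."""
    cells = defaultdict(list)
    for (c1, c2), z in zip(colors, redshifts):
        i, j = cell_index(c1, lo, hi, n_bins), cell_index(c2, lo, hi, n_bins)
        if i is not None and j is not None:
            cells[(i, j)].append(z)
    return {cell: median(zs) for cell, zs in cells.items() if len(zs) > min_count}


def lookup_redshift(matrix, c1, c2, lo, hi, n_bins=400):
    """Read a redshift estimate from the matrix; None if the cell is empty."""
    cell = (cell_index(c1, lo, hi, n_bins), cell_index(c2, lo, hi, n_bins))
    return matrix.get(cell)


# Toy sub-sample: 30 galaxies in one cell, 5 galaxies in another.
colors = [(1.003, 0.507)] * 30 + [(2.003, 1.507)] * 5
redshifts = [0.10 + 0.001 * k for k in range(30)] + [0.3] * 5
matrix = build_matrix(colors, redshifts, lo=-1.0, hi=3.0)
z_dense = lookup_redshift(matrix, 1.003, 0.507, lo=-1.0, hi=3.0)
z_sparse = lookup_redshift(matrix, 2.003, 1.507, lo=-1.0, hi=3.0)
```

In the full scheme one such matrix pair (CMRI and CMRII) would be built for each of the 7 magnitude sub-samples.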
Nonparametric Regression
• No (or very little) a priori knowledge
• Selecting an appropriate bandwidth (smoothing parameter) is a key part of nonparametric regression fitting
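$$\hat{z}(c, h) = \frac{\sum_{i=0}^{n} z_i \, k\!\left(\frac{c - c_i}{h}\right)}{\sum_{i=0}^{n} k\!\left(\frac{c - c_i}{h}\right)}, \qquad k\!\left(\frac{c - c_i}{h}\right) = e^{-\frac{(c - c_i)^2}{2h^2}}$$
where c is the test sample, c_i is the i-th training sample with known redshift z_i, and h is the bandwidth.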
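The kernel estimator can be sketched directly from the formula. This is a minimal sketch for a scalar input c; the function names and the toy data are hypothetical. The leave-one-out bandwidth search is an assumption added for illustration, since the slides show only plots for the bandwidth selection and do not state the procedure used.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel k(u) = exp(-u^2 / 2), with u = (c - c_i) / h."""
    return math.exp(-0.5 * u * u)


def nw_redshift(c, train_c, train_z, h):
    """Kernel-weighted average of the training redshifts around the point c."""
    weights = [gaussian_kernel((c - ci) / h) for ci in train_c]
    return sum(w * zi for w, zi in zip(weights, train_z)) / sum(weights)


def loo_error(train_c, train_z, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(train_c)):
        rest_c = train_c[:i] + train_c[i + 1:]
        rest_z = train_z[:i] + train_z[i + 1:]
        total += (nw_redshift(train_c[i], rest_c, rest_z, h) - train_z[i]) ** 2
    return total / len(train_c)


# Toy training set with a linear relation between c and z (illustration only).
train_c = [k / 10 for k in range(11)]
train_z = [0.3 * c for c in train_c]
z_mid = nw_redshift(0.5, train_c, train_z, h=0.1)

# Pick the candidate bandwidth with the smallest leave-one-out error.
candidates = [0.05, 0.1, 0.2, 0.5]
best_h = min(candidates, key=lambda h: loo_error(train_c, train_z, h))
```

A very small h can make every weight underflow to zero far from the training data, so in practice the candidate bandwidths should be kept comparable to the sample spacing.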
Selection of the Bandwidth
Bandwidth versus redshift
Accuracies of Different Methods
• CWW 0.0666
• Bruzual-Charlot 0.0552
• ANNs 0.0229
• SVMs 0.027
• CMR 0.032
• Nonparametric Regression 0.0236
• MPR 0.0256
Summary
• Empirical photometric redshift estimators do rely on the existence of a sufficiently large and representative training set
• Difficulty in extrapolating to regions that are not well sampled by the training data.
• Well suited to problems that require the redshift distribution rather than accurate redshifts of individual galaxies
Prospect
• As large and deep sky survey projects are carried out, larger and more representative samples will be obtained.
• The development of new statistical analysis algorithms.
• Feature selection/extraction during data preprocessing
• More ensemble algorithms (e.g. least-squares SVMs).