
The Research on Algorithms of Estimating Photometric Redshifts Using SDSS Galaxy Data

Wang Dan

China-VO group

Chinese Virtual Observatory

11/29-12/03, China-VO 2006, Guilin

Outline

• Background

• Various algorithms

• Comparison

• Summary


Background

• The redshift of a galaxy is normally measured spectroscopically.

• For large samples of faint galaxies, spectra are neither quick nor easy to obtain.

• The photometric redshift technique relies instead on medium- or broadband color features.

• Photometric redshifts are regarded as an efficient and effective tool for studying the statistical properties of galaxies and their evolution.


Methods

• Template fitting approach

Observed templates (Coleman, Wu & Weedman; CWW)

Population synthesis models (Bruzual & Charlot)

• Training set approach

Artificial Neural Networks (ANNs)

Support Vector Machines (SVMs)

Multivariable Polynomial Regression (MPR)

Color-Magnitude-Redshift Relation (CMR)

Nonparametric Regression


Hyperz

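Hyperz minimizes

$$\chi^2(z) = \sum_{i=1}^{N_{\mathrm{filters}}} \left[ \frac{F_{\mathrm{obs},i} - b \times F_{\mathrm{temp},i}(z)}{\sigma_i} \right]^2$$

where $F_{\mathrm{obs},i}$, $F_{\mathrm{temp},i}$ and $\sigma_i$ are the observed and template fluxes and their uncertainty in filter $i$, respectively, and $b$ is a normalization constant.

The method does not rely on having any spectroscopic redshifts and needs only a few templates.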
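
The χ² template fit described for Hyperz can be illustrated with a minimal Python sketch. This is not the Hyperz code: the function names, the toy template grid and the analytic solution for b (obtained by setting the derivative of χ² with respect to b to zero) are assumptions made for illustration only.

```python
import numpy as np

def best_scale(f_obs, f_temp, sigma):
    """Normalization b that minimizes chi^2 for one template at one redshift."""
    w = 1.0 / sigma**2
    return np.sum(w * f_obs * f_temp) / np.sum(w * f_temp**2)

def chi2(f_obs, f_temp, sigma):
    """chi^2 summed over filters, using the best-fitting normalization b."""
    b = best_scale(f_obs, f_temp, sigma)
    return np.sum(((f_obs - b * f_temp) / sigma) ** 2)

def fit_redshift(f_obs, sigma, z_grid, template_fluxes):
    """template_fluxes[j] holds the template fluxes per filter at z_grid[j]."""
    chi2_values = np.array([chi2(f_obs, f_t, sigma) for f_t in template_fluxes])
    return z_grid[int(np.argmin(chi2_values))], chi2_values

# Toy data (hypothetical): 5 filters, a made-up template whose shape changes with z.
z_grid = np.linspace(0.0, 0.5, 6)
filters = np.arange(5)
template_fluxes = np.array([1.0 + z * filters for z in z_grid])
f_obs = 3.0 * template_fluxes[2]      # "observed" galaxy: template at z = 0.2, scaled by 3
sigma = np.full(5, 0.1)

z_best, chi2_values = fit_redshift(f_obs, sigma, z_grid, template_fluxes)
```

In a realistic setting the grid would also loop over several templates (for example the CWW or Bruzual & Charlot sets), and the template fluxes would come from integrating redshifted spectra through the filter curves.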


ANNs

ANN topology: input layer, hidden layer, output layer.
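
The three-layer topology can be sketched as a single forward pass in Python. The layer sizes, the tanh activation, the random untrained weights and the example magnitudes are all assumptions for illustration; the slides do not specify the network actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 5, 10            # assumed: 5 magnitudes in, 10 hidden units

# Untrained weights; in practice they are fitted on a spectroscopic training set.
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    """Input layer -> hidden layer (tanh) -> single linear output (redshift)."""
    hidden = np.tanh(x @ W1 + b1)
    return (hidden @ W2 + b2).ravel()

# Two hypothetical galaxies described by five magnitudes each.
mags = np.array([[19.5, 18.2, 17.6, 17.3, 17.1],
                 [20.1, 18.9, 18.0, 17.6, 17.4]])
z_pred = forward(mags)                # one (meaningless until trained) value per galaxy
```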


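SVMs

Principle: SVMs apply structural risk minimization, i.e. they minimize an upper bound on the expected risk. The optimal hyperplane is obtained by maximizing the minimum distance between the hyperplane and the training samples of any class, that is, by maximizing the margin of the classification boundary.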
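
As a worked illustration of the margin-maximization principle, the standard hard-margin formulation for training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$ can be written as follows. The symbols $w$ (normal vector of the hyperplane) and $b$ (offset) are introduced here for illustration; the slides do not state which SVM formulation was used for redshift estimation.

$$\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i$$

The margin between the two classes is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.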


MPR

• MPR models the relationship between several independent variables and a dependent variable.

• The training set contains the values of the independent and dependent variables.

• MPR performs the regression and presents the result as a mathematical expression.

• The more complete and representative the training data, the more accurate the redshift estimates will be.

• The resulting expression is easy to communicate to astronomers.
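
A minimal sketch of multivariable polynomial regression, assuming a second-degree polynomial fitted by ordinary least squares. The function names, the polynomial degree and the synthetic two-feature training set are hypothetical; the slides do not give the form of the expression that was fitted.

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(X, degree=2):
    """Columns: a constant, each input, and all products of up to `degree` inputs."""
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

def fit_mpr(X, z, degree=2):
    """Least-squares coefficients of the polynomial expression."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(X, degree), z, rcond=None)
    return coeffs

def predict_mpr(X, coeffs, degree=2):
    return design_matrix(X, degree) @ coeffs

# Synthetic training set (hypothetical): two colors and a noise-free quadratic relation.
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 2.0, size=(200, 2))
z_train = 0.05 + 0.1 * X_train[:, 0] + 0.02 * X_train[:, 0] * X_train[:, 1]

coeffs = fit_mpr(X_train, z_train)    # order: 1, x0, x1, x0^2, x0*x1, x1^2
z_fit = predict_mpr(X_train, coeffs)
```

The fitted coefficients are the "mathematical expression" in question: they can be written out explicitly and applied to new photometry without the training data.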


CMR

• The r-magnitude range is divided into 7 intervals.

• CMRI and CMRII are built for each sub-sample: CMRI from the u-g-r matrices and CMRII from the g-r-i matrices.

• CMRI and CMRII are each divided into 400 × 400 bins.

• The median redshift is computed for every bin that contains more than 25 galaxies.

• This yields a color-redshift matrix, from which the redshifts are computed.
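
The binning procedure above can be sketched in Python for a single magnitude interval. This sketch assumes that the two axes of a matrix are two colors (for example u-g and g-r for CMRI); the function names, the color range and the toy data are hypothetical, and a small 4 × 4 grid stands in for the 400 × 400 bins.

```python
import numpy as np

def build_cmr(color_x, color_y, z, n_bins=400, color_range=(-1.0, 3.0), min_count=25):
    """Median redshift per color-color bin; NaN where a bin has too few galaxies."""
    edges = np.linspace(color_range[0], color_range[1], n_bins + 1)
    ix = np.digitize(color_x, edges) - 1
    iy = np.digitize(color_y, edges) - 1
    inside = (ix >= 0) & (ix < n_bins) & (iy >= 0) & (iy < n_bins)
    cells = {}
    for i, j, zi in zip(ix[inside], iy[inside], z[inside]):
        cells.setdefault((i, j), []).append(zi)
    matrix = np.full((n_bins, n_bins), np.nan)
    for (i, j), values in cells.items():
        if len(values) > min_count:          # the count must exceed 25
            matrix[i, j] = np.median(values)
    return matrix, edges

def lookup(matrix, edges, color_x, color_y):
    """Read the redshift of one galaxy off the color-redshift matrix."""
    n_bins = matrix.shape[0]
    i = np.digitize(color_x, edges) - 1
    j = np.digitize(color_y, edges) - 1
    if 0 <= i < n_bins and 0 <= j < n_bins:
        return matrix[i, j]
    return np.nan

# Toy sub-sample: 30 galaxies in one bin, 10 galaxies in another.
cx = np.concatenate([np.full(30, 0.3), np.full(10, 1.7)])
cy = np.concatenate([np.full(30, 0.3), np.full(10, 1.7)])
zs = np.concatenate([np.linspace(0.10, 0.12, 30), np.full(10, 0.30)])

matrix, edges = build_cmr(cx, cy, zs, n_bins=4, color_range=(0.0, 2.0))
z_dense = lookup(matrix, edges, 0.3, 0.3)    # bin with 30 galaxies: median redshift
z_sparse = lookup(matrix, edges, 1.7, 1.7)   # bin with 10 galaxies: NaN
```

In the full scheme one such matrix would be built per magnitude interval, and a galaxy would be looked up in the matrix of its own interval.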


Nonparametric Regression

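• Requires no (or very little) a priori knowledge.

• Selecting an appropriate bandwidth (smoothing parameter) is a key part of nonparametric regression fitting.

$$\hat{z}(c,h) = \frac{\sum_{i=0}^{n} z_i \, k\!\left(\frac{c - c_i}{h}\right)}{\sum_{i=0}^{n} k\!\left(\frac{c - c_i}{h}\right)}$$

$$k\!\left(\frac{c - c_i}{h}\right) = e^{-\frac{(c - c_i)^2}{2h^2}}$$

where $c$ is the test sample, $c_i$ is the $i$-th training sample with known redshift $z_i$, and $h$ is the bandwidth.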
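
The kernel estimator can be sketched in a few lines of Python as a Gaussian-weighted mean of the training redshifts. The function names and the one-dimensional toy data are assumptions for illustration; with several colors per galaxy, the squared difference would become a squared distance between color vectors.

```python
import numpy as np

def gaussian_kernel(u):
    """k(u) = exp(-u^2 / 2), evaluated at u = (c - c_i) / h."""
    return np.exp(-0.5 * u**2)

def nw_redshift(c, c_train, z_train, h):
    """Kernel-weighted mean of the training redshifts at the query value c."""
    weights = gaussian_kernel((c - c_train) / h)
    return np.sum(weights * z_train) / np.sum(weights)

# Toy training set (hypothetical): one color per galaxy, redshift linear in color.
c_train = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
z_train = np.array([0.05, 0.10, 0.15, 0.20, 0.25])

z_mid = nw_redshift(0.6, c_train, z_train, h=0.2)      # symmetric neighbours
z_narrow = nw_redshift(0.2, c_train, z_train, h=0.01)  # tiny h: nearest sample dominates
z_wide = nw_redshift(0.2, c_train, z_train, h=100.0)   # huge h: close to the global mean
```

The last two calls show why the bandwidth matters: a very small h reproduces the nearest training redshift, while a very large h smooths everything toward the sample mean.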


Selection of the Bandwidth


Bandwidth versus redshift


Accuracies of Different Methods

• CWW: 0.0666

• Bruzual - Charlot: 0.0552

• ANNs: 0.0229

• SVMs: 0.027

• CMR: 0.032

• Nonparametric Regression: 0.0236

• MPR: 0.0256


Summary

• Empirical photometric redshift estimators rely on the existence of a sufficiently large and representative training set.

• They have difficulty extrapolating to regions that are not well sampled by the training data.

• They are well suited to problems that require the redshift distribution rather than accurate redshifts of individual galaxies.


Prospect

• As large, deep sky surveys are carried out, larger and more representative samples will be obtained.

• The development of new statistical analysis algorithms.

• Feature selection/extraction during data preprocessing.

• More ensemble algorithms (e.g. least-squares SVMs).