The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab


The SDSS, The Dark Energy Survey and Large Scale Sky Surveys

Data Intensive Experimental Science

James Annis

Fermilab


Dec 3, 2004, James Annis - Fermilab

Collaborators: in all cases, many folks

Steve Kent, Tim McKay, Risa Wechsler, Erin Sheldon, Gus Evrard, Gabriele Garzoglio, Huan Lin, Alex Szalay, Maria Nieto-Santisteban…

Mike Wilde, Jens Vöckler, Yong Zhao, Mike Mulligan, Ian Foster, Vijay Sekhri, Neha Sharma…

Brenna Flaugher, John Peoples, William Wester, Huan Lin…


I: Large scale sky surveys, current and planned.


The Scales of the Night Sky

There is only one sky and it is 41,000 sq-degrees

100 Terabytes/sky/color, at 0.1”/pixel and 2 bytes/pixel

~9 billion galaxies in the observable universe, counting z < 3 and L > 0.1 L*

Note the scales here: hundreds of thousands of images/files, tens of terapixels, billion-object databases
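The data-volume figure on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `sky_bytes` is made up here, and the full-sky area is computed from 4π steradians rather than taken from the slide's rounded 41,000 sq-degrees.

```python
import math

# Full sky in square degrees: 4*pi steradians converted to deg^2 (about 41,253).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def sky_bytes(pixel_arcsec: float, bytes_per_pixel: int,
              sq_deg: float = FULL_SKY_SQ_DEG) -> float:
    """Raw image volume in bytes for one color at a given pixel scale."""
    pixels_per_sq_deg = (3600.0 / pixel_arcsec) ** 2
    return sq_deg * pixels_per_sq_deg * bytes_per_pixel


full_sky_tb = sky_bytes(0.1, 2) / 1e12          # terabytes per color
full_sky_terapixels = sky_bytes(0.1, 1) / 1e12  # 1 byte/pixel counts pixels

print(f"{FULL_SKY_SQ_DEG:.0f} sq-degrees")
print(f"{full_sky_tb:.0f} TB per color")
print(f"{full_sky_terapixels:.0f} terapixels")
```

At 0.1”/pixel the result lands close to the 100 TB/sky/color and "tens of terapixels" quoted above.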

2MASS Star Map

2MASS Galaxy Map


CURRENT SURVEYS: The 2 Micron All Sky Survey - 2MASS

Collaboration: UMASS and NASA-IPAC

Goal: Construct modern IR imaging sky map. Classic astronomy driven.

Instrumentation: 2 existing 1.3m telescopes, N+S. 2 new 0.2 Mega-pixel IR cameras. Entirely new science analysis code. Professional data factory.

Status: 4 years, completed 2001

Data: 41,000 sq-degrees, fsky = 100%

Imaging: 2” pixels, 3 infrared colors, 6 exposures. Final image 1” pixels. 0.5 TB/color.

Catalogs: Point source catalog: 471 million objects. Extended source catalog: 1.6 million objects.

Commentary: Imaging surveys take large scale production efforts


CURRENT SURVEYS: The 2dF Galaxy Redshift Survey

Collaboration: AAT, Australian and UK Universities

Goal: Measure the galaxy power spectra using a large redshift survey. Experiment driven.

Instrumentation: Existing 3.9m telescope. New 400 fiber robot spectrograph.

Status: completed 2002

Data: 250,000 redshifts. Small data volume.

Galaxy redshift map

Commentary: Very successful at delivering targeted science on time


CURRENT SURVEYS: The Sloan Digital Sky Survey - SDSS

Collaboration: Fermilab, USNO, JPG, Princeton, Chicago, JHU, Washington, MPIA, Los Alamos, NMSU, …

Goal: Construct optical CCD imaging sky map. Obtain 1 million spectra. Survey driven.

Instrumentation: New 2.5m telescope. New 120 Mega-pixel camera. New 600 fiber, twin dual camera spectrographs. Entirely new science analysis code. Professional data factory.

Status: completion 2005, extended to 2009?


CURRENT SURVEYS: SDSS Data Description and Footprint

Data: 7,000 sq-degrees, fsky = 17%. In Galactic caps: avoids high stellar density, avoids high extinction by dust.

Imaging (in blue): 0.4” pixels, 5 optical colors, 2% photometry. 20 repeat scans of 250 sq-degrees along fall equator.

Spectra (in red): 4096 pixels, 1 Å resolution, 390 nm to 900 nm, spectrophotometric.

Commentary: Focus on survey produced extremely high quality data, but delayed science output


CURRENT SURVEYS: SDSS Data Releases

SDSS releases data in stages, roughly a year after the spectra are taken

Early Data Release; Data Release 1, 2. Current data release is DR3:

5282 sq-degrees. 6 TB of images. 141 million objects. 530,000 spectra: 374,000 galaxies, 50,000 quasars, 50,000 stars.

1.2 TB of catalogs (FITS). 2.3 TB of database (SQL). http://www.sdss.org/dr3

SDSS DR1 Galaxy Map

SDSS DR1 Galaxy Redshift Map


NEXT GENERATION SURVEYS: The Dark Energy Survey

Collaboration: Fermilab, UIUC, Chicago, LBNL, NOAO, NCSA

Goal: Constrain dark energy parameter w to 5% with 4 independent techniques. Experiment driven.

Instrumentation: Existing 4m telescope. New 500 Mega-pixel optical camera. Professional data factory.

Status: 5 years, starting in 2009

Data: 5,000 sq-degrees, fsky = 12%

Imaging: 0.25” pixels, 4 optical colors, 5-10 exposures. Final image 0.125” pixels. ~100 TB disk, 10 PB mass storage.

Catalogs: ~500 million objects


NEXT GENERATION SURVEYS: The Large Survey Telescope

Two candidates for the LST:

Pan-STARRS: 2006- one 1.5m telescope, a 2 Giga-pixel camera/telescope. 4 telescope Pan-STARRS-II in 2008? Private observatory model: Air Force funds University of Hawaii.

LSST: 2012- 6.5m telescope, 2 Giga-pixel camera. Collaboration model: Universities, NSF, DOE, private donors. LSST Corp, NOAO, SLAC, Brookhaven, L. Livermore, …

Goal: Survey 15,000 sq-degrees every 2-3 nights for 10 years. Explore time domain for killer asteroids, Kuiper belt objects, supernovae, gamma ray bursts, “things that go bump in the night”. Deep imaging for weak lensing, cluster finding, … Survey driven.

Status: Operational in 2012, if technology limited

Data: Huge data rate: 4 PB/year of images, 4 TB/year catalogs. Driven by time domain science.

Pan-STARRS

LSST


The Structure of Imaging Surveys

Surveys have the scientific reach to answer big problems. Experiment driven produces more focused science.

As a side effect, surveys provide:

The most high quality data

To the most scientists

For the lowest cost

The best data come from taking the survey data seriously. The next few slides describe survey data processing.


Imaging Survey Data Reduction

Images are the raw data.

Image Processing: Corrected images are the base data. Remove CCD effects like QE variations, bias, overscan. Locate and measure point spread functions. Locate and measure objects.

Astrophysical Catalog Construction: Photometric calibrations are applied. Colors and shape information heavily used to distinguish between classes. PSF-like: star, quasar. Galaxy-like: measure photometric redshift, measure intrinsic spectral type.

In the SDSS, we do not make access to raw images easy.

The corrected images are what the SDSS serves.

The atmosphere contributes to the PSF as well as the instrument. In the SDSS we measure it spatially and temporally, and represent it using “eigen-psfs”.

Optimal measurement of galaxies and stars is a rich area of algorithm development

Many additional columns can be added to object catalogs. The SDSS produces catalogs of stars, quasars, high-z quasars, galaxies, main-survey galaxies, LRGs; most of these have one or more photometric redshift estimates.

So here too, the raw catalogs are available, but not so easily. Calibrated, value added catalogs are served.


The SDSS Production System


Sky Survey Data Production

Factorize pipelines: Pre-jobs: data staging. Jobs: main reduction step. Post-jobs: quality control. Chain these factors into logically higher units.

Bookkeeping on pipelines: Versions of pipelines. Parameter files. Long lived input data. Older versions of data concurrent with new.

Fermilab-SDSS experience: “In watching the Fermilab SDSS pipeline (that Steve Kent originally built) I am struck by how much they need a work flow system just to track the steps and to make sure that the quality steps are enforced. Right now much of that is ‘human process’.”

Data derivation, data lineage, and work flow tracking are of great interest in the data factories of large scale sky surveys.
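The factoring into pre-jobs, jobs, and post-jobs can be sketched in a few lines. This is a toy illustration, not the SDSS production code: the class names `Stage` and `Pipeline`, the stage functions, and the `max_sky_level` parameter are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One factor of a pipeline: a pre-job, a main job, or a post-job."""
    name: str
    run: Callable[[Dict], Dict]


@dataclass
class Pipeline:
    """A chain of stages plus the bookkeeping needed to repeat a run."""
    name: str
    version: str
    params: Dict
    stages: List[Stage]
    log: List[str] = field(default_factory=list)

    def execute(self, data: Dict) -> Dict:
        for stage in self.stages:
            # Every stage sees the same parameter file; a failed stage raises
            # and stops the chain, so quality control cannot be skipped.
            data = stage.run({**data, "params": self.params})
            self.log.append(f"{self.name} v{self.version} {stage.name}: ok")
        return data


def stage_data(d: Dict) -> Dict:          # pre-job: data staging
    return {**d, "staged": True}


def reduce_image(d: Dict) -> Dict:        # job: main reduction step
    return {**d, "sky_level": sum(d["pixels"]) / len(d["pixels"])}


def quality_control(d: Dict) -> Dict:     # post-job: quality control
    if d["sky_level"] > d["params"]["max_sky_level"]:
        raise ValueError("quality control failed: sky level too high")
    return {**d, "qc_passed": True}


pipeline = Pipeline(
    name="toy-photo", version="1.0", params={"max_sky_level": 100.0},
    stages=[Stage("pre", stage_data), Stage("job", reduce_image),
            Stage("post", quality_control)],
)
result = pipeline.execute({"pixels": [10.0, 12.0, 11.0]})
print(result["qc_passed"], pipeline.log)
```

The log records pipeline name, version, and stage for every step, which is the kind of bookkeeping the slide asks a workflow system to take over from "human process".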


Chimera Virtual Data System

We note with interest that the information needed for “Virtual Data” (transparency with respect to materialization) is the same information that is needed for good bookkeeping.

One should be able to reproduce a given data set at any time. A “make” for data.

This information provides data tracking and result auditability.

Chimera/VDS provides workflow management that provides:

A new, structured paradigm for organizing, locating, specifying, and requesting data products

The possibility for performance optimizations, for example recreating derived data using data from a nearby archive rather than transferring it.

Places programs on same footing as data: programs are significant community resources.

Diagram labels: VDL Interpreter (manipulate derivations and transformations); Virtual Data Catalog (implements Chimera Virtual Data Schema); Virtual Data Language; XML; Chimera

Chimera consists of:

Virtual data catalog: transformations, derivations, data

Virtual data language: catalog definitions

Query tool
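The "make for data" idea can be illustrated with a toy catalog that stores transformations (programs) and derivations (which program and inputs produce which product), and materializes a product only when asked. This is a sketch of the concept only; it does not use or imitate the actual Chimera VDL or API, and every name in it is hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class VirtualDataCatalog:
    """Toy catalog of transformations (programs) and derivations (recipes)."""

    def __init__(self) -> None:
        self.transformations: Dict[str, Callable] = {}
        self.derivations: Dict[str, Tuple[str, List[str]]] = {}
        self.materialized: Dict[str, object] = {}
        self.executions = 0

    def add_raw(self, name: str, value: object) -> None:
        self.materialized[name] = value

    def add_transformation(self, name: str, func: Callable) -> None:
        self.transformations[name] = func

    def add_derivation(self, product: str, transformation: str,
                       inputs: List[str]) -> None:
        self.derivations[product] = (transformation, inputs)

    def request(self, product: str) -> object:
        """Return a product, deriving it (and its inputs) only if needed."""
        if product not in self.materialized:
            transformation, inputs = self.derivations[product]
            args = [self.request(name) for name in inputs]
            self.materialized[product] = self.transformations[transformation](*args)
            self.executions += 1
        return self.materialized[product]

    def lineage(self, product: str) -> List[str]:
        """List the derivation steps behind a product, inputs first."""
        if product not in self.derivations:
            return []
        transformation, inputs = self.derivations[product]
        steps = [step for name in inputs for step in self.lineage(name)]
        return steps + [f"{product} = {transformation}({', '.join(inputs)})"]


vdc = VirtualDataCatalog()
vdc.add_raw("raw_image", [3.0, 5.0, 4.0])
vdc.add_transformation("subtract_bias", lambda img: [p - 1.0 for p in img])
vdc.add_transformation("total_flux", lambda img: sum(img))
vdc.add_derivation("corrected_image", "subtract_bias", ["raw_image"])
vdc.add_derivation("flux", "total_flux", ["corrected_image"])

print(vdc.request("flux"))   # derives corrected_image, then flux
print(vdc.request("flux"))   # already materialized; nothing is rerun
print(vdc.lineage("flux"))
```

The same records answer both questions raised on the slide: how to reproduce a data set, and where a given result came from.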


II: The example of the Dark Energy Survey


THE DARK ENERGY SURVEY

Science Goals: Perform a 5000 sq. deg. survey of the southern galactic cap. Map the cosmological density field to z=1. Constrain the Dark Energy parameter w to ~5% with 4 complementary techniques. Begin to constrain dw/dz.

New Equipment: Replace the PF cage on the CTIO Blanco 4m telescope with a new 2.2 deg. FOV optical CCD camera.

Time scale: Instrument Construction 2005-2009

Survey: 30% of the telescope time from 2009-2013


DARK ENERGY OBSERVATIONS: Density Fluctuation Power Spectra

POWER SPECTRUM: 70% dark energy, 25% dark matter, 5% baryons

The energy density of the dark energy is currently best measured using mass density fluctuations, through a spatial power spectrum analysis that combines WMAP observations of the CMB with SDSS observations of the galaxy distribution.


DARK ENERGY OBSERVATIONS: Acceleration Discovered Using SN

Two groups discovered, independently, acceleration:

Supernova Cosmology Project

High-Z Supernova Project

Acceleration is measured, whereas in power spectra techniques it is the dark energy density that is measured.

These techniques rely on Type Ia supernovae (white dwarf explosions) having uniform luminosities, or at least luminosities correctable to uniform. Acceleration is then the observation that distant supernovae are too dim for their redshift.


DARK ENERGY OBSERVATIONS: SN Can Probe Past DE Dominated Regime

Hubble Space Telescope observations can reach z > 1.0

Dark energy becomes dominant constituent of the universe at z~0.75.

HST observations see turnover in the effect of DE on apparent brightness of supernovae.

Figure axes: Δ Mag versus redshift


DARK ENERGY

95% of the Universe is in Dark Energy and Dark matter for which we have no understanding.

The confirmation of Dark Energy points to a major hole in our understanding of fundamental physics

1998 and 2003 Science breakthroughs of the year


Measuring Dark Energy

One measures dark energy through how it affects the universe expansion rate, H(z):

$H^2(z) = H_0^2 \left[ \Omega_M (1+z)^3 + \Omega_R (1+z)^4 + \Omega_{DE} (1+z)^{3(1+w)} \right]$

The three terms are matter, radiation, and dark energy. Note w, the parameter which describes the evolution of the density of dark energy with redshift. A cosmological constant has w = -1. w is currently constrained to ~20% by WMAP, SDSS, and supernovae.

Measurements are usually integrals over H(z): $r(z) = \int dz / H(z)$

Standard Candles (e.g., supernova) measure $d_L(z) = (1+z)\, r(z)$

Standard Rulers measure $d_A(z) = (1+z)^{-1}\, r(z)$

Volume Markers measure $dV/dz\,d\Omega = r^2(z)/H(z)$

The rate of growth of structure is a more complicated function of H(z)
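To see how w enters the observables, the integrals above can be evaluated numerically. The sketch below is an illustration under stated assumptions: a flat universe, radiation neglected, H0 = 70 km/s/Mpc and a dark energy fraction of 0.73 (values chosen for the example, not taken from the slides), and the factor of c that the slide's formulas leave implicit restored so that distances come out in Mpc.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble(z, h0=70.0, omega_m=0.27, omega_r=0.0, omega_de=0.73, w=-1.0):
    """H(z) in km/s/Mpc from the expansion-rate equation."""
    return h0 * math.sqrt(omega_m * (1 + z) ** 3
                          + omega_r * (1 + z) ** 4
                          + omega_de * (1 + z) ** (3 * (1 + w)))


def comoving_distance(z, steps=1000, **cosmo):
    """r(z) = c * integral from 0 to z of dz'/H(z'), in Mpc (trapezoid rule)."""
    dz = z / steps
    total = 0.5 * (1 / hubble(0.0, **cosmo) + 1 / hubble(z, **cosmo))
    for i in range(1, steps):
        total += 1 / hubble(i * dz, **cosmo)
    return C_KM_S * total * dz


def luminosity_distance(z, **cosmo):
    """Standard candles: d_L = (1+z) r(z)."""
    return (1 + z) * comoving_distance(z, **cosmo)


def angular_diameter_distance(z, **cosmo):
    """Standard rulers: d_A = r(z) / (1+z)."""
    return comoving_distance(z, **cosmo) / (1 + z)


for w in (-1.0, -0.8):
    print(f"w = {w}: d_L(z=1) = {luminosity_distance(1.0, w=w):.0f} Mpc")
```

Changing w from -1 to -0.8 shortens the distance to z = 1 by a few percent, which indicates the measurement precision needed to constrain w.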


The Dark Energy Survey 4 Key Projects

Dark Energy using new probes

Galaxy Cluster counting: 20,000 clusters to z=1 with M > 2x10^14 solar masses. Cluster angular power spectrum.

Weak lensing: 300 million galaxies. Photo-z accuracy of δz < 0.1 to z = 1. 10-20 galaxies/sq-arcminute.

The angular power spectrum is used as a standard ruler: 300 million galaxies. Can be broken up into bins of photometric redshift. Peak and baryon oscillation features provide ruler.

Dark Energy using the gold standard probe

Type Ia Supernovae distances: 2000 supernovae. 40 sq-degrees. Revisit at 3 night intervals. Photo-z for all host galaxies. Spectroscopic-z for ~1/4 of all host galaxies.


SURVEY DESIGN: Critical Cluster Complementarity

Combination of SPT mass measurements and DES redshifts places joint constraints on w and Ω_m:

Fiducial cosmology parameters from WMAP: σ_8 = 0.84, Ω_m = 0.27, w = -1

29000 clusters in the 4000 deg² DES+SPT survey area

Curvature free to vary (dashed); one sigma uncertainty on w is 0.071

Curvature fixed @ 0 (solid); one sigma uncertainty on w is 0.04

Parameter degeneracies from different techniques are complementary

DES + SPT: Majumdar & Mohr 2003. SNAP: Perlmutter & Schmidt 2003. WMAP: Spergel et al 2003.


Lensing Cosmography

The physics of weak lensing is that the intervening mass distorts the shape of background galaxies; the distortion can be described by a distortion matrix and can be measured.

The strength of weak lensing by the same foreground galaxies varies with the distance to the background galaxies.

Measure amplitude of shear vs. z: shear-galaxy correlations, shear-shear correlations.

DES will: image 5000 sq-degrees; reach photo-z accuracy of δz < 0.1 to z = 1; measure 10-20 galaxies/sq-arcminute.

Shear map(z)

Galaxy map

z = 1/4, z = 1/2, z = 3/4


Photometric Redshifts

Photometric redshifts are the key technology of the Dark Energy Survey. With them, we can pursue 4 key projects of extraordinary power.

Figure labels: g, r, i; E galaxy spectra; redshift


Photo-z’s in the DES

Photo-z of red galaxies at 0 < z < 1

left, 0.5 L* galaxies

right, 2 L* galaxies

Photo-z of all galaxies, red or blue, in the last magnitude shell of the DES, 23 < i < 24

blue is 1 sigma, red mean


INTERMEDIATE TIME SCALE: South Pole Telescope SZ Survey

10m submillimeter telescope: At the South Pole. 1000 element bolometer array. 1.25 arcminute resolution.

Collaboration: John Carlstrom (Chicago) PI. Chicago, CWRU, Berkeley, Illinois and Harvard-Smithsonian CfA.

Science Goals: 4000 sq-degree Sunyaev-Zeldovich effect survey. Cluster abundances and spatial power spectra. CMB polarization.

NSF funded, Survey slated for 2007

But no redshifts!

SZ observations of clusters

SPT site


III: Dark Energy Survey: Survey Strategy and the SDSS Coadd


SURVEY DESIGN

Primary Survey: SDSS g,r,i,z. 10σ limiting mag: 24.6, 24.1, 24.3, 23.9. Survey area 5000 sq. deg. in Southern Galactic Cap. Connection to SDSS stripe 82 for photo-z calibration. Multiple tilings (5+) in nominally 100 sec units.

Secondary Survey: 40 deg² synoptic. 3 night revisit scale over 4 months of survey.

Science Program Survey Description


Define a hex grid on the sky

NVO Hyperatlas standard ISEA-6-TAN-22

0.25” pixels

This uses hexagons as the binning/coverage map

There are other techniques, such as HEALPix, a nested equal-area pixelization of the sphere (its pixels are curvilinear quadrilaterals) that is widely used in CMB data sets and very good for power spectra calculations.

The catalogs can be arranged this way. So can the images, which will be resampled for the DES.


Covering the DES Area

SPT Area

Tie region

SDSS Stripe 82

Albers equal area projection

planetary projection


The Tiling of the Sky is Driven by Photometry

Relative photometry: Use overlapping images of stars to place all images on same relative system. 1000’s of stars per overlap. Precision very high, limited by systematics.

Overlapping tilings: Allow reduction of systematics.

Recipe: Tile the plane. Then, tile the plane with hex offset half hex over and up. This gives 30% overlap with three hexagons. Repeat, with different offsets.

1 tiling, 2 tilings, 3 tilings


How Many Tilings of the Sky

1 tiling

3 tilings

2 tilings

DECam is 10% sparse

10% of each tiling is uncovered

>= 4 tilings required for every point to have 2 or more images
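A rough feel for why a sparse camera needs several tilings comes from a simple binomial estimate. The sketch below assumes each tiling covers any given sky position independently with probability 0.9; that is a simplification introduced here, since the real tiling offsets are deterministic and chosen so that gaps do not line up, so the numbers are indicative only.

```python
from math import comb


def prob_at_least(k: int, n_tilings: int, coverage: float = 0.9) -> float:
    """Probability that a sky position is imaged at least k times in n_tilings,
    if each tiling independently covers it with probability `coverage`."""
    return sum(
        comb(n_tilings, j) * coverage ** j * (1 - coverage) ** (n_tilings - j)
        for j in range(k, n_tilings + 1)
    )


for n in (2, 3, 4, 5):
    print(f"{n} tilings: P(>= 2 images) = {prob_at_least(2, n):.4f}")
```

Under this assumption two tilings leave about a fifth of the area with fewer than two images, while four tilings leave well under one percent.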


How Many Tilings II

Multiple tilings allow CMB techniques to average down photometric errors.

4 tilings takes us to 0.012% relative calibration.

Choose 4 tilings as minimum

Figure: relative photometry and absolute photometry maps, color scale –0.20 mags to +0.20 mags

Relative Calibration
Tilings   Error
1         0.035
2         0.018
5         0.010

Absolute Calibration
Tilings   Error
N         scales as 1/Sqrt(N)


Global Relative Photometry

CMB style mapping strategy: $y = A x + n$

$y$ = observations: ratios of instrumental star fluxes between pairs of hexes (62 ccds = 1 hex). Includes effects of uncorrected flat field problems and scattered light problems.

$x$ = scale factor map: scale factor for a given hex image.

$n$ = noise, with noise covariance matrix $N$.

$A$ = survey mapping: 0 if no overlap; 1/3 if 2nd, 3rd tiling overlap; ½ if 4th and higher tile overlaps.

Solutions: $x = W y$

Simple average coadd: $W_{coadd} = [A^T A]^{-1} A^T$

Weighted averaging: $W = [A^T N^{-1} A]^{-1} A^T N^{-1}$, where $N$ is the noise covariance matrix. Minimum variance for Gaussian noise.

Provides least squares flux scalings, that is, the flat map.

Inverting large matrices: Year 1: 4 matrices of 6000x4000. Year 2: 4 matrices of 30,000x8000.
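The weighted solution x = [AᵀN⁻¹A]⁻¹AᵀN⁻¹y can be tried on a toy problem. The sketch below makes several assumptions that are not from the slides: the noise covariance is diagonal, the observations are magnitude differences between pairs of hexes rather than flux ratios, the design matrix is a small hand-made one (not the 0, 1/3, ½ survey mapping), and one extra row pins the zero point of the first hex so the system is not degenerate. It uses plain Python lists; a real solve at the quoted matrix sizes would use a linear algebra library.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def weighted_least_squares(a: Matrix, y: List[float],
                           variances: List[float]) -> List[float]:
    """x = [A^T N^-1 A]^-1 A^T N^-1 y for diagonal N = diag(variances)."""
    n_obs, n_par = len(a), len(a[0])
    # A^T N^-1, shape (n_par, n_obs): each observation weighted by 1/variance.
    at_ninv = [[a[i][j] / variances[i] for i in range(n_obs)] for j in range(n_par)]
    lhs = matmul(at_ninv, a)
    rhs = [sum(w * yi for w, yi in zip(row, y)) for row in at_ninv]
    return solve(lhs, rhs)


# Toy survey: three hexes with zero-point offsets (mags) to recover.
true_offsets = [0.0, 0.02, -0.03]
design = [
    [1.0, -1.0, 0.0],   # overlap of hex 0 and hex 1
    [0.0, 1.0, -1.0],   # overlap of hex 1 and hex 2
    [1.0, 0.0, -1.0],   # overlap of hex 0 and hex 2
    [1.0, 0.0, 0.0],    # gauge row: pin hex 0 to zero
]
observed = [sum(c * t for c, t in zip(row, true_offsets)) for row in design]
variances = [1e-4, 4e-4, 1e-4, 1e-6]

recovered = weighted_least_squares(design, observed, variances)
print(recovered)
```

With noise-free observations the offsets are recovered to rounding error; with noisy ones the weights make the better-measured overlaps count for more.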


The SDSS and DES Coadd

Image Processing: Images suffer from distortion and need to be rectified before they can be averaged. This is “image morphing”, with some care:

Map at ½ pixel scale: flux conservation (missing pixels); increased resolution (drizzling).

Interpolate mapped pixels: use windowed sinc function. Lanczos: sinc(x pi)*sinc(x pi/2). Flux conservation (kernel sums to one).

Correct for change of pixel size: geometrical correction. Multiply by Jacobian. Flux conservation (change of pixel area).
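The interpolation step can be sketched in one dimension. This is a minimal illustration, not the coadd code: it uses the Lanczos kernel quoted above (window width 2), resamples a single row of pixels, and renormalizes the weights so the kernel sums to one. The function names are chosen for this example, and a real image resampling would apply the kernel in two dimensions.

```python
import math


def sinc(t: float) -> float:
    """Unnormalized sinc, sin(t)/t, with the t -> 0 limit handled."""
    return 1.0 if t == 0 else math.sin(t) / t


def lanczos2(x: float) -> float:
    """Lanczos kernel: sinc(pi x) * sinc(pi x / 2) for |x| < 2, else 0."""
    if abs(x) >= 2:
        return 0.0
    return sinc(math.pi * x) * sinc(math.pi * x / 2)


def interpolate(samples, position: float) -> float:
    """Resample a 1-D row of pixel values at a fractional pixel position."""
    base = math.floor(position)
    # The four nearest pixels are the only ones inside the kernel window.
    taps = [i for i in range(base - 1, base + 3) if 0 <= i < len(samples)]
    weights = [lanczos2(position - i) for i in taps]
    norm = sum(weights)  # dividing by this makes the kernel sum to one
    return sum(w * samples[i] for w, i in zip(weights, taps)) / norm


row = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
print(interpolate(row, 2.0))   # on a pixel center: returns that pixel
print(interpolate(row, 2.5))   # halfway between pixels 2 and 3
```

Normalizing the weights is what keeps a flat image flat after resampling, which is the flux-conservation condition noted in the list above.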

Data intensive: SDSS coadd is ~15 observations of 250 sq-degrees and is a 3 TB problem (8 TB if intermediate files are saved).

For DES, each tiling is 2 TB, with 2 tilings/year/color. Year 1: 4 coadds of 4 TB each. Year 2: 4 coadds of 8 TB each. Input data, that is. Intermediate files will be 4x larger. Will coadd 1 year's worth of DES data simulations in 2007.

For the SDSS, we add ~0.5 TB per year. Campaigns in spring: Spring 05, Spring 06, Spring 06


Coaddition

One can average together images taken of the same piece of sky to achieve better signal to noise, just as if one had exposed longer or gone to a bigger telescope.

Average, or sigma clip, or median, or variance weight, or subspace filter, or … There are many varieties of averaging.
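Several of the averaging choices listed above can be compared on a single pixel observed many times. The sketch below is illustrative only: the stack of 15 values (one hit by a cosmic ray) is made up, and the clipping threshold and iteration count are arbitrary choices, not the parameters of the SDSS coadd.

```python
import statistics


def mean_coadd(values):
    return statistics.fmean(values)


def median_coadd(values):
    return statistics.median(values)


def sigma_clip_coadd(values, n_sigma=3.0, max_iter=5):
    """Mean after iteratively rejecting values far from the median."""
    kept = list(values)
    for _ in range(max_iter):
        if len(kept) < 3:
            break
        center = statistics.median(kept)
        spread = statistics.pstdev(kept)
        survivors = [v for v in kept if abs(v - center) <= n_sigma * spread]
        if len(survivors) == len(kept):
            break
        kept = survivors
    return statistics.fmean(kept)


def variance_weighted_coadd(values, variances):
    """Inverse-variance weighted mean."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# One sky pixel seen in 15 exposures; the last one is hit by a cosmic ray.
stack = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1,
         10.0, 10.0, 9.8, 10.2, 10.1, 9.9, 500.0]

print("mean       ", mean_coadd(stack))
print("median     ", median_coadd(stack))
print("sigma clip ", sigma_clip_coadd(stack))
print("var weight ", variance_weighted_coadd(stack[:14], [0.04] * 14))
```

The plain mean is badly biased by the single outlier, while the median and the sigma-clipped mean both stay near the true sky value.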

Before

After


Data Intensive Science: Thunder Runs

Summer 04: 40,000 jobs. 3 TB total input. 8 TB during processing. 1 TB output/coadd.

Depth Optimized Coadd: Grid3. Custom management code. Tarballs from tam01. Remote code deployment. VOX/VOMS VO authorization.

WL Optimized Coadd: Qcdhome (a local cluster). 75 node, dual cpu cluster. Inside Fermilab, NFS mounted data. Half the input data volume.

March 05: Open Science Grid. 40,000 jobs, 3.5 TB.

Virtual Data System: Provenance tracking: Chimera. Computation planning: Pegasus. Data tracking: RLS.

SRM: For storage allocation and transfer on the grid. Built on reliable file transfer RFT, built on gsiftp. Direct transfer from Grid to Fermilab mass storage. Enstore: first class grid entity.

Collaboration: Huan Lin, Hubert Lampeitl, Vijay Sekhri

Collaboration: Huan Lin, Neha Sharma, Mike Wilde, Jens Vöckler, Ian Foster, Ruth Pordes


IV: SDSS Cluster Finding


The MaxBcg Algorithm Couched in Statistical Terms

Catalog processing

maxBcg = “maximum likelihood brightest cluster galaxy”

In a 5 space defined by: RA, Dec (spatial); i (brightness); g-r (color); r-i (color)

Perform an adaptive grid/astrophysical trajectory computation of likelihood

Adaptive grid: ra,dec locations at galaxy positions

Astrophysical trajectory: i, g-r, r-i locations along expected apparent magnitudes and colors of a brightest cluster galaxy at all redshifts


SDSS Clusters

Z=0.041

Z=0.138

Z=0.277

Z=0.377


The MaxBcg Algorithm: Galaxy Cluster Finding

z = 0.06: N = 0, Likelihood = -7.8

z = 0.13: N = 19, Likelihood = 1.9

z = 0.20: N = 0, Likelihood = -8.4


The Galaxy Number Function

The output is a number function, which is compared to theoretical number functions…


DES Cluster Photo-z

DES data will enable cluster photometric redshifts with dz~0.02 for clusters out to z~1.3, for all masses relevant to the SPT Survey.

Approximate mass limit of SPT SZ survey

Optical catalogs will be complete to half this mass to z=1

2.5 x 10^14 solar mass clusters

1.0 x 10^14 solar mass clusters


V: SDSS Cluster Finding as a Test Case


maxBCG Cluster Finding as a Test Case

We have used maxBCG over the last several years to evaluate several approaches to performing large scale, data intensive astronomy jobs.

Base rule: Cheating is using Fermilab’s large compute farms.

Approaches:

Specialized machines: Take the computers to the data. Good if you can afford it.

Virtual Data on the Grid: Send the data to the computers. The Grid is currently a batch system.

Running inside the SDSS database: Send the code to the data. SQL relieves one of data transport coding. Implies having large database farms at data archive centers.


Cluster Finding on the Grid

Used cluster finding as a GriPhyN/iVDGL challenge problem

Things to be solved:

VOX/VOMS/SAZ: Auto-generate gridmap file for virtual organizations.

Remote code deploy: Science code is ever changing. Code deployment is just another grid job.

Use VDT, the Virtual Data Toolkit: Ease of installation! Chimera, to track derivations. Pegasus, to plan computations. RLS, to track copies of the data.

Lessons:

The Grid does not like things on the scale of 100,000. Directories fill. Simple databases don’t scale (XML databases, for example).

Transferring lots of little files is vastly inefficient, yet SDSS data is all little files. This is a problem for mass storage as well.

The Grid works best if one treats it like a simple batch system and doesn’t let the tools do too much for you.

Virtual Data is a concept more suited to captive clusters and large archives than to small users.

That was the state in late 2003-early 2004. Progress is being made quickly; it is research and development after all.


Cluster Finding DAG

This is the DAG for 1 place on the sky.

Complicated: later stages depend on the intermediate results of nearby DAGs

A real analysis, not constructed to be bad.


Cluster Finding DAG: Atomic DAG

The “atomic DAG”. Nonsense scientifically, as cluster finding is non-local

But cluster finding on the natural scale of the problem leads to BIG dags


Cluster Finding DAG: Minuscule DAG

The “minuscule DAG”. Not too bad scientifically, but sky coverage is trivial.

Cluster finding on the natural scale of the problem leads to BIG dags


Working in the database

SkyServer is the current SDSS catalog database

SQL is a natural language for astronomers, in the sense that it is easy for them to learn, even for simple quick tasks.


Working in the database II

There are two new features in SkyServer that are fascinating.

CasJobs is a username/password access batch system that allows one to run long queries

MyDB is a username/password access database for individuals.

These are more than the equivalent of a login and disk space allocation;

the queries and the results are stored and may be run again when the data size has increased

Errors are kept


Working in the database III

Bring the code to the data

We converted the maxBCG algorithm to SQL

My revelation was that the 1000’s of lines of code went down to < 500 lines of SQL

=> much of my code had to do with data management, not algorithm.

I knew that; the algorithm is very simple. But the sheer shortness of the SQL was a revelation.

Clarity is better. So is speed; the algorithm runs about 10x faster in the SQL implementation
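The flavor of "bringing the code to the data" can be shown with a toy query. The sketch below is not the maxBCG SQL and does not use the SkyServer schema: the table `galaxy`, its columns, the search box, and the color window are all invented for illustration, and an in-memory SQLite database stands in for the survey database. It counts, for each bright candidate galaxy, the fainter neighbors of similar color in a small box on the sky.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE galaxy (id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL,"
    " i_mag REAL, gr_color REAL)"
)
conn.executemany(
    "INSERT INTO galaxy VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10.00, 0.00, 16.0, 1.20),   # bright candidate in a group
        (2, 10.01, 0.01, 18.0, 1.21),   # fainter neighbor, similar color
        (3, 10.02, -0.01, 18.5, 1.19),  # fainter neighbor, similar color
        (4, 10.01, -0.02, 19.0, 0.60),  # nearby but much bluer
        (5, 12.00, 1.00, 16.5, 1.20),   # bright but isolated
    ],
)

# The join, the spatial cut, and the color cut are all expressed in SQL,
# so no data-transport or file-handling code is needed around them.
query = """
SELECT c.id, COUNT(n.id) AS n_neighbors
FROM galaxy AS c
LEFT JOIN galaxy AS n
  ON n.id != c.id
 AND n.ra_deg BETWEEN c.ra_deg - :box AND c.ra_deg + :box
 AND n.dec_deg BETWEEN c.dec_deg - :box AND c.dec_deg + :box
 AND ABS(n.gr_color - c.gr_color) < :dcolor
 AND n.i_mag > c.i_mag
WHERE c.i_mag < :bright
GROUP BY c.id
ORDER BY c.id
"""
counts = conn.execute(
    query, {"box": 0.05, "dcolor": 0.05, "bright": 17.0}
).fetchall()
print(counts)
```

Everything outside the query string is setup for the toy database; the selection logic itself fits in a dozen lines of SQL, which is the point made on the slide about data management dominating the original code.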

Applying a zone strategy, P gets partitioned homogeneously among 3 servers.

- S1 provides 1 deg buffer on top
- S2 provides 1 deg buffer on top and bottom
- S3 provides 1 deg buffer on bottom

Figure labels: P; P1, native to Server 1; P2, native to Server 2; P3, native to Server 3

Data distribution among 3 SQL Servers. Total duplicated data = 4 x 13 deg^2. Total duplicated work (objects processed more than once) = 2 x 11 deg^2.

Maria Nieto-Santisteban
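The zone strategy above can be sketched in a few lines of Python. This is only an illustration, not the SQL Server implementation used for maxBCG: the declination limits, the equal-height strips, the flat-sky area, and the names `Zone`, `partition_zones` and `duplicated_area` are all assumptions made for the example. The 13 deg strip length is one reading of the "4 x 13 deg^2" figure in the caption.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    server: int      # 1-based server number; S1 holds the lowest strip
    native: tuple    # (dec_lo, dec_hi) range the server owns
    buffered: tuple  # native range widened by the buffer on shared edges


def partition_zones(dec_min, dec_max, n_servers=3, buffer_deg=1.0):
    """Split [dec_min, dec_max] into equal strips, one per server."""
    height = (dec_max - dec_min) / n_servers
    zones = []
    for i in range(n_servers):
        lo = dec_min + i * height
        hi = dec_min + (i + 1) * height
        # A buffer is only needed on an edge shared with a neighbouring server.
        buf_lo = lo - buffer_deg if i > 0 else lo
        buf_hi = hi + buffer_deg if i < n_servers - 1 else hi
        zones.append(Zone(i + 1, (lo, hi), (buf_lo, buf_hi)))
    return zones


def duplicated_area(zones, ra_width_deg):
    """Flat-sky area (deg^2) held by servers outside their native strips."""
    total = 0.0
    for z in zones:
        native_h = z.native[1] - z.native[0]
        buffered_h = z.buffered[1] - z.buffered[0]
        total += (buffered_h - native_h) * ra_width_deg
    return total


# Hypothetical 9 deg tall, 13 deg wide region split among 3 servers.
zones = partition_zones(dec_min=0.0, dec_max=9.0)
for z in zones:
    print(f"S{z.server}: native {z.native}, stored {z.buffered}")
print("duplicated deg^2:", duplicated_area(zones, ra_width_deg=13.0))
```

With three servers there are two shared edges and therefore four buffer strips, which is where the factor of 4 in the duplicated data comes from. Each server can then search for clusters in its native strip without asking a neighbour for galaxies near the boundary.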

Page 56: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


The DES Future

DES has Fermilab Stage I approval

DES Stage II approval in 2006

The survey starts in October 2009

DES incorporates a robust simulation effort, applying instrumental effects to simulated catalogs and simulated images.

1 year of DES data to be processed in 2007

Production of simulations in 2006

The computation of the DES

Draws from the heritage of the SDSS

Currently two main computation centers: NCSA and Fermilab

Grid replication/archive replication

Processing done at NCSA

Impulse re-processing can be done on the grid

Analysis clusters for some science, especially image science

Database clusters for SQL-based analyses.

Page 57: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


Summary

Large sky surveys are powerful tools for answering important questions in fundamental physics

As a side effect, widely useful legacy data

These sky surveys need formal software tools to handle the data flow

The DES aims to constrain dark energy parameter w to 5% with 4 independent techniques

Many of the most interesting aspects of the production and analysis of the survey data are data intensive (more so than computational) and motivate interesting information technology research.

Page 58: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


Backup slides

Page 59: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


THE DARK ENERGY SURVEY 500 Megapixel Camera

[Figure: quantum efficiency (%) versus wavelength (nm), 300 to 1100 nm, with curves for a thinned CCD and an LBNL high resistivity CCD.]

Science goal: z=1

~50% of time in z-filter

LBNL full depletion CCDs

QE > 50% at 1000 nm

250 microns thick

High resistivity silicon

Science goal: image 5000 sq-degrees deeply

62 CCDs

3 sq-degree footprint

1 Gig/image
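The camera numbers hang together under simple arithmetic, sketched below. The 2 bytes/pixel figure is the one used earlier in the talk for the sky-scale estimate; applying it to the raw camera image is an assumption here. The count of pointings also assumes a single pass with no overlap between footprints, which a real tiling would not have.

```python
import math

PIXELS_PER_IMAGE = 500e6   # "500 Megapixel Camera" from the slide title
BYTES_PER_PIXEL = 2        # assumption: 2 bytes/pixel, as earlier in the talk
SURVEY_AREA_SQDEG = 5000   # survey area from the slide
FOOTPRINT_SQDEG = 3        # camera footprint from the slide

# Raw size of one exposure.
image_bytes = PIXELS_PER_IMAGE * BYTES_PER_PIXEL

# Pointings needed to cover the survey once in one filter (no overlaps).
pointings_per_pass = math.ceil(SURVEY_AREA_SQDEG / FOOTPRINT_SQDEG)

print(f"raw image size: {image_bytes / 1e9:.1f} GB")
print(f"pointings per pass per filter: {pointings_per_pass}")
print(f"raw data per pass per filter: {pointings_per_pass * image_bytes / 1e12:.2f} TB")
```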

Page 60: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


We can do that…

Ok, the folks of SiDet know how to do that…

Page 61: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


DARK ENERGY

1. The Cosmological Constant Problem

Particle physics theory currently provides no understanding of why the vacuum energy density is so small: $\rho_{DE}(\mathrm{theory}) / \rho_{DE}(\mathrm{obs}) = 10^{120}$

2. The Cosmic Coincidence Problem

Theory provides no understanding of why the Dark Energy density is just now comparable to the matter density.

3. What is it?

Is dark energy the vacuum energy? A new, ultra-light particle? A breakdown of General Relativity on large scales? Evidence for extra dimensions?

The nature of the Dark Energy is one of the outstanding unsolved problems of fundamental physics. This is an observationally driven field, and progress requires more precise probes of Dark Energy.

Page 62: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


Globus Data Grid Components

[Diagram: Globus Data Grid components. Labels: Application; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Selected Replica; Performance Information & Predictions (NWS, MDS); GridFTP commands; Replica Location 1, Replica Location 2, Replica Location 3 (Tape Library, Disk Cache, Disk Array, Disk Cache).]
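The replica selection step in this diagram can be illustrated with a toy sketch: given a logical file name, look up its physical replicas and pick the one with the shortest predicted transfer time. Nothing here is the Globus API. The catalog contents, the bandwidth forecasts (standing in for what NWS would supply) and the names `Replica`, `REPLICA_CATALOG` and `select_replica` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    location: str              # physical location of one copy of the file
    predicted_mb_per_s: float  # forecast bandwidth (hypothetical values)


# Hypothetical replica catalog: logical file name -> physical replicas.
REPLICA_CATALOG = {
    "lfn:example-image-0001": [
        Replica("replica-location-1/tape-library", 5.0),
        Replica("replica-location-2/disk-cache", 40.0),
        Replica("replica-location-3/disk-array", 25.0),
    ],
}


def select_replica(logical_name, file_size_mb, catalog=REPLICA_CATALOG):
    """Return the replica with the smallest predicted transfer time."""
    replicas = catalog[logical_name]
    return min(replicas, key=lambda r: file_size_mb / r.predicted_mb_per_s)


best = select_replica("lfn:example-image-0001", file_size_mb=1000.0)
print(best.location)
```

In the real system the selected replica would then be fetched with GridFTP. The point of the sketch is only that the application asks for data by logical name and never needs to know where the copies live.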

Page 63: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


The maxBCG Algorithm: Astrophysically Based

Astrophysically based probability map: a cluster is a BCG and E/S0 ridgeline

$L_{cluster} = L_{BCG} + L_{E/S0}$

Likelihood of cluster is that of a BCG plus E/S0 ridgeline

$L_{BCG} = -\left[ \left(\frac{m - \bar{m}}{\sigma_m}\right)^2 + \left(\frac{(g-r) - \overline{(g-r)}}{\sigma_{g-r}}\right)^2 + \left(\frac{(r-i) - \overline{(r-i)}}{\sigma_{r-i}}\right)^2 \right]$

Likelihood is the sum of the deviations from population mean BCG: magnitude, g-r color and r-i color, in units of sigma

$L_{E/S0} = \ln N_{gals}$

Likelihood is proportional to the natural log of the number of galaxies

At end, for each galaxy, ask whether it is the highest $L_{cluster}$ inside a metric radius, usually 1 Mpc.
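A minimal sketch of this likelihood in Python, under stated assumptions: the population means and sigmas below are placeholders rather than the calibrated maxBCG values, the redshift dependence of those quantities is left out, and the function names are invented for the example.

```python
import math


def bcg_log_likelihood(mag, g_r, r_i, mean, sigma):
    """Negative sum of squared deviations from the mean BCG, in units of sigma."""
    chi2 = 0.0
    for key, value in (("mag", mag), ("g_r", g_r), ("r_i", r_i)):
        chi2 += ((value - mean[key]) / sigma[key]) ** 2
    return -chi2


def cluster_log_likelihood(mag, g_r, r_i, n_gals, mean, sigma):
    """L_cluster = L_BCG + ln(N_gals); no ridgeline galaxies means no cluster."""
    if n_gals < 1:
        return -math.inf
    return bcg_log_likelihood(mag, g_r, r_i, mean, sigma) + math.log(n_gals)


# Placeholder population values, NOT the calibrated maxBCG numbers.
MEAN = {"mag": 17.0, "g_r": 1.4, "r_i": 0.5}
SIGMA = {"mag": 0.5, "g_r": 0.05, "r_i": 0.05}

# Two hypothetical candidate BCGs with the same ridgeline count.
candidates = {
    "galaxy_a": cluster_log_likelihood(17.0, 1.40, 0.50, 20, MEAN, SIGMA),
    "galaxy_b": cluster_log_likelihood(17.5, 1.45, 0.50, 20, MEAN, SIGMA),
}

# Final step: keep the candidate with the highest L_cluster in the neighbourhood.
best = max(candidates, key=candidates.get)
print(best, candidates[best])
```

The last step stands in for the "highest L cluster inside a metric radius" test. A real run would restrict the comparison to galaxies within about 1 Mpc of each other rather than comparing a fixed dictionary.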

Page 64: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


Number Function Prediction

Elements:

A mass function useful for any reasonable cosmology: the Jenkins mass function, derived from simulations

$$\frac{dn}{d\log M} = 0.315\, \frac{\rho_m(z)}{M}\, \frac{d\log \sigma_m^{-1}}{d\log M}\, \exp\left(-\left|0.61 - \ln(D_z \sigma_m)\right|^{3.8}\right)$$

$\rho_m(z) = \rho_m(0)\,(1+z)^3$

$D_z = D(z, \Omega_m, w)$ is the growth function for linear perturbations. $\sigma_m$ is the rms amplitude of mass fluctuations inside a spherically symmetric top hat filter of radius R, and n is the number density of clusters per dlog(mass).

A mass calibration: weak lensing calibration of mass gives $N \sim M^{1/2}$

A way to calculate $\sigma_m$: CMBFast or Eisenstein and Hu.

[Diagram labels: power spectra, cosmology]
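The mass function can be evaluated numerically once σ(M) and the growth function are supplied. The sketch below is a toy under stated assumptions: a power-law σ(M) and an Einstein-de Sitter growth factor 1/(1+z) replace a real power spectrum calculation (CMBFast or Eisenstein and Hu) and the true D(z, Ωm, w), logs are taken as natural logs, and the density value is approximate. All function names are invented for the example.

```python
import math


def jenkins_f(sigma):
    """Jenkins fitting function: 0.315 exp(-|ln(1/sigma) + 0.61|^3.8)."""
    return 0.315 * math.exp(-abs(math.log(1.0 / sigma) + 0.61) ** 3.8)


def rho_m(z, rho_m0):
    """Mean matter density, rho_m(z) = rho_m(0) (1+z)^3."""
    return rho_m0 * (1.0 + z) ** 3


# --- Toy inputs; every number below is an assumption for illustration ---
SIGMA_PIVOT = 0.9    # rms amplitude at the pivot mass
M_PIVOT = 1e14       # pivot mass in solar masses
ALPHA = 0.25         # sigma ~ M^-ALPHA, so dln(1/sigma)/dlnM = ALPHA
RHO_M0 = 0.3 * 2.775e11  # approx. Omega_m * critical density, Msun/Mpc^3 (h=1)


def sigma0(mass):
    """Toy z=0 rms amplitude: a pure power law in mass."""
    return SIGMA_PIVOT * (mass / M_PIVOT) ** (-ALPHA)


def growth(z):
    """Toy growth factor (Einstein-de Sitter), normalised to 1 at z=0."""
    return 1.0 / (1.0 + z)


def dn_dlnM(mass, z):
    """Number density of clusters per unit ln(mass) at redshift z."""
    sigma = growth(z) * sigma0(mass)
    return jenkins_f(sigma) * rho_m(z, RHO_M0) / mass * ALPHA


for mass in (1e14, 3e14, 1e15):
    print(f"M = {mass:.0e} Msun: dn/dlnM = {dn_dlnM(mass, z=0.5):.3e} per Mpc^3")
```

The steep exponential cutoff is what makes cluster counts sensitive to cosmology: a small change in the growth factor moves σ, and the abundance of the most massive clusters responds strongly.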

Page 65: The SDSS, The Dark Energy Survey and Large Scale Sky Surveys Data Intensive Experimental Science James Annis Fermilab.


DES Cluster Constraints

DES high mass cluster sample

Uses:

Cluster redshift distribution

Cluster angular power spectrum

100 clusters with 30% mass estimates in 0.3<z<1.2

Fiducial cosmology is WMAP

Dashed line is non-flat models

Solid line is flat models