Spatial Indexing of large astronomical databases László Dobos, István Csabai, Márton Trencséni ELTE, Hungary

Typical datasets

~200M data points

multidimensional parameter space: real space, magnitudes, colors, redshift etc.

over several hundred GBs

Typical tasks

Object types are classified by a set of linear inequalities in magnitude space (n-dimensional polyhedra)

Compute a histogram of the whole parameter space

Find similar objects

Find clusters

Compare the distribution of two very large datasets
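
The classification task above can be illustrated with a minimal sketch, assuming each object type is given as a list of half-spaces a . x <= b. The names `in_polyhedron` and `example_cut`, and the numbers in the cut, are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from typing import Sequence, Tuple

# A half-space a . x <= b is stored as (a, b); a polyhedron is a list of them.
HalfSpace = Tuple[Sequence[float], float]


def in_polyhedron(point: Sequence[float], half_spaces: Sequence[HalfSpace]) -> bool:
    """Return True if the point satisfies every inequality a . x <= b."""
    return all(
        sum(a_i * x_i for a_i, x_i in zip(a, point)) <= b
        for a, b in half_spaces
    )


# Hypothetical 2D cut: 0.2 <= x0 <= 1.0 and x1 <= 0.5 * x0 + 0.3
example_cut = [
    ((-1.0, 0.0), -0.2),  # -x0 <= -0.2
    ((1.0, 0.0), 1.0),    #  x0 <= 1.0
    ((-0.5, 1.0), 0.3),   #  x1 - 0.5 * x0 <= 0.3
]
```

Evaluating this test for every row is exactly the per-row work that a spatial index is meant to avoid for most of the table.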

Database servers

Ideal for storing a large amount of data even when data structure is not so complex

Optimized access of data over file systems

Clever caching methods

MSSQL 2005 can be programmed efficiently

Problems

In DB servers evaluating data points against inequalities is done on a per row basis

Traditionally, points close in the magnitude space may be far apart on the disk(s)

When the expected result set is small, it is not optimal to run a table scan

Main idea

divide the parameter space into small cells

two main methods:

hierarchical: kd-tree, modified kd-tree etc.

adaptive: Voronoi tessellation

intersect cells with the search polyhedra instead of checking every point

check on a per point basis only when needed
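
As a rough sketch of the cell-level test, assuming the cells are axis-aligned boxes (as in a kd-tree) and the query is a list of half-spaces a . x <= b: a box is accepted whole, rejected whole, or passed on for per-point checks. The function name `classify_box` is hypothetical. The "partial" answer is conservative, since a box may be reported as partial even though it misses the polyhedron; an exact test would need something like linear programming.

```python
from typing import Sequence, Tuple

HalfSpace = Tuple[Sequence[float], float]


def classify_box(lo: Sequence[float], hi: Sequence[float],
                 half_spaces: Sequence[HalfSpace]) -> str:
    """Classify the box [lo, hi] against {x : a . x <= b for all (a, b)}.

    Returns "inside" (every point qualifies), "outside" (no point can
    qualify) or "partial" (points must be checked one by one).
    """
    fully_inside = True
    for a, b in half_spaces:
        # A linear function over a box takes its extremes at corners:
        # pick hi where the coefficient is positive for the maximum,
        # and lo where it is positive for the minimum.
        max_val = sum(a_i * (h if a_i > 0 else l) for a_i, l, h in zip(a, lo, hi))
        min_val = sum(a_i * (l if a_i > 0 else h) for a_i, l, h in zip(a, lo, hi))
        if min_val > b:
            return "outside"      # the whole box violates this inequality
        if max_val > b:
            fully_inside = False  # some corner violates this inequality
    return "inside" if fully_inside else "partial"
```

Only cells classified as "partial" need the per-point evaluation; "inside" cells can be returned wholesale and "outside" cells skipped.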

Voronoi-Delaunay tessellation

kd-Trees
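
A minimal sketch of a kd-tree partition, assuming plain median splits along cycling axes; this is the textbook construction and not necessarily the modified variant used in the talk. The function name `kd_leaves` and the `leaf_size` parameter are hypothetical. Only the leaves (the cells) are returned, since those are what a cell ID would refer to.

```python
def kd_leaves(points, leaf_size=2, depth=0):
    """Split points at the median along cycling axes; return the list of leaves."""
    if len(points) <= leaf_size:
        return [points]
    axis = depth % len(points[0])                    # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])  # order along the current axis
    mid = len(ordered) // 2                          # median split: both halves non-empty
    return (kd_leaves(ordered[:mid], leaf_size, depth + 1)
            + kd_leaves(ordered[mid:], leaf_size, depth + 1))
```

Because each split halves the point count, every leaf holds roughly the same number of points regardless of how unevenly the data fill the parameter space.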

Steps for creating the tessellation

1. For adaptive (like Voronoi): choose starting points (randomly?)

2. Calculate the cells, store in the database

3. Lookup the cell of each data point

4. Create a DB index on the cell ID; this orders the dataset on the disk

5. Run queries
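
The steps above can be sketched in memory for the Voronoi case, under the assumption that seeds are drawn at random from the data and that the cells are left implicit: a point belongs to the Voronoi cell of its nearest seed, so no cell geometry is computed here. The names `nearest_seed` and `build_cell_index` are hypothetical, and grouping rows by cell ID in a dictionary only stands in for what the database index does on disk.

```python
import random
from collections import defaultdict


def nearest_seed(point, seeds):
    """Index of the closest seed; that seed's Voronoi cell contains the point."""
    return min(
        range(len(seeds)),
        key=lambda i: sum((p - s) ** 2 for p, s in zip(point, seeds[i])),
    )


def build_cell_index(points, n_seeds, rng):
    """Pick seeds, look up each point's cell, and group rows by cell ID."""
    seeds = rng.sample(points, n_seeds)        # step 1: random starting points
    cells = defaultdict(list)
    for p in points:                           # step 3: cell lookup per data point
        cells[nearest_seed(p, seeds)].append(p)
    return seeds, dict(cells)                  # step 4: rows grouped by cell ID
```

A query would then visit only the cells that can intersect the search region and read their rows together, rather than scanning every point.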

Technical details

done in MSSQL using the new SQL CLR features

allows running programs within the process of the database server: very fast!

linear programming, Voronoi etc. libraries are ported to .NET/C#

Preliminary results

BoxTree vs. Standard SQL query times

[Figure: query time [msec], 0 to 80000, plotted against # returned rows / # total rows, 0 to 0.35; series: BoxTree durations and SQL durations]

Scientific ideas

SDSS photometry – 5D – 300M points

finding all objects with similar colors

source classification

star – quasar separation

blue – red galaxy locus etc.

Karhunen-Loeve (PCA) coeffs of Bruzual-Charlot models – 5-15D – 100K-100M points

Quick match with observed spectra

Scientific ideas cont.

Magnitudes of spectral synthesis models – 5-10D – 100K-100M points

match with observations

photo-z

physical props. from photometry

check consistency of various models (BC-GRASIL)

Multiresolution visualization of large number of points