VOTable: Tabular Data for Virtual Observatory François Ochsenbein Roy Williams Clive Davenhall, Daniel Durand, Pierre Fernique, Robert Hanisch, David Giaretta, Tom McGlynn, Alex Szalay, Andreas Wicenec


Page 1:

VOTable: Tabular Data for Virtual Observatory

François Ochsenbein, Roy Williams

Clive Davenhall, Daniel Durand, Pierre Fernique, Robert Hanisch, David Giaretta, Tom McGlynn, Alex Szalay, Andreas Wicenec

Page 2:

The Context

Need to exchange data in tabular form:

• Coming from a wide variety of data servers and archives (VO context)
• Must include the associated metadata in order to be interpretable by applications
• Must deal with potentially millions of records
• Existence of FITS

Page 3:

VOTable History

• Astrores at CDS/ESO (June 1999)
• XSIL at Caltech (June 2000)
• October 2001: first discussions
• December 2001: VOTable 0.1
• January 2002: Interoperability meeting, Strasbourg
• 15 April 2002: VOTable 1.0

http://cdsweb.u-strasbg.fr/doc/VOTable/

VOTable archives & discussion groups:

http://archives.us-vo.org/VOTable/

Page 5:

Why XML ?

• includes in a single document the data and their associated metadata (descriptive data)

• has been in common use for about 3 years

• can be interpreted by readily available parsers and tools

• can be visualized (XSL)

• can be encapsulated in messages

Page 6:

A “classical” XML Document

<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/…...dtd">
<RESOURCE name="myResource">
  <OBSERVER>William Herschel</OBSERVER>
  <SOURCE id="mySource">
    <STAR-NAME>Procyon</STAR-NAME>
    <POSITION equinox="J2000" epoch="J2000">
      <RA unit="deg">114.827</RA>
      <Dec unit="deg">+05.227</Dec>
    </POSITION>
    <COUNTS>
      <COUNT>4</COUNT>
      <COUNT>5</COUNT>
      <COUNT>3</COUNT>
    </COUNTS>
  </SOURCE>
  …..
</RESOURCE>

Page 7:

Problems of “classical” XML Documents

Each data element is <tagged>, meaning:

• Huge overheads in terms of volume, required resources, and processing time: not suited to multi-million-row tables

• Need to introduce new elements (tags) for each new parameter, or to cross-match a potentially large set of namespaces

Page 8:

The VOTable way

VOTables follow the classical tabular presentation, where each column is assumed to be homogeneous in terms of its associated metadata. A VOTable document contains:

• The metadata part (data description), essentially a set of <FIELD> and <PARAM> specifications

• The data part (serialisation), which may be in XML, FITS or binary.

Page 9:

<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <DEFINITIONS>
    <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE>
    <PARAM name="Observer" datatype="char" arraysize="*" value="William Herschel">
      <DESCRIPTION>This parameter is designed to store the observer's name</DESCRIPTION>
    </PARAM>
    <TABLE name="Stars">
      <DESCRIPTION>Some bright stars</DESCRIPTION>
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
          <TR>
            <TD>Vega</TD><TD>279.234</TD><TD>38.782</TD>
            <TD>8 7 8 6 8 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
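As an illustration of how an application might consume such a document, here is a minimal sketch in Python using only the standard library. It reads the <FIELD> metadata first and then uses it to interpret the <TD> cells. The function name `read_table` and the shortened sample document are assumptions made for this example; they are not part of VOTable or of any VOTable parser.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's example (DOCTYPE and array column omitted).
VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="Stars">
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" unit="deg" datatype="float"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" unit="deg" datatype="float"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD></TR>
          <TR><TD>Vega</TD><TD>279.234</TD><TD>38.782</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return (fields, rows): FIELD attributes and one dict per TR."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    # The metadata part: one attribute dict per column.
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    rows = []
    # The data part: each TR holds one TD per FIELD, in the same order.
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        row = {}
        for field, cell in zip(fields, cells):
            scalar_number = (field["datatype"] in ("float", "double")
                             and "arraysize" not in field)
            row[field["name"]] = float(cell) if scalar_number else cell
        rows.append(row)
    return fields, rows


fields, rows = read_table(VOTABLE)
for row in rows:
    print(row)
```

The column descriptions are read once and applied to every row, which is the point of separating metadata from data.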

Page 10:

<RESOURCE>
  <PARAM …/> …
  <TABLE>
    <FIELD …/> …
    <DATA>
      <TABLEDATA> <TR> <TD>… </TR> … </TABLEDATA>
      <FITS extnum="n"> <STREAM …> </FITS>
      <BINARY> <STREAM …> </BINARY>
    </DATA>
  </TABLE>
</RESOURCE>
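A reader of the <DATA> element first has to find out which of the three serialisations it holds. The following Python sketch shows one way to do that with the standard library. The function name `serialisation` and the example URL are hypothetical, and the `href` attribute on <STREAM> is assumed here because the slide elides the attributes.

```python
import xml.etree.ElementTree as ET

SERIALISATIONS = ("TABLEDATA", "FITS", "BINARY")


def serialisation(table_xml):
    """Return the name of the serialisation element inside <DATA>, or None."""
    table = ET.fromstring(table_xml)
    data = table.find("DATA")
    if data is None:
        return None  # a TABLE may carry metadata only
    for child in data:
        if child.tag in SERIALISATIONS:
            return child.tag
    return None


inline = "<TABLE><DATA><TABLEDATA><TR><TD>Vega</TD></TR></TABLEDATA></DATA></TABLE>"
remote = ('<TABLE><DATA><FITS extnum="1">'
          '<STREAM href="http://example.org/stars.fits"/>'  # hypothetical URL
          '</FITS></DATA></TABLE>')

print(serialisation(inline))
print(serialisation(remote))
```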

Page 12:

The <FIELD> and <PARAM>

Describe the metadata attached to columns (<FIELD>) or to the resource (<PARAM>):

name        column label
unit        standardized unit
datatype    computer type
width       character representation
precision   character representation
arraysize   repetition factor
ucd         standardized parameter category
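The width and precision attributes only control how a value is written as characters. The Python sketch below shows how an application might apply them when printing a column. It assumes that a precision of the form "F3" means three decimal places and that "E3" means three significant digits; the helper name `format_value` is made up for this example.

```python
def format_value(value, width=None, precision=None):
    """Render a number using FIELD-style width and precision attributes."""
    if precision is None:
        text = str(value)
    elif precision.startswith("E"):
        # Assumed meaning: number of significant digits.
        digits = int(precision[1:])
        text = f"{value:.{digits - 1}e}"
    else:
        # Assumed meaning: number of decimals, with an optional leading "F".
        decimals = int(precision.lstrip("F"))
        text = f"{value:.{decimals}f}"
    # Right-justify in the requested number of characters, if any.
    return text.rjust(int(width)) if width else text


print(repr(format_value(114.827, width="7", precision="F3")))
print(repr(format_value(5.227, width="7", precision="F3")))
```

With width="7" and precision="F3", as in the RA and Dec columns of the example table, both values come out seven characters wide.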

Page 13:

The UCDs

Unified Content Descriptor

Categorisation of the parameters listed in the table

• Interpretation of the table contents
• Decide whether values can be compared
• Data mining

S. Derrière's talk on Friday

Page 14:

datatype          Meaning             FITS   Bytes
"boolean"         Logical             L      1
"bit"             Bit                 X      *
"unsignedByte"    Byte (0 to 255)     B      1
"short"           Short Integer       I      2
"int"             Integer             J      4
"long"            Long integer        K      8
"char"            ASCII Character     A      1
"unicodeChar"     Unicode Character          2
"float"           Floating point      E      4
"double"          Double              D      8
"floatComplex"    Float Complex       C      8
"doubleComplex"   Double Complex      M      16

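The byte counts in this table, combined with a field's arraysize, give the storage needed for one cell. The Python sketch below works that out for fixed-length fields. The function name `field_size` is hypothetical, and the handling of "bit" (packed eight to a byte) is an assumption based on the "*" entry in the table.

```python
# Bytes per element, copied from the datatype table.
BYTES = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}


def field_size(datatype, arraysize="1"):
    """Bytes needed for one cell of a fixed-length field."""
    dims = arraysize.split("x")
    if dims[-1].endswith("*"):
        raise ValueError("variable-length field: size depends on the data")
    count = 1
    for dim in dims:
        count *= int(dim)
    if datatype == "bit":
        return (count + 7) // 8  # assumed: bits are packed into whole bytes
    return count * BYTES[datatype]


# Star-Name (char, arraysize 10) plus RA and Dec (float) from the example.
row_bytes = field_size("char", "10") + field_size("float") + field_size("float")
print(row_bytes)
```

The Counts column of the example (arraysize="2x3x*") is variable-length, so this helper deliberately refuses to size it.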
Page 15:

FITS Compatibility

VOTable was designed to be compatible with existing FITS data tables:

• Compatible data types
• FITS keywords are represented as <FIELD> attributes, e.g. width, precision, arraysize
• Arrays and variable-length arrays
• <DATA> may link to existing FITS data sets

Page 16:

Data Serialization

FITS or BINARY data may be embedded in the document, or remote; compression/encoding may be applied.
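To make the idea of an embedded, encoded binary stream concrete, here is a Python sketch that packs two float columns (RA, Dec) row by row, encodes the bytes as base64 text that could sit inside the document, and decodes them again. The big-endian byte order and the choice of base64 are assumptions made for this illustration, and the function names are hypothetical.

```python
import base64
import struct

ROW_FORMAT = ">ff"  # assumed layout: two big-endian 4-byte floats per row
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def encode_rows(rows):
    """Pack (ra, dec) pairs and return them as base64 text."""
    raw = b"".join(struct.pack(ROW_FORMAT, ra, dec) for ra, dec in rows)
    return base64.b64encode(raw).decode("ascii")


def decode_rows(text):
    """Inverse of encode_rows: base64 text back to a list of (ra, dec)."""
    raw = base64.b64decode(text)
    return [struct.unpack_from(ROW_FORMAT, raw, offset)
            for offset in range(0, len(raw), ROW_SIZE)]


stream_text = encode_rows([(114.827, 5.227), (279.234, 38.782)])
decoded = decode_rows(stream_text)
print(stream_text)
print(decoded)
```

The decoded values match the originals only to single precision, since each one is stored in 4 bytes, as the "float" datatype prescribes.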

Page 17:

Existing tools and Servers

• Several databases are delivering VOTables: HEASARC, IPAC, NOAO, NRAO, VizieR, SIMBAD (cone search, >50 services)

• VOTable parsers in Perl, Java, C (different types of parsers for different applications)

• VOTable validators

• Basic XSLT translators (XML to HTML)

Page 18:

DTD or XML-Schema

• The VOTable rules exist both as a DTD (Document Type Definition) and in the XML Schema language (heavily used in developing Web Services applications)

Page 19:

VOTable appendices

Astrores had two features not implemented in VOTables:

1. The LINK conventions describing how to get the correlated data (explanations, images, spectra…) based on substitution of the column contents

…
<FIELD name="FileName" datatype="char"…/>
…
<LINK href="http://server/getFile?${FileName}" …/>
…
<TR> … <TD>photo/procyon.dat</TD>… </TR>
<TR> … <TD>photo/vega.dat</TD>… </TR>
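The substitution described above can be sketched in a few lines of Python: every ${name} in the LINK href is replaced by the value of the column called name in the current row. The function name `expand_link` is hypothetical, and the sketch inserts the cell value as is, without any URL escaping, which a real client might need to add.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_link(href, row):
    """Replace each ${column} in href with that column's value in row."""
    return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), href)


href = "http://server/getFile?${FileName}"
for row in ({"FileName": "photo/procyon.dat"}, {"FileName": "photo/vega.dat"}):
    print(expand_link(href, row))
```

A placeholder naming a column that is absent from the row raises a KeyError here, which seems preferable to silently producing a broken URL.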

Page 20:

VOTable appendices (2)

2. The Query Mechanism using conventions similar to the HTML <FORM> for retrieving the data from user-supplied constraints

<PARAM name="Observer" datatype="char" arraysize="*" />
<TABLE name="Stars">
  <DESCRIPTION>Some bright stars</DESCRIPTION>
  <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
  <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
  <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
  <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
  <LINK type="query" action="http://server-node/getResult?" />
</TABLE>

Toward more generic WSDL-like solutions?
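By analogy with an HTML <FORM>, a client could build its request by appending name=value pairs to the action URL of the LINK. The Python sketch below illustrates the idea only: the slide does not say how constraints are encoded, so the form-style encoding, the constraint syntax (">100") and the function name `build_query` are all assumptions.

```python
from urllib.parse import urlencode


def build_query(action, constraints):
    """Append form-encoded constraints to the LINK's action URL."""
    return action + urlencode(constraints)


url = build_query(
    "http://server-node/getResult?",
    {"Observer": "William Herschel", "RA": ">100"},  # hypothetical constraints
)
print(url)
```

The keys are taken from the PARAM and FIELD names of the table description, which is what lets a generic client offer a query form for any such table.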

Page 21:

Conclusions

• Just version 1.0 … more to come

• Comments ? Proposals ?

Join the discussion group

[email protected]