Scientific File Formats · Introduction Commonalities Examples Final Notes ... NetCDF4: NetCDF API...
Introduction Commonalities
Examples Final Notes
Scientific File Formats
Daniel L. Wang
SLAC
6 October 2010
Daniel L. Wang Scientific File Formats
1 Introduction
2 Commonalities
3 Examples: FITS, XTC, ROOT I/O, NetCDF, HDF5, Others
4 Final Notes
Introduction: why files?
Files contain much (most?) scientific data
Files last for a long time
Explosion in data → more, bigger files
Figure: Magnetic tape drive http://www.flickr.com/photos/laughingsquid/102689398/
Scientific File Access
Simple, non-transactional
Data access: value lookups, statistics, plotting
Transformations: simple math + complex algorithms
Logging/history: coarse
“Image C is the result of function(Image A, parameter set B)” not “$X was deducted from Account Y and added to Account Z”
Longevity: >10, 50 or 100+ years
Themes and Commonalities in Formats
Sequential or random-access navigation
Storage efficiency
Self-description (i.e., metadata)
Ordering: object sequences, grids, images, N-D arrays (+ tables)
Write-append (non-update)
Machine portability (FP format, byte order)
Standards
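The machine-portability item above (FP format, byte order) can be illustrated with a small sketch. It packs the same integer and IEEE 754 double in big-endian and little-endian order and shows what a reader sees if it assumes the wrong one. The values are arbitrary illustrations and are not taken from any of the formats discussed.

```python
import struct

value = 1

# The same 32-bit integer serialized in both byte orders.
big = struct.pack(">i", value)      # most significant byte first
little = struct.pack("<i", value)   # least significant byte first

# A reader that assumes the wrong byte order gets a different number
# without any error being raised.
misread = struct.unpack("<i", big)[0]

# IEEE 754 doubles have the same bit layout on both kinds of machine;
# only the byte order differs, so an explicit order makes the file portable.
x = 0.1
be_double = struct.pack(">d", x)
le_double = struct.pack("<d", x)
roundtrip = struct.unpack(">d", be_double)[0]

print(big.hex(), little.hex(), misread, roundtrip == x)
```

A format that fixes the byte order in its specification, or records it in its header, lets a reader on any machine choose the right `struct` prefix.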
FITS
Flexible Image Transport System. Standard astronomical data format (NASA/IAU)
Generic N-D arrays, images, and ASCII or binary tables
Image tile-compression
Not random access
Human-readable header
See also: AipsIO (used by casacore)
Primary HDU
Extension HDU
Extension HDU
...
HDU type        Contents
Primary         ASCII header + n-D array
Image*          ASCII header w/ image metadata + n-D array
ASCII Table*    ASCII header w/ table metadata + fixed-width row data
Binary Table*   ASCII header w/ table metadata + 2-D array
* Extension HDU
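The human-readable ASCII header of an HDU can be sketched in a few lines. FITS headers are made of 80-character "cards" packed into 2880-byte blocks and terminated by an END card. The sketch below builds a tiny primary header in memory and parses it back. The helper names (`make_header`, `parse_header`) are hypothetical, and the parsing is simplified: it does not handle quoted string values that contain a slash, or continuation cards.

```python
CARD = 80      # bytes per header card
BLOCK = 2880   # headers and data are padded to multiples of this size

def make_header(cards):
    """Build a space-padded ASCII header from (keyword, value) pairs."""
    lines = [f"{key:<8}= {value:>20}".ljust(CARD) for key, value in cards]
    lines.append("END".ljust(CARD))
    raw = "".join(lines)
    padded_len = -(-len(raw) // BLOCK) * BLOCK   # round up to a whole block
    return raw.ljust(padded_len).encode("ascii")

def parse_header(raw):
    """Return {keyword: value string} for all value cards before END."""
    header = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] == "= ":
            # Drop an optional trailing "/ comment" (simplified).
            header[key] = card[10:].split("/", 1)[0].strip()
    return header

raw = make_header([("SIMPLE", "T"), ("BITPIX", 16),
                   ("NAXIS", 2), ("NAXIS1", 4), ("NAXIS2", 3)])
header = parse_header(raw)

# The header alone tells a reader how large the n-D array that follows is.
n_elements = 1
for axis in range(1, int(header["NAXIS"]) + 1):
    n_elements *= int(header[f"NAXIS{axis}"])
data_bytes = n_elements * abs(int(header["BITPIX"])) // 8

print(header, data_bytes)
```

Because each HDU's size is only known after reading its header, a reader has to walk the HDUs in order to reach a later extension, which is one way to read the "not random access" note above.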
XTC
eXtended Tagged Container (HEP, photon science)
Object serialization: Vectors, trajectories, events, detections
Not random-access
No compression
Lightweight, streaming
Datagram
Datagram
...
Datagram type   Notes
Sequence        Event transitions (e.g., {Begin,End}Run, L1Accept)
Xtc             User data objects
Env
More information: https://confluence.slac.stanford.edu/display/PCDS/XTC+format
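The streaming, non-random-access character of a sequence of datagrams can be sketched with a generic tagged container. The record layout used here (a 4-byte type tag and a 4-byte payload length, both little-endian, followed by the payload) is a hypothetical stand-in chosen for illustration. It is not the actual XTC datagram layout, which is described at the link above.

```python
import io
import struct

# Hypothetical record header: type tag, payload length (NOT the real XTC layout).
HEADER = struct.Struct("<II")

def write_record(stream, tag, payload):
    """Append one tagged record to the stream."""
    stream.write(HEADER.pack(tag, len(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield (tag, payload) in file order, stopping at a clean end of file."""
    while True:
        head = stream.read(HEADER.size)
        if not head:
            return
        if len(head) < HEADER.size:
            raise ValueError("truncated record header")
        tag, length = HEADER.unpack(head)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated payload")
        yield tag, payload

buf = io.BytesIO()
write_record(buf, 1, b"begin-run")   # a transition-like record
write_record(buf, 2, b"event-0")     # user data records
write_record(buf, 2, b"event-1")
buf.seek(0)
records = list(read_records(buf))
print(records)
```

Writing is a pure append and needs no index, which keeps the writer lightweight. The cost is that finding the n-th record means reading the n-1 headers before it.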
ROOT I/O
ROOT Object I/O (HEP)
Object serialization
Tree-structured (similar to fs)
Object deletion
Compression (deflate)
Ranged values
See also: LCIO
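If "ranged values" is read as storing a floating-point number that is known to lie in a declared range using a fixed number of bits, the idea can be sketched as below. This is a generic quantization sketch under that assumption, not ROOT's implementation. The function names and the example range are hypothetical.

```python
def encode_ranged(x, lo, hi, nbits):
    """Map x in [lo, hi] to an integer code in [0, 2**nbits - 1]."""
    if not lo <= x <= hi:
        raise ValueError("value outside declared range")
    levels = (1 << nbits) - 1
    return round((x - lo) / (hi - lo) * levels)

def decode_ranged(code, lo, hi, nbits):
    """Recover an approximation of the original value from its code."""
    levels = (1 << nbits) - 1
    return lo + code / levels * (hi - lo)

lo, hi, nbits = 0.0, 100.0, 16      # assumed range and precision
x = 37.123456
code = encode_ranged(x, lo, hi, nbits)
approx = decode_ranged(code, lo, hi, nbits)

# Worst-case error is half of one quantization step.
max_error = (hi - lo) / ((1 << nbits) - 1) / 2
print(code, approx, abs(approx - x) <= max_error)
```

Here a value that would take 8 bytes as a double is stored in 2 bytes, at the price of a bounded rounding error that the declared range and bit count make explicit.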
Figure: From [Brun and Rademakers, 1996]
NetCDF
Network Common Data Form. (Geo)
N-dimensional arrays
Arrays appendable in one dimension (v4: or more)
Named dimensions with explicit coordinates (allow irregular spacing)
Machine portable
Slabbed, sliced, random access
NetCDF4: NetCDF API on HDF5 physical structure
Figure: 2000-2006 JJA wind power, courtesy Scott Capps
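Named dimensions with explicit coordinates, combined with slab access, can be sketched without any NetCDF library. Each dimension has a sorted coordinate list (the spacing may be irregular), a coordinate range is translated to an index range, and the index range selects a rectangular slab. The variable names, coordinates, and data values below are hypothetical and only stand in for a real dataset.

```python
from bisect import bisect_left, bisect_right

# One sorted coordinate list per named dimension (hypothetical values).
coords = {
    "time": [0, 6, 12, 18],                        # hours, regular
    "height": [10.0, 50.0, 100.0, 250.0, 1000.0],  # metres, irregular
}

# A 2-D variable indexed as wind[time][height]; values encode their indices.
wind = [[100 * t + h for h in range(len(coords["height"]))]
        for t in range(len(coords["time"]))]

def index_range(axis, lo, hi):
    """Half-open index range covering coordinate values in [lo, hi]."""
    return bisect_left(axis, lo), bisect_right(axis, hi)

def slab(data, coords, time_range, height_range):
    """Select the rectangular block matching the requested coordinate ranges."""
    t0, t1 = index_range(coords["time"], *time_range)
    h0, h1 = index_range(coords["height"], *height_range)
    return [row[h0:h1] for row in data[t0:t1]]

subset = slab(wind, coords, (6, 12), (50.0, 250.0))
print(subset)
```

The lookup works on coordinate values rather than raw indices, so the same request remains valid when the grid spacing is irregular.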
Section           Notes
Header            magic, #records, dimensions, global attributes, variable metadata
Non-record data   fixed-size variables, incl. fixed dimensions
Record data       record variables (incl. record dimensions)
From [Rew et al., 1997]
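The three-section layout in the table explains why classic NetCDF arrays are appendable along one dimension: record variables are interleaved record by record at the end of the file, so appending a record only extends the file. A simplified sketch of the offset arithmetic follows. The variable names, sizes, and the starting offset are hypothetical, header parsing is not shown, and the padding rule is simplified to "round each per-record size up to 4 bytes".

```python
MAGIC_CLASSIC = b"CDF\x01"   # first four bytes of a classic-format file

def looks_like_classic(first_bytes):
    """Check the magic number at the start of the header."""
    return first_bytes[:4] == MAGIC_CLASSIC

def padded(nbytes):
    """Round a size up to the next 4-byte boundary."""
    return (nbytes + 3) // 4 * 4

def record_layout(record_vars, data_start):
    """Given [(name, bytes_per_record)], return (begin offsets, record size)."""
    begins = {}
    offset = data_start
    for name, nbytes in record_vars:
        begins[name] = offset
        offset += padded(nbytes)
    return begins, offset - data_start

def record_offset(begins, recsize, name, record):
    """Byte offset of one variable's data within a given record."""
    return begins[name] + record * recsize

# Hypothetical file: record section starts at byte 1024.
record_vars = [("temperature", 40), ("flag", 3)]
begins, recsize = record_layout(record_vars, data_start=1024)
offset = record_offset(begins, recsize, "flag", record=2)
print(begins, recsize, offset)
```

Since every record has the same size, any record of any variable can be located by arithmetic alone, which is what makes sliced random access cheap.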
HDF5
Hierarchical Data Format
N-D arrays
Array nesting via pointers+VL datatypes
Compression
Parallel I/O (MPI I/O)
Flexible, chunked data layout (slabs or custom tiles)
Ragged arrays via variable-length datatypes
fs-like, w/ symlinks, heaps, freelists
Up to 255-byte offsets
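The chunked data layout mentioned above comes down to simple index arithmetic: an N-D array is cut into fixed-shape tiles, and an element index determines both the tile it lives in and its position inside that tile. The sketch below shows that arithmetic only. It is not the HDF5 library API, and the function names and the chunk shape are hypothetical.

```python
from itertools import product

def chunk_of(index, chunk_shape):
    """Return (chunk coordinate, offset within chunk) for an N-D element index."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    within = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk, within

def chunks_for_slab(start, count, chunk_shape):
    """List the chunk coordinates touched by the slab [start, start + count)."""
    ranges = [range(s // c, (s + n - 1) // c + 1)
              for s, n, c in zip(start, count, chunk_shape)]
    return list(product(*ranges))

chunk_shape = (100, 100)   # assumed tile size

where = chunk_of((250, 30), chunk_shape)
touched = chunks_for_slab((90, 0), (20, 150), chunk_shape)
print(where, touched)
```

A reader only needs to fetch (and decompress) the chunks a request touches, so the choice of chunk shape decides which access patterns are cheap: row-like slabs favor wide, flat chunks, while square tiles balance access along both axes.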
Figure: From [HDF Group, 2010]
Many more formats!
Irregular/non-rectangular grids (e.g., geodesic, radial, etc.) [Wadsley and Shell Internationale Petroleum, 1980], triangular meshes [Gorski et al., 2005]
See also Scientific Data Format FAQ [Stern, 1995]
Figure: Hexagonal mesh http://www.flickr.com/photos/danhorst/819469908/
From HEALPix: http://healpix.jpl.nasa.gov
Final notes
Data > formats
Long-lived formats, long-lived software
Figure: Looking for data in files. http://www.flickr.com/photos/mahmood/4616170423/
References
Brun, R. and Rademakers, F. (1996).
ROOT Object I/O System. http://www.hdfgroup.org/HDF5/doc/H5.format.html.
Gorski, K., Hivon, E., Banday, A., Wandelt, B., Hansen, F., Reinecke, M., and Bartelmann, M. (2005).
HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622:759.
HDF Group (2010).
HDF5 file format specification version 2.0. http://www.hdfgroup.org/HDF5/doc/H5.format.html.
Rew, R., Davis, G., Emmerson, S., and Davies, H. (1997).
NetCDF user’s guide for C. Unidata Program Center.
Stern, I. (1995).
Scientific Data Format Information FAQ.
Wadsley, W. A. and Shell Internationale Petroleum (1980).
Modelling reservoir geometry with non-rectangular coordinate grids. In SPE Annual Technical Conference and Exhibition, Dallas, Texas. American Institute of Mining, Metallurgical, and Petroleum Engineers.