Status of LOFAR Data in HDF5 Format · 2011-11-23 · IDL, VisIt, HDFView, h5py, PyTables, DAL...
Transcript of Status of LOFAR Data in HDF5 Format · 2011-11-23 · IDL, VisIt, HDFView, h5py, PyTables, DAL...
Abstract
TheLowFrequencyArray(LOFAR)projectissolvingthechallengeofdatasizeandcomplexityusingtheHierarchicalDataFormat,version5(HDF5).MostofLOFAR'sstandarddataproductswillbestoredusingtheHDF5format;theBeam‐Formed(akaPulsar)andTransientBufferBoard(TBB)TimeSeriesDatahavetransiQonedfromproject‐specificbinarytoHDF5format.Itsupportsanunlimitedvarietyofdatatypes,andisdesignedforflexibleandefficientI/Oandforhighvolumeandcomplexdata. HDF5isportable,extensible,andparallelizable,allowingapplicaQonstoevolveintheiruseofHDF5.WereportonourefforttopavethewaytowardsnewastronomicaldataencapsulaQonusingHDF5,whichcanbeusedbyfuturegroundandspaceprojects.TheLOFARprojecthasformedacollaboraQonwithNROA,theVOAandtheHDFGrouptoobtainfundingforafull‐QmestaffmembertoworkondocumenQnganddevelopingstandardsforastronomicaldatawri[eninHDF5.WehopeoureffortwillenhanceHDF5visibilityandusagewithinthecommunity,specificallyforSKAPathfindersandothermajornewradiotelescopessuchasLOFAR,EVLA,ALMA,ASKAP,MeerKAT,MWA,LWA,andeMERLIN.
TheLOFARRadioTelescope
WWW.LOFAR.ORG
LOFARDatainHDF5:SolvesVariety,Complexity&SizeIssuesTimeSeriesdatasharesimilarissuestoalldatasetsproducedbyLOFARobservaQons‐‐theyvarytremendouslyintype,sizeandcomplexity.InsteadofusingmanydifferentfileformatstoencapsulateLOFARdata,theHierarchicalDataFormat,version5(HDF5)waschosenasaviablesoluQonforpotenQallymassiveLOFARdataproducts.
TheLOFAR“Superterp”,Exloo,NL(Fig.1)
LOFARdatastaQsQcs:• Dataratesupto8GB/sec• Filesizes,10sto100sTB• Datasetsrangingfrom1upto6dimensions
• SingledatasetcanbespreadovermulQpledisks
• DatawriQngmustbemulQ‐threadedandparallelizable
• Mostastronomicaldatacontainers(CASAMSorFITS,or~30pulsardataformats)couldnotmeetallourneeds
• The“LOwFrequencyARray”(LOFAR)• Currently,thelargestradiotelescopeintheworld• 48staQons(40NL,8InternaQonal);completebyendof2011(Figure1onrightshowsthecenterstaBonsinNL)• Baselinesfrom1‐1500km;ulQmatelyachievesub‐arcsecondresoluQonovermuchoftheband• LowBandAntennabandpass:30‐80MHz• HighBandAntennabandpass:120‐240MHz• TotalRecordableBandwidth:48MHz;100MHzTBBs• DataCorrelaQon:IBMBlueGene/Psupercomputer,Groningen,NL(seephotoinFigure2onright)• Offlineprocessingclusterhas100nodes,eachwith:24cores,64GBRAM,21TBstorage• LongTermArchive(LTA)has:2.2PBdisk,5PBtape• Accessto22,600coresviaBigGridandJUROPA
LOFARBeam‐FormedTimeSeriesDataFormatSpecificaQonsinHDF5
1 DocumentsavailablefromtheLOFARWiki(“DataProducts”link),h<p://lus.lofar.org/wiki2 UVVisibilityandNearFieldImagingareunderdevelopment
EachLOFARdipoleantennahasaringbuffer,transientbufferboard,thatcanstoreupto1.3secondsofrawQmeseriesdatasampledevery5ns(LOFAR’shighestQmeresoluQon).LOFARactsasa“Qmemachine”–atransienteventtriggersthebuffereddatatobewri[entoHDF5,providingupto1.3secondsofhistoricalinformaQonatfulltemporalandspaQalresoluQon.
TBB’s are used to study: air showers producedby cosmic rays, lightning onEarthandotherplanetarybodieswithinourownsolarsystem,andunknown
sub‐secondQmescaleastronomicaltransients.
TransientBufferBoard(TBB)Data:look‐backintothe“past” Summary,CollaboraQonsAndFutureConsideraQons
HDF5‐friendlyTools:IDL,VisIt,HDFView, h5py,PyTables,DAL(LOFAR),PyDAL(LOFAR)
LOFARhasstartedworkonthenextgeneraQonof itsC++DataAccessLibrary(DAL+PyDAL),basedoncurrentQmeseriesdatausecasesandusepa[erns,withthegoalofspeed‐up,simplificaQonofuseandbe[erpythonbindings. WealsointendtoexpandtheDALforusewithaddiQonalLOFARdataformatsandtobeablewriteconvertersto/from FITS, the Casa TablesMeasurement Sets and HDF5. Improvements coming in2012!
LOFARhasteamedupwithNRAO,theIVOAandTheHDFGroupinrequesQngfundingfordefininganddevelopingastronomicaldatastandardsinHDF5,basedonLOFARsICDandDALgroundwork.Thiseffortisnicknamed“AstroHDF”.Futureworkalsoinvolvesdeveloping converts to/from HDF5/FITS and collaboraQons on interfaces inDS9 andVisItforHDF5(WCS‐compliant)astronomicaldata.
We believe that the HDF5 format is well‐suited for complex and large astronomicaldata.ForsomeofLOFAR’sdataformatchallenges,HDF5istheonlyviablesoluQon.InHDF5,LOFARcaneasilymap10sofTBofcomplex,6‐dimensionaldatastructureswithintricateheaderandlogginginformaQonthroughoutthedata‐files. WefeelthatHDF5canmeet thedata needs anddemandsof futureprojects such as LSST and the SKA,wheresizeandcomplexitywillbeevengreaterofanissuethanfacedbyLOFAR.
TheBlueGene/PsupercomputeratGroningen(Fig.2)
LOFARInterfaceControlDocuments1(ICDs)providedetaileddescripQonsofallexpectedLOFARDataProductsinHDF5format:
• Beam‐Formed(BF)Data[shownontheleF]• TransientBufferBoard(TBB)TimeSeries[bo<omleFofposter]• RadioSkyImageCubes• DynamicSpectrumData• RotaRonMeasure(RM)SynthesisCubes• UVVisibilityData2• Near‐fieldImages2
ElementBeam
Sub‐ArrayPoinRng
LOFARStaRons
Tied‐ArrayBeams
DATA(I)
TIME
FREQUENCY
DATA(Q)
TIME
FREQUENCY
DATA(U)
TIME
FREQUENCY
DATA(V)
TIME
FREQUENCY
BinaryDataContainer
HDF5Container:Header&DataStructureLayout
Header+Data==
HDF5“File”(BeamFormed)
LOFARDataAccessLibrary(DAL):C++andPythonDataI/O
LinuxCluster“Offline”KnownPulsarPipeline
ConvertersteptoberemovedwhenDALlayeriscompleted.
Data
BlueGene/P“Online”Tied‐ArrayBeamPipeline
FortheLOFARproject,adataproductisdefinedwithinthecontextofHDF5.HDF5allowsforstorage,notonlyofthedata, but for the associated and related meta‐datadescribingthedata’scontents,condiQonsofobservaQons,logs, etc.. As an “all‐in‐one” wrapper, the HDF5 formatsimplifiesthemanagementofcomplexdatasets.DescribesLayout
LOFAR Data Format ICDBeam-Formed DataDocument ID: LOFAR-USG-ICD-003.tex
ISSUE 2.04.08
SVN Repository Revision: 8333
A. Alexov, K. Anderson, L. Bahren, J.-M. Grießmeier, J.W.T. Hessels,
J.S. Masters, B.W. Stappers
SVN Date: 2011-09-02
Contents
Change record 3
1. Introduction 81.1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.2. Context and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.3. Applicable documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2. Overview 9
3. Organization of the data 103.1. High level LOFAR BF file structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.2. Overview of BF Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3. Hierarchical Structure of the BF H5 File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4. Detailed Data Specification 144.1. The Root Group (ROOT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1.1. Common LOFAR Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154.1.2. Additional Beam-Formed Root Attributes . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2. The System Logs Group (SYS LOG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184.3. The Sub-Array Pointing Group (SUB ARRAY POINTING {NNN}) . . . . . . . . . . . . . . . . . . 184.4. The Processing History Group (PROCESS HISTORY) . . . . . . . . . . . . . . . . . . . . . . . . 184.5. The Beam Group (BEAM {NNN}) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224.6. The Coordinates Group (COORDINATES) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.6.1. The Time coordinate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.6.2. The Spectral coordinate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.7. The Stokes Datasets (STOKES {N}) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.8. The Subband/Channel tables/arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.9. TiedArray Observing Sub-Modes & Data Format . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.9.1. Standard Observing Mode for TiedArray . . . . . . . . . . . . . . . . . . . . . . . . . 284.9.2. Sub-Observing Modes for TiedArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1
WhyuseHDF5?• FormatformanagingandstoringlargeandcomplexscienQficdata• Unlimitedvarietyofdatatypes• NoinherentfilesizelimitaQons• ParallelreadingandwriQng(MPI)• Runsonmassivelyparallel/distributedsystems• Built‐incompression• LibraryAPIinC/C++/Java/Fortran• Self‐describingandportable• Free&has20+yearhistory@NASA
DefinesSWSpecificaQon
StatusofLOFARDatainHDF5FormatAnastasiaAlexov1,2,PimSchellart3,SanderterVeen3andtheLOFARICDGroup
1:UniversityofAmsterdam(UvA)[[email protected]],2:NetherlandsInsQtuteforRadioAstronomy(ASTRON);3:RadboudUniversityNijmegen
StaQonCorrelator
TransientBufferBoardRingBuffer,1.3s
LOFARantennasignal
LOFARCorrelator
ImagesBeam‐FormedData(HDF5)
TimeSeriesDataWriter(HDF5)
PipelineSoxware(HDF5)
A/DConverter
48MHz
100MHz
LOFARDataAccessLibrary(DAL):C++andPythonDataI/O
LOFARTBBDataFlow
Hearmoreat:HDF5BoFWedNov9th@17:15
LOFARprojectisdevelopingtheC++DataAccessLibrary(DAL),whichprovidesfullscopeconstructorsforcreaQngandaccessingLOFARdataproducts,usingtheICDspecificaQons.TBBandBeam‐Formedhavebeenimplemented.
CrabGiantPulse(@center),asseenbytheTBBboardsat0.015secsamplingBme
CollaboraQvemailinglistinfo:nextgen‐[email protected]:[email protected]/messagebody“subscribednextgen‐astrodata”
TBB HDF5 Data Stats: • 100 MHz of bandwidth • 1.3 sec of look-back time • 40MB - 48GB dataset/station • 2GB - 2.25TB dataset sizes • Data maps easily to HDF5 • 2012 Upgrade: 5 sec look-back • 2012 Upgrade: 8.6 TB dataset
AST
RA
U
NIJ
E
NUI
R
O
M
T
T
D
EN
HY
SIC
S
DBO
D
VER
GN
EM
PEA
S TYI
P
R
FO
www.hdfgroup.org
Hearmoreat:HDF5BoF
WedNov9th@17:15