Introduction to HDF5

HDF and HDF-EOS Workshop XI, Landover, MD, November 6-8, 2007 (slides dated 11/6/07)

Description

This tutorial is designed for new HDF5 users. We will cover basic HDF5 Data Model objects and their properties, give an overview of the HDF5 Libraries and APIs, and discuss the HDF5 programming model. Simple C and Fortran examples will be used to illustrate HDF5 concepts.

Transcript of Introduction to HDF5

Page 1: Introduction to HDF5

Introduction to HDF5

HDF and HDF-EOS Workshop XI, November 6-8, 2007

Page 2: Introduction to HDF5

Goals

• Introduce HDF5

• Explain how data can be organized and used in an application

• Provide example code

Page 3: Introduction to HDF5

For More Information…

All workshop slides will be available from:

http://hdfeos.org/workshops/ws11/workshop_eleven.php

See the Resources handout for where to get software, docs, FAQs, etc.

Page 4: Introduction to HDF5

What is HDF5?

HDF = Hierarchical Data Format

• File format for managing any kind of data

• Software (library and tools) for accessing data in that format

Page 5: Introduction to HDF5

HDF5 Features

• Especially suited for large and/or complex data collections.

• Platform independent

• C, Fortran 90, C++, and Java APIs

Page 6: Introduction to HDF5

Diagram Definitions

[Figure: legend of the symbols used for a group and for a dataset in the diagrams that follow]

Page 7: Introduction to HDF5

Example HDF5 file

[Figure: the root group “/” and the group “/foo” holding raster images, a palette, a 3-D array, a 2-D array, and a table]

Table:

lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6

Page 8: Introduction to HDF5

Viewing an HDF5 File with HDFView

Page 9: Introduction to HDF5

Page 10: Introduction to HDF5

Example HDF5 Application

    #include <stdio.h>
    #include "H5IM.h"        /* HDF5 Image API (HDF5 1.8+ uses "hdf5_hl.h") */

    #define WIDTH  57        /* dataset dimensions */
    #define HEIGHT 57
    #define RANK   2

    int main (void)
    {
       hid_t  file;                          /* file handle */
       herr_t status;
       unsigned char data[HEIGHT][WIDTH];    /* data to write: HEIGHT rows of WIDTH pixels */
       int    i, j, num, val;
       FILE  *fp;

       fp = fopen ("storm110.txt", "r");     /* Open ASCII file */
       if (fp == NULL)
          return 1;

       for (i = 0; i < HEIGHT; i++)          /* Read values into 'data' buffer */
          for (j = 0; j < WIDTH; j++) {
             num = fscanf (fp, "%d ", &val);
             if (num != 1) { fclose (fp); return 1; }   /* short or bad input */
             data[i][j] = (unsigned char) val;
          }
       fclose (fp);

       file = H5Fcreate ("storm.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);  /* Create file */
       status = H5IMmake_image_8bit (file, "Storm_Image", WIDTH, HEIGHT,         /* Create image */
                                     (const unsigned char *) data);
       status = H5Fclose (file);             /* Close file */
       return (status < 0) ? 1 : 0;
    }

Page 11: Introduction to HDF5

HDF5 Data Model

Page 12: Introduction to HDF5

HDF5 File

Container for Storing Scientific Data

• Primary objects
  - Datasets
  - Groups

• Other objects
  - Attributes
  - Property lists
  - Dataspaces

Page 13: Introduction to HDF5

HDF5 Dataset

• Data array
  - Ordered collection of identically typed data items distinguished by their indices

• Metadata
  - Dataspace: rank, dimensions; spatial information about the dataset
  - Datatype: information needed to interpret your data
  - Storage properties: how the array is organized
  - Attributes: user-defined metadata (optional)

Page 14: Introduction to HDF5

Dataset Components

[Figure: a dataset drawn as metadata plus data]
Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
Datatype: IEEE 32-bit float
Attributes: Time = 32.4, Pressure = 987, Temp = 56
Properties: chunked, compressed

Page 15: Introduction to HDF5

HDF5 Dataset: Dataspace

Spatial information about a dataset

• Rank and dimensions
  - Permanent part of the dataset definition

• Subset of points, for partial I/O
  - Needed only during I/O operations

• Applies to datasets in memory or in the file

[Figure: example dataspace with rank = 2 and dimensions = 4x6]

Page 16: Introduction to HDF5

HDF5 Dataset: Compound Datatype

[Figure: a 5 x 3 dataset in which each element has a compound datatype with members int8, int4, int16, and a 2x3x2 array of float32]

Page 17: Introduction to HDF5

HDF5 Dataset: Datatype

Information on how to interpret a data element

• Permanent part of the dataset definition

• HDF5 atomic types
  - normal integer and float
  - user-definable (e.g., 13-bit integer)
  - variable-length types (e.g., strings)
  - pointers: references to objects/dataset regions
  - enumeration: names mapped to integers
  - array

• HDF5 compound types
  - Comparable to C structs
  - Members can be atomic or compound types

Page 18: Introduction to HDF5

HDF5 Dataset: Property List

A collection of values that can be passed to HDF5 functions at lower layers of the library

• There are property lists that you can use when:
  - creating a file
  - accessing a file
  - creating a dataset
  - reading/writing a dataset

• To use the HDF5 library defaults: H5P_DEFAULT

Page 19: Introduction to HDF5

HDF5 Dataset: Storage Layout Properties

• Contiguous: dataset stored in a contiguous array of bytes (default)

• Chunked: dataset stored as fixed-size chunks. Each chunk is read/written with a single I/O operation.

  Required for:
  - compression
  - unlimited-dimension (extendible) datasets

Page 20: Introduction to HDF5

HDF5 Dataset: Properties

• Chunked: better subsetting access time; enables extending and compression

• Compressed: improves storage efficiency and transmission speed

• Extendible: arrays can be extended in any direction

• External file: metadata in one file, raw data in another

[Figure: dataset “Fred” with its metadata in File A and its data in File B]

Page 21: Introduction to HDF5

HDF5 Dataset: Attributes

Data of the form “name = value” attached to an object

• Scaled-down versions of dataset operations
  - Not extendible
  - No compression
  - No partial I/O

• Optional

Page 22: Introduction to HDF5

HDF5 Dataset (again)

• Data array
  - Ordered collection of identically typed data items distinguished by their indices

• Metadata
  - Dataspace: rank, dimensions; spatial information about the dataset
  - Datatype: information needed to interpret your data
  - Properties: how the array is organized
  - Attributes: user-defined metadata (optional)

Page 23: Introduction to HDF5

HDF5 File: Groups

A mechanism for describing collections of related objects

• Every file starts with a root group

• Can have attributes

• Similar to UNIX directories

[Figure: the root group “/”]

Page 24: Introduction to HDF5

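HDF5 objects are identified and located by their pathnames.

[Figure: tree rooted at “/” containing an object x and a group foo; foo contains temp and a group bar; bar contains temp]

Pathnames: / (root), /x, /foo, /foo/temp, /foo/bar/temp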

Page 25: Introduction to HDF5

HDF5 I/O Library

Page 26: Introduction to HDF5

Structure of HDF5 Library

[Figure: layered diagram, from top to bottom: Applications; Object API (C, F90, C++, Java); Library internals; Virtual file I/O; File or other “storage”]

Page 27: Introduction to HDF5

Virtual File I/O Layer

Allows the HDF5 format address space to map to disk, the network, memory, or a user-defined device.

[Figure: virtual file I/O drivers (Stdio, File Family, MPI I/O, Memory, Network) mapping the HDF5 address space onto “storage”: a file, a file family, memory, or the network]

Page 28: Introduction to HDF5

Introduction to HDF5 API

Programming model for sequential access

Page 29: Introduction to HDF5

General API Topics

• General information about HDF5 programming (C)

• Walk through example program

Page 30: Introduction to HDF5

The General HDF5 API

• Currently has C, Fortran 90, Java, and C++ bindings.

• C routines begin with the prefix H5X, where X is a single letter indicating the object on which the operation is performed. Example APIs:
  - H5F: File interface, e.g. H5Fopen
  - H5D: Dataset interface, e.g. H5Dread
  - H5S: Dataspace interface, e.g. H5Screate_simple
  - H5P: Property list interface, e.g. H5Pset_chunk
  - H5G: Group interface, e.g. H5Gcreate
  - H5A: Attribute interface, e.g. H5Acreate

Page 31: Introduction to HDF5

The General Paradigm

• Properties of objects are defined (optional)

• Objects are opened or created

• Objects are then accessed

• Objects are finally closed

Page 32: Introduction to HDF5

Order of Operations

The library imposes an order on operations through argument dependencies.

Example: a file must be opened before a dataset, because the dataset open call requires a file identifier as an argument.

Page 33: Introduction to HDF5

HDF5 C Programming Issues

For portability, the HDF5 library defines its own types. For example:

• hid_t: object identifiers
• hsize_t: sizes used for dimensions
• herr_t: function return values

For C, put #include "hdf5.h" at the top of your HDF5 application.

Page 34: Introduction to HDF5

h5dump Command-Line Utility to View an HDF5 File

    h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] [-p] <file>

    --header     Display header only; no data is displayed.
    -a <names>   Display the specified attribute(s).
    -d <names>   Display the specified dataset(s).
    -g <names>   Display the specified group(s) and all the members.
    -l <names>   Display the value(s) of the specified soft link(s).
    -t <names>   Display the specified named datatype(s).
    -p           Display properties.

<names> is one or more appropriate object names.

Page 35: Introduction to HDF5

Example of h5dump Output

    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }

[Figure: the root group “/” containing the dataset ‘dset’]

Page 36: Introduction to HDF5

Example HDF5 Application

     1  #include "hdf5.h"
     2  #define FILE "dset.h5"
     3  int main () {
     4     hid_t   file_id, dataset_id, dataspace_id;
     5     hsize_t dims[2];
     6     herr_t  status;
     7     int     i, j, dset_data[4][6];
     8     for (i = 0; i < 4; i++)
     9        for (j = 0; j < 6; j++)
    10           dset_data[i][j] = i * 6 + j + 1;
    11     file_id = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    12     dims[0] = 4;
    13     dims[1] = 6;
    14     dataspace_id = H5Screate_simple (2, dims, NULL);
    15     dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
    16     status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                              dset_data);
    17     status = H5Sclose (dataspace_id);
    18     status = H5Dclose (dataset_id);
    19     status = H5Fclose (file_id);
    20  }

Steps:
• 11: Create (or use default) file creation/access properties
• 11: Create the file with the above properties
• 12-15: Create (or use default) dataset characteristics: dataspace, datatype, storage properties
• 15: Create the dataset using the above characteristics
• 16: Write data to the dataset
• 17-19: Close all interfaces

Page 37: Introduction to HDF5

Example Code - Dataspace

    12 dims[0] = 4;
    13 dims[1] = 6;
    14 dataspace_id = H5Screate_simple (2, dims, NULL);
    15 dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

Callouts on line 14: 2 is the rank; dims is the array of dimension sizes (4x6); NULL: NOT used here.

Page 38: Introduction to HDF5

Example Code - Datatype

    12 dims[0] = 4;
    13 dims[1] = 6;
    14 dataspace_id = H5Screate_simple (2, dims, NULL);
    15 dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

Callout on H5T_STD_I32BE: Where do you get the datatype?

Page 39: Introduction to HDF5

HDF5 Pre-defined Datatype Identifiers

HDF5 opens a set of predefined datatype identifiers. For example:

C Type | HDF5 File Type | HDF5 Memory Type
-------|----------------|-----------------
int | H5T_STD_I32BE, H5T_STD_I32LE | H5T_NATIVE_INT
float | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
double | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE

Page 40: Introduction to HDF5

Pre-Defined File Datatype Identifiers

Examples:
• H5T_IEEE_F64LE: eight-byte, little-endian, IEEE floating-point
• H5T_VAX_F32: four-byte VAX floating point
• H5T_STD_I32LE: four-byte, little-endian, signed two's complement integer
• H5T_STD_U16BE: two-byte, big-endian, unsigned integer

NOTE: This is what you see in the file. The name is the same everywhere and explicitly defines a datatype.

Each name has the form H5T_<architecture*>_<programming type>.
*STD = “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”

Page 41: Introduction to HDF5

Pre-defined Native Datatype Identifiers

Examples of predefined native types in C:

• H5T_NATIVE_INT (int)
• H5T_NATIVE_FLOAT (float)
• H5T_NATIVE_UINT (unsigned int)
• H5T_NATIVE_LONG (long)
• H5T_NATIVE_CHAR (char)

NOTE: These are memory types. They differ from machine to machine and are used for reading/writing.

Page 42: Introduction to HDF5

Example Code - H5Dwrite

    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

Callouts: dataset_id is the dataset identifier from H5Dcreate or H5Dopen; H5T_NATIVE_INT is the memory datatype.

Page 43: Introduction to HDF5

Example Code – H5Dwrite

    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

Callouts: the first H5S_ALL is the memory dataspace, the second H5S_ALL is the file dataspace, and H5P_DEFAULT is the data transfer property list. H5S_ALL selects the entire dataspace.

Page 44: Introduction to HDF5

Memory and File Dataspaces – Why?

Partial I/O: selected elements from the source are mapped (read/written) to selected elements in the destination.

• The selection in memory can differ from the selection in the file:
  - The number of selected elements must be the same in source and destination
  - A selection can be slabs, points, or the result of set operations (union, difference, …) on slabs or points

Page 45: Introduction to HDF5

Example Code To:

• Create dataset in a group other than root

• Open file and dataset and read data

• Create an attribute for the dataset

Page 46: Introduction to HDF5

Page 47: Introduction to HDF5

How to Put a Dataset in a Group?

    hid_t group_id;
    …
    a  group_id = H5Gcreate (file_id, "mygroup", 0);
    b  dataset_id = H5Dcreate (group_id, "dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);
    c  status = H5Gclose (group_id);

Steps:
a. Create a group
b. Insert the dataset into the group
c. Close the group

Page 48: Introduction to HDF5

h5dump Output w/Dataset in a Group

    $ h5dump dset.h5
    HDF5 "dset.h5" {
    GROUP "/" {
       GROUP "mygroup" {
          DATASET "dset" {
             DATATYPE H5T_STD_I32BE
             DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
             DATA {
             (0,0): 1, 2, 3, 4, 5, 6,
             (1,0): 7, 8, 9, 10, 11, 12,
             (2,0): 13, 14, 15, 16, 17, 18,
             (3,0): 19, 20, 21, 22, 23, 24
             }
          }
       }
    }
    }

Note that the dataset is in the group “mygroup”.

[Figure: “/” containing the group mygroup, which contains the dataset dset]

Page 49: Introduction to HDF5

How to Read an Existing Dataset

    file_id = H5Fopen (FILE, H5F_ACC_RDWR, H5P_DEFAULT);
    dataset_id = H5Dopen (file_id, "/dset");
    status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);
    status = H5Dclose (dataset_id);
    status = H5Fclose (file_id);

Steps:
- Open the existing file
- Open the existing dataset
- Read the data
- Close the dataset and file identifiers

Page 50: Introduction to HDF5

How to Create an Attribute in the Dataset?

    hid_t   aspace_id;
    hsize_t dimsa;
    int     attr_data[2] = {100, 200};
    hid_t   attribute_id;
    …
    dimsa = 2;
    aspace_id = H5Screate_simple (1, &dimsa, NULL);
    a  attribute_id = H5Acreate (dataset_id, "Units", H5T_STD_I32BE, aspace_id, H5P_DEFAULT);
    b  status = H5Awrite (attribute_id, H5T_NATIVE_INT, attr_data);
    c  status = H5Aclose (attribute_id);
    c  status = H5Sclose (aspace_id);

Steps:
a. Create an attribute attached to the already open dataset
b. Write data to the attribute
c. Close the attribute and its dataspace

Page 51: Introduction to HDF5

h5dump Output of Dataset with an Attribute

    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE H5T_STD_I32BE
          DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 1, 2, 3, 4, 5, 6,
          (1,0): 7, 8, 9, 10, 11, 12,
          (2,0): 13, 14, 15, 16, 17, 18,
          (3,0): 19, 20, 21, 22, 23, 24
          }
          ATTRIBUTE "Units" {
             DATATYPE H5T_STD_I32BE
             DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
             DATA {
             (0): 100, 200
             }
          }
       }
    }
    }

Page 52: Introduction to HDF5

HDF5 High Level APIs

Page 53: Introduction to HDF5

High Level APIs

• Make HDF5 easier to use

• Encourage standard ways to store objects

• Included with the HDF5 library

However:

• You still need basic HDF5 calls (H5Fopen/H5Fclose…)

Page 54: Introduction to HDF5

High Level APIs

• HDF5 Lite (H5LT): Functions that simplify the steps needed to create/read datasets and attributes

• HDF5 Image (H5IM): Functions for creating images in HDF5.

• HDF5 Table (H5TB): Functions for creating tables (collections of records) in HDF5.

• Others … (Dimension Scales, Packet Table)

Page 55: Introduction to HDF5

High Level Programming Model

    file_id = H5Fcreate ( "test.h5", ……. );
    /* Call some High Level function */
    H5LTsome_function ( file_id, ...extra parameters );
    status = H5Fclose ( file_id );

Page 56: Introduction to HDF5

HDF5 Hands-On

• Need Secure Shell (SSH) or PuTTY

• Everyone is logged in as: workshop

• The working directory for each user is specified at the top right of the Hands-On page

• Compile examples with: h5cc (C) or h5fc (Fortran 90)

• Display datasets with: h5dump

Page 57: Introduction to HDF5

Thank you!

This presentation is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNX06AC83A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.