1 Overview of HDF5 HDF Summit Boeing Seattle The HDF Group (THG) September 19, 2006.


Overview of HDF5 HDF Summit

Boeing Seattle
The HDF Group (THG)
September 19, 2006


Topics
• What is HDF?
• Sample uses of HDF
• THG the Company


What is HDF?


Answering big questions …

Matter & the universe

[Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002; scale marks 60, 385, 610]

Weather and climate

Life and nature


involves big data …


varied data…

caacaagccaaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtggcgagatatctcttggaaaaactttcaagagcaactcaatcaactttctcgagcattgcttgctcacaatattgacgtacaagataaaatcgccatttttgcccataatatggaacgttgggttgttcatgaaactttcggtatcaaagatggtttaatgaccactgttcacgcaacgactacaatcgttgacattgcgaccttacaaattcgagcaatcacagtgcctatttacgcaaccaatacagcccagcaagcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtcggcgatcaagagcaatacgatcaaacattggaaattgctcatcattgtccaaaattacaaaaaattgtagcaatgaaatccaccattcaattacaacaagatcctctttcttgcacttgg


and complex relationships…

[Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig, Reads, Percent match, SNP Score, Trace]


on big computers…


and on little computers.


How do we…
• Describe the data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and repositories


HDF is
• A file format for managing any kind of data
• Software to store and access data in the format
• Suited especially to large or complex data collections
• Suited for every size of system
• Platform independent – runs almost anywhere
• Open – both file formats and software


HDF solution

I/O software & tools

Common data models

Standard APIs

Scientific data file format

Efficient storage, I/O


An HDF file is a container…

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

palette

palette

…into which you can put your data objects.


HDF structures for organizing objects in files

palette

Raster image

3-D array

2-D array

Raster image

Table

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

“/” (root)

“/foo”
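The hierarchy on this slide, a root group “/” holding objects and a nested group “/foo”, can be imitated with a toy model. The sketch below is plain Python, not the HDF5 library API: the class names `Group` and `Dataset`, the method names, and the choice to place the table under “/foo” are all assumptions made for illustration.

```python
class Dataset:
    """A named data object: values plus free-form attributes (metadata)."""

    def __init__(self, data, **attrs):
        self.data = data
        self.attrs = dict(attrs)


class Group:
    """A container of named members, each a Dataset or another Group."""

    def __init__(self):
        self.members = {}

    def create_group(self, name):
        group = Group()
        self.members[name] = group
        return group

    def create_dataset(self, name, data, **attrs):
        dataset = Dataset(data, **attrs)
        self.members[name] = dataset
        return dataset

    def get(self, path):
        """Resolve a slash-separated path such as "/foo/table"."""
        node = self
        for part in path.strip("/").split("/"):
            if part:  # an empty part means the path was just "/"
                node = node.members[part]
        return node

    def walk(self, prefix=""):
        """Yield the absolute path of every member, depth first."""
        for name, member in self.members.items():
            path = f"{prefix}/{name}"
            yield path
            if isinstance(member, Group):
                yield from member.walk(path)


root = Group()  # plays the role of "/"
root.create_dataset("palette", [(0, 0, 0), (255, 255, 255)])
foo = root.create_group("foo")  # plays the role of "/foo"
foo.create_dataset(
    "table",
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    columns=("lat", "lon", "temp"),
)

paths = list(root.walk())
first_row = root.get("/foo/table").data[0]
```

The point of the sketch is only the addressing scheme: every object in the container is reachable by a path from the root, and groups can nest to any depth.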


Mesh Example, in HDFView


HDF5 Software

Tools & Applications

HDF File

HDF I/O Library


Goals of HDF5 Library
• Flexible API to support a wide range of operations on data
• High performance access in serial and parallel computing environments
• Compatibility with common data models and programming languages


Features
• Ability to create complex data structures
• Complex subsetting
• Efficient storage
• Flexible I/O (parallel, remote, etc.)
• Ability to transform data during I/O
• Support for key language models
  • OO compatible
  • C & Fortran primarily
  • Also Java, C++
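Complex subsetting means reading a regular pattern of elements rather than a whole array. HDF5 describes such a pattern, per dimension, with a start, stride, count, and block (a hyperslab). The sketch below imitates that idea in plain Python over nested lists; it is not the HDF5 API, and `hyperslab_indices` and `select_2d` are hypothetical helper names.

```python
def hyperslab_indices(start, stride, count, block=1):
    """Indices picked along one dimension.

    Selects `count` blocks of `block` consecutive elements each; the first
    block begins at `start` and successive blocks begin `stride` apart.
    This sketch rejects overlapping blocks.
    """
    if block > stride:
        raise ValueError("blocks would overlap: block must not exceed stride")
    return [start + i * stride + j for i in range(count) for j in range(block)]


def select_2d(array, rows, cols):
    """Apply one (start, stride, count, block) tuple per dimension."""
    row_idx = hyperslab_indices(*rows)
    col_idx = hyperslab_indices(*cols)
    return [[array[r][c] for c in col_idx] for r in row_idx]


# A 6 x 8 grid whose value encodes its position: 10 * row + column.
grid = [[10 * r + c for c in range(8)] for r in range(6)]

# Rows 1, 3, 5 (every other row, three of them), and two blocks of two
# adjacent columns whose starts are four columns apart: 0, 1, 4, 5.
subset = select_2d(grid, rows=(1, 2, 3, 1), cols=(0, 4, 2, 2))
```

Describing the selection by four numbers per dimension, instead of listing every index, keeps the request small even when the underlying array is very large.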


Sample uses of HDF


1. NASA Earth Observing System (EOS)

Aura: TES, HIRDLS, MLS, OMI

Terra: CERES, MISR, MODIS, MOPITT

Aqua (6/01): CERES, MODIS, AMSR


2. Advanced Simulation & Computing (ASC)

Question: How do we maintain a nuclear stockpile in the absence of testing?


Answer: Very large simulations on very large computers


ASC Data requirements
• Large datasets (> a terabyte)
• Good I/O performance on massively parallel systems
• Complex data and extensive metadata


3. Bioinformatics -- Managing genomic data

caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacattgcgacct
aatacagcccagcaagcagaat


DNA sequencing workflows
• Diverse formats
• Highly redundant data
• Repeated file processing
• Disconnected programs
• Non-scalable storage
• Lack of persistence


Multiple levels and relationships

[Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig, Reads, Percent match, SNP Score, Trace]


HDF5 as binary format for bioinformatics


4. Flight test data


4. Boeing flight test


Flight test data requirements
• Fast data acquisition from 1000s of sources
• Wide variety of data types
• Active archive
• Standardization for data/software exchange
• Special features


THG the Company


What is the HDF Group?
• 18 years at National Center for Supercomputing Applications (NCSA) at University of Illinois
• Recent spin-off from U of I
• Non-profit 501(c)(3)
• 17 scientific, technology, and professional staff
• 5 students
• 2+ million product users world-wide
• Cross industry sectors and disciplines


THG mission
To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.


Business model
• Non-profit: mission driven
• Intellectual property:
  • U of I plans to assign ownership to THG
  • The HDF formats will remain free, and HDF software will remain open source.
• Continue close ties to U of I and NCSA.


Income-generating activities
• Major client support
• Targeted HDF development
• Grant-supported R&D
• Consulting


Thank you


HDF Information
• HDF Information Center
  • http://hdfgroup.org/
• HDF Help email address
  • [email protected]/
• HDF users mailing list
  • [email protected]/