Parallel netCDF

Transcript of "Parallel netCDF: A High-Performance Scientific I/O Interface", SuperComputing 2003 (24 slides).

Page 1: Parallel netCDF: A High-Performance Scientific I/O Interface

W. Liao, A. Choudhary (Northwestern University); R. Ross, R. Thakur, W. Gropp, R. Latham, A. Siegel (Argonne National Laboratory); B. Gallagher (University of Chicago); M. Zingale (University of California, Santa Cruz)

November 20, 2003

Jianwei Li, ECE Department, Northwestern University (jianwei@ece.northwestern.edu)

Page 2: Outline

• Overview of netCDF & our motivation
• NetCDF file format & current serial interface
• Access netCDF files in parallel applications
• Parallel netCDF design & implementation
• Parallel netCDF vs Parallel HDF5
• Performance evaluation
• Summary & future work
• Software and documentation resources

Page 3: Introduction

• netCDF (network Common Data Form)
  – Initially developed by the Unidata Program Center in Boulder, Colorado.
  – Provides a common data access method for various scientific applications.
• File format & programming interface
  – Portable, platform-independent, and self-describing file format
  – Libraries for array-oriented data access in C, FORTRAN, etc.
• A broad community of netCDF users
  – Easy to learn and use
  – A de facto data access standard for much of the climate community: atmospheric applications, ocean modeling, fusion simulations, medical images, satellite/radar images
  – Huge amount of existing netCDF datasets and processing tools
• Well supported by Unidata
• Parallel interface support from NWU/ANL

Page 4: Why Parallel I/O?

• Most scientific applications are parallel
• Increasing demands on the amount of data
• I/O becomes the bottleneck
• Scalability requires parallel I/O
• Programming convenience with parallel I/O
• Significant I/O performance improvement with appropriate parallel I/O techniques
• Well studied and supported

Page 5: NetCDF File Format

[Figure: layout of a netCDF file. The file begins with the netCDF header, followed by the fixed-size arrays (the 1st, 2nd, ..., nth non-record variables stored contiguously in order), followed by the variable-size arrays (the record variables, stored as interleaved records: the 1st record of the 1st, 2nd, ..., rth record variable, then the 2nd records, and so on). The interleaved records grow along the UNLIMITED dimension.]

• header: definition of the netCDF dataset
• n fixed-size variables: arrays of fixed dimensions
• r record variables: arrays whose most-significant dimension is UNLIMITED; records are defined by the rest of the dimensions

Page 6: File Header

Defines the 3 components of a netCDF dataset:
• Dimensions
• Attributes
• Variables

Page 7: An Example NetCDF File

netCDF_example {                  // CDL notation for a netCDF dataset

dimensions:                       // dimension names and lengths
    lat = 5, lon = 10, level = 4, time = unlimited;

variables:                        // variable types, names, shapes, attributes
    float temp(time, level, lat, lon);
        temp:long_name = "temperature";
        temp:units = "celsius";
    float rh(time, lat, lon);
        rh:long_name = "relative humidity";
        rh:valid_range = 0.0, 1.0;    // min and max

// global attributes:
    :source = "Fictional Model Output";

data:                             // optional data assignments
    temp = 0, …, 0;
    rh = .5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
         .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
         .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
         .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
          0,.1,.2,.4,.4,.4,.4,.7,.9,.9;    // 1 record allocated
}

Page 8: Serial NetCDF API

• Dataset functions
  – create/open/close/abort a dataset
  – switch the dataset between define mode and data mode
  – synchronize dataset changes to storage
• Define mode functions
  – define dataset dimensions and variables
• Attribute functions
  – manage adding, changing, and reading attributes of the dataset
• Inquiry functions
  – return dataset metadata: dim(id, name, len), var(id, name, ndims, shape, etype), number of dims/vars/attributes, etc.
• Data access functions
  – read/write variable data in one of five access methods (patterns): single element, whole array, subarray, strided subarray, or mapped strided subarray

Built on native I/O and designed for serial access; a minimal usage sketch follows.
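The following sketch is not from the slides; it shows how these function groups fit together when writing one small fixed-size variable with the serial C API. The file name, dimensions, and variable are illustrative assumptions, and error checking is omitted.

#include <netcdf.h>

int main(void)
{
    int ncid, lat_dim, lon_dim, temp_id, dimids[2];
    float temp[5][10] = {{0}};

    nc_create("example.nc", NC_CLOBBER, &ncid);              /* dataset function   */
    nc_def_dim(ncid, "lat", 5, &lat_dim);                    /* define mode        */
    nc_def_dim(ncid, "lon", 10, &lon_dim);
    dimids[0] = lat_dim;  dimids[1] = lon_dim;
    nc_def_var(ncid, "temp", NC_FLOAT, 2, dimids, &temp_id);
    nc_put_att_text(ncid, temp_id, "units", 7, "celsius");   /* attribute function */
    nc_enddef(ncid);                                         /* leave define mode  */
    nc_put_var_float(ncid, temp_id, &temp[0][0]);            /* whole-array access */
    nc_close(ncid);
    return 0;
}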

Page 9: Parallel Access to NetCDF Files?

[Figure: two workarounds for parallel applications using serial netCDF. Left: processes P0-P3 funnel their data through the serial netCDF library on one process, which writes to the parallel file system. Right: each of P0-P3 uses its own netCDF library instance to write a separate file to the parallel file system.]

Through a single process:
• Slow and cumbersome
• Data shipping
• I/O bottleneck
• Memory requirement

One file per process (independent access):
• Multiple file pieces
• Breaks the netCDF dataset

Page 10: Access through Parallel NetCDF!

[Figure: processes P0-P3 access a single shared netCDF file on the parallel file system through the Parallel netCDF library.]

• New parallel interface
• Performs I/O cooperatively or collectively
• Potential parallel I/O optimizations for better performance
• NetCDF data integration and management

Page 11: Architecture for Parallel NetCDF

[Figure: compute nodes and I/O servers connected by a communication network. On the compute nodes, the Parallel netCDF library runs in user space on top of MPI-IO; the I/O servers provide the file system space.]

Page 12: Parallel NetCDF API

• The same netCDF file format
• Maintains the look and feel of the serial netCDF API
  – Same syntax & semantics for:
    • dataset functions (except create/open)
    • define mode functions, attribute functions, and inquiry functions
    • high-level data access functions
  – Distinguished by prefixing function calls with "ncmpi_"/"nfmpi_"
• Parallel access through MPI-IO
  – Benefits from MPI-IO optimizations (data shipping, two-phase I/O, etc.)
  – MPI communicator added to the argument list for create/open
  – MPI_Info used for parallel I/O management and further optimization
  – Collective I/O (function names end with "_all") vs noncollective I/O
• New flexible data access functions
  – Accept non-contiguous memory regions described by MPI datatypes (see the sketch after this list)
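As an illustration of the flexible data access functions, the hedged sketch below writes only the interior of a local array that carries a ghost-cell border. The buffer layout, sizes, and helper name write_interior are assumptions for the example; the ncmpi_put_vara_all argument list matches the one shown on the next slide.

#include <mpi.h>
#include <pnetcdf.h>

/* Write the 8x8 interior of a 10x10 local buffer (1-cell ghost border)
 * into the file region selected by start[]/count[].                   */
void write_interior(int ncid, int varid, float local[10][10],
                    MPI_Offset start[2], MPI_Offset count[2])
{
    int sizes[2]    = {10, 10};   /* full in-memory array              */
    int subsizes[2] = { 8,  8};   /* interior actually written         */
    int starts[2]   = { 1,  1};   /* interior offset inside the buffer */
    MPI_Datatype interior;

    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_FLOAT, &interior);
    MPI_Type_commit(&interior);

    /* count[] must select an 8x8 region of the variable so that the file
     * and memory sides describe the same number of elements.            */
    ncmpi_put_vara_all(ncid, varid, start, count,
                       &local[0][0], 1, interior);

    MPI_Type_free(&interior);
}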

Page 13: PnetCDF Example Code

File write:
1. ncmpi_create(mpi_comm, filename, open_mode, mpi_info, &file_id);
2. ncmpi_def_var(file_id, ...);
   . . . . . .
   ncmpi_enddef(file_id);
3. ncmpi_put_vara_all(file_id, var_id, start[], count[], buffer, bufcount, mpi_datatype);
4. ncmpi_close(file_id);

File read:
1. ncmpi_open(mpi_comm, filename, open_mode, mpi_info, &file_id);
2. ncmpi_inq(file_id, ...);
   . . . . . .
3. ncmpi_get_vars(file_id, var_id, start[], count[], stride[], buffer, bufcount, mpi_datatype);
4. ncmpi_close(file_id);
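Filled in, the write sequence above looks roughly like the following self-contained sketch. The file name "output.nc", the dimension and variable names, and the row-per-process decomposition are illustrative assumptions, and error checking is omitted.

#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimids[2], varid;
    float row[10];
    MPI_Offset start[2], count[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    for (int i = 0; i < 10; i++) row[i] = (float)rank;

    /* 1. collective create: note the MPI communicator and MPI_Info arguments */
    ncmpi_create(MPI_COMM_WORLD, "output.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);

    /* 2. define mode: dimensions and one 2-D float variable */
    ncmpi_def_dim(ncid, "y", (MPI_Offset)nprocs, &dimids[0]);
    ncmpi_def_dim(ncid, "x", 10, &dimids[1]);
    ncmpi_def_var(ncid, "temp", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* 3. collective write of this process's row of the global array */
    start[0] = rank;  start[1] = 0;
    count[0] = 1;     count[1] = 10;
    ncmpi_put_vara_float_all(ncid, varid, start, count, row);

    /* 4. collective close */
    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}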

Page 14: Access to File Header

[Figure: header handling. Each process (P0-P3) reads the input file header into an in-memory header structure and writes the output file header, coordinated through the MPI communicator and hints.]

Page 15: Parallel I/O for Array Data

[Figure: on each process (P0-P3), the variable metadata from the in-memory header structure and the access arguments (start[], count[], stride[], imap[]) are turned into an MPI fileview, while a memory datatype describes the input/output buffer; MPI-IO, given the communicator and hints, then moves the data between the buffer and the array data in the netCDF file.]

Page 16: Compare with Parallel HDF5

(Parallel HDF5: Hierarchical Data Format 5, by NCSA)

In common:
• Both define portable, self-describing file formats
• Both serve as data access standards for parallel scientific applications
• Both have parallel I/O built on top of MPI-IO

Parallel netCDF:
• Contiguous or interleaved data layout
• Sits on top of MPI-IO, passing user access patterns directly to the MPI fileview with little overhead
• One-time header I/O obtains all the information needed for direct access to each data array
• Because each process keeps a local copy of the header, it can access any array, identified by its permanent ID, at any time without any collective open/close operation on the object, saving a lot of inter-process communication

Parallel HDF5:
• Tree-like file structure
• Uses dataspaces and hyperslabs to define data organization and to map, transfer, pack, or unpack data between memory and the file space
• Iterates through the entire namespace to get the header information needed to access each object
• Needs a collective open/close operation when accessing each single object, and for file writes the metadata must be updated synchronously, so there is a lot of communication overhead

A hedged sketch of the parallel HDF5 style follows for contrast.
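The sketch below is not from the slides; assuming the HDF5 1.8+ C API, it shows what a collective write of one row per process looks like in parallel HDF5, to make the property-list / dataspace / hyperslab style concrete. The file and dataset names are illustrative and error checking is omitted.

#include <mpi.h>
#include <hdf5.h>

void phdf5_write_row(int rank, int nprocs, const float *row)
{
    hsize_t dims[2]  = {(hsize_t)nprocs, 10};
    hsize_t start[2] = {(hsize_t)rank, 0};
    hsize_t count[2] = {1, 10};

    /* file access property list selects the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* creating the dataset is a collective operation on the object */
    hid_t filespace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate2(file, "temp", H5T_NATIVE_FLOAT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* hyperslab selection maps this process's row into the file space */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    /* dataset transfer property list requests collective I/O */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, row);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
}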

Page 17: Performance Evaluation

• A micro-benchmark accessing a 3-D array
  – Platform: Blue Horizon (IBM SP-2) at SDSC; IBM's MPI, GPFS file system
  – Performance scalability of our PnetCDF
  – Compare parallel netCDF with serial netCDF
• FLASH I/O benchmark writing a number of multi-dimensional arrays
  – Platform: ASCI White (Frost) at LLNL; IBM's MPI, GPFS file system
  – Compare PnetCDF with PHDF5

Page 18: Data Partition of Micro-benchmark

[Figure: a 3-D array with axes X, Y, and Z is divided among eight processors (Processor 0 through Processor 7) using different partition patterns: X, Y, and Z partitions (split along one dimension), YX, ZY, and ZX partitions (split along two dimensions), and the ZYX partition (split along all three dimensions).]
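To show how one of these patterns turns into PnetCDF access arguments, the hedged helper below (the function name and the rank-ordering convention are assumptions, and the global dimensions are assumed to divide evenly) computes start[] and count[] for a ZYX partition on a PZ x PY x PX process grid, ready to pass to a call such as ncmpi_put_vara_float_all.

#include <mpi.h>

/* Block-partition a global gsize[0] x gsize[1] x gsize[2] (Z, Y, X) array
 * over a pgrid[0] x pgrid[1] x pgrid[2] process grid, Z varying slowest. */
static void zyx_partition(int rank, const int gsize[3], const int pgrid[3],
                          MPI_Offset start[3], MPI_Offset count[3])
{
    int coord[3];
    coord[0] =  rank / (pgrid[1] * pgrid[2]);   /* Z coordinate */
    coord[1] = (rank / pgrid[2]) % pgrid[1];    /* Y coordinate */
    coord[2] =  rank % pgrid[2];                /* X coordinate */

    for (int d = 0; d < 3; d++) {
        count[d] = gsize[d] / pgrid[d];         /* local block size   */
        start[d] = coord[d] * count[d];         /* global file offset */
    }
}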

Page 19: Performance Scalability

[Figure: four panels (Read 64 MB, Write 64 MB, Read 1 GB, Write 1 GB) plotting bandwidth (MB/sec) against the number of processors (1 to 32) for Parallel netCDF under the partition patterns of the previous slide, with serial netCDF shown as the baseline.]

Page 20: FLASH I/O Benchmark

• FLASH: a parallel hydrodynamics code
  – Simulates astrophysical thermonuclear flashes in 2-D/3-D
  – Uses a structured adaptive mesh refinement method to solve the physics equations
  – Produces checkpoint files and visualization plotfiles
• FLASH I/O benchmark
  – Simulates the I/O pattern of FLASH
  – 3-D AMR subblocks with a perimeter of four guard cells (in this simulation, 80 blocks per process, of size 8*8*8 or 16*16*16)
  – Generates a checkpoint file, a plotfile with centered data, and a plotfile with corner data
  – A series of multi-dimensional arrays, each partitioned along its most significant dimension

Page 21: FLASH I/O Performance

[Figure: FLASH I/O performance charts, not reproduced in this transcript.]

Page 22: FLASH I/O Performance (cont.)

[Figure: additional FLASH I/O performance charts, not reproduced in this transcript.]

Page 23: Summary and Future Work

• PnetCDF: a new high-level I/O library
  – Uses the same data format as netCDF
  – Maintains the flavor of the netCDF API
  – Provides parallel access semantics
• Current status
  – Distributions of code available, with a basic test suite
  – Mailing list for users and support
  – External groups are already adopting this software
  – Performance is already competitive without any tuning
• Future work
  – Polish the library; improve our test suite and support
  – Performance tuning for key platforms (IBM SP, Linux)

Page 24: More Information

• Learn more about netCDF: http://www.unidata.ucar.edu/packages/netcdf/
• Download the PnetCDF software & docs: http://www.mcs.anl.gov/parallel-netcdf/

Questions?

Thank you!