Parallel netCDF Study.


Transcript of Parallel netCDF Study.

Parallel netCDF Study.

John Tannahill
Lawrence Livermore National Laboratory
(tannahill1@llnl.gov)

September 30, 2003

Slide 2.

Acknowledgments (1).

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.

Work funded by the LLNL/CAR Techbase Program.

Many thanks to this program for providing the resources to conduct this study of parallel netCDF, something that probably would not have occurred otherwise.

LLNL Report: UCRL-PRES-200247.

Slide 3.

Acknowledgments (2).

Additional thanks to all the people who contributed to this study in one way or another:

Argonne National Laboratory (ANL): William Gropp, Robert Latham, Rob Ross, & Rajeev Thakur.

Northwestern (NW) University: Alok Choudhary, Jianwei Li, & Wei-keng Liao.

Lawrence Livermore National Laboratory (LLNL): Richard Hedges, Bill Loewe, & Tyce McLarty.

Lawrence Berkeley Laboratory (LBL) / NERSC: Chris Ding & Woo-Sun Yang.

UCAR / NCAR / Unidata: Russ Rew.

University of Chicago: Brad Gallagher.

Slide 4.

Overview of contents.

Proposal background and goals.

Parallel I/O options initially explored.

A/NW’s parallel netCDF library.

(A/NW = Argonne National Laboratory / Northwestern University)

Installation.

Fortran interface.

Serial vs. Parallel netCDF performance.

Test code details.

Timing results.

Parallel HDF5 comparison.

Observations / Conclusions.

Remaining questions / issues.

Slide 5.

Why parallel netCDF (1)?

Parallel codes need parallel I/O.

Performance.

Ease of programming and understandability of code.

Serial netCDF is in widespread use.

Currently a de facto standard for much of the climate community.

Easy to learn and use.

Well supported by Unidata.

Huge amount of existing netCDF data sets.

Many netCDF post-processing codes and tools.

Slide 6.

Why parallel netCDF (2)?

Hopefully a fairly straightforward process to migrate from serial to parallel netCDF.

From material presented at a SuperComputing 2002 tutorial (11/02), it appeared that at least one feasible option for a Fortran parallel netCDF capability would soon be available.

Slide 7.

Summary of work performed under proposal (1).

Read material, performed web searches, and communicated with a number of people to determine what options were available.

Parallel I/O for High Performance Computing by John May.

Once the decision was made to go with A/NW’s parallel netCDF, collaborated with them extensively:

First, to get the kinks out of the installation procedure, for each of the platforms of interest.

Next, to get the Fortran interface working properly.

– C interface complete, but Fortran interface needed considerable work.

– Wrote Fortran 90 (F90) and C interface test codes.

Slide 8.

Summary of work performed under proposal (2).

Also developed F90 test codes for performance testing:

One that emulates the way serial netCDF is currently being used to do I/O in our primary model.

Another that replaces the serial netCDF code with its A/NW parallel netCDF equivalent.

Ran a large number of serial / parallel netCDF timings.

Collaborated with Livermore Computing personnel to convert the parallel netCDF test code to its parallel HDF5 equivalent.

Ran a limited number of parallel HDF5 timings for comparison with parallel netCDF.

Created this presentation / report.

Slide 9.

Ultimate goals.

Bring a much-needed viable Fortran parallel netCDF capability to the Lab.

Incorporate parallel netCDF capabilities into our primary model, an Atmospheric Chemical Transport Model (ACTM) called “Impact”.

Model uses a logically rectangular, 2D lon/lat domain decomposition, with a processor assigned to each subdomain.

Each subdomain consists of a collection of full vertical columns, spread over a limited range of latitude and longitude.

Employs a Master / Slaves paradigm.

MPI used to communicate between processors as necessary.

Slide 10.

2-D (lon/lat) domain decomposition.

Slide 11.

Impact model / Serial netCDF.

Impact currently uses serial netCDF for much of its I/O.

Slaves read their own data.

Writes are done by the Master only.

Data communicated back to Master for output.

Buffering required because of large arrays and limited memory on Master.

Complicates the code considerably.

Increased I/O performance welcomed, but code not necessarily I/O bound.

Slide 12.

Impact model / Serial netCDF calls.

Impact serial netCDF calls:

Nf_Create, Nf_Open, Nf_Close, Nf_Set_Fill, Nf_Sync

Nf_Def_Dim, Nf_Def_Var, Nf_Enddef

Nf_Inq_Dimid, Nf_Inq_Dimlen, Nf_Inq_Unlimdim, Nf_Inq_Varid

Nf_Get_Var_Int, Nf_Put_Var_Int, Nf_Put_Var_Real, Nf_Put_Var_Double

Nf_Get_Vara_Int, Nf_Get_Vara_Real, Nf_Get_Vara_Double

Nf_Put_Vara_Int, Nf_Put_Vara_Real, Nf_Put_Vara_Double, Nf_Put_Vara_Text

Nf_Get_Att_Text, Nf_Put_Att_Text

Slide 13.

Parallel I/O options initially explored (1).

Parallel netCDF alternatives:

A/NW (much more later).

LBL / NERSC:

– Ziolib + parallel netCDF.

– Level of support? Small user base? Recoding effort?

– My lack of understanding in general?

Unidata / NCSA project:

– “Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability.”

– PI is Russ Rew, one of the primary developers of serial netCDF.

– Multi-year project that had only just begun, so not a viable option.

Slide 14.

Abstract of Unidata / NCSA project.

Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability.

The proposed work will merge Unidata's netCDF and NCSA's HDF5, two widely-used scientific data access libraries. Users of netCDF in numerical models will benefit from support for packed data, large datasets, and parallel I/O, all of which are available with HDF5. Users of HDF5 will benefit from the availability of a simpler high-level interface suitable for array-oriented scientific data, wider use of the HDF5 data format, and the wealth of netCDF software for data management, analysis and visualization that has evolved among the large netCDF user community. The overall goal of this collaborative development project is to create and deploy software that will preserve the desirable common characteristics of netCDF and HDF5 while taking advantage of their separate strengths: the widespread use and simplicity of netCDF and the generality and performance of HDF5.

To achieve this goal, Unidata and NCSA will collaborate to create netCDF-4, using HDF5 as its storage layer. Using netCDF-4 in advanced Earth science modeling efforts will demonstrate its effectiveness. The success of this project will facilitate open and free technologies that support scientific data storage, exchange, access, analysis, discovery, and visualization. The technology resulting from the netCDF-4/HDF5 merger will benefit users of Earth science data and promote cross-disciplinary research through the provision of better facilities for combining, synthesizing, aggregating, and analyzing datasets from disparate sources to make them more accessible.

Slide 15.

Parallel I/O options initially explored (2).

Parallel HDF5:

Would require a significant learning curve; fairly complex.

Would require significant code changes.

Feedback from others:

– Difficult to use.

– Limited capability to deal directly with netCDF files.

– Performance issues?

– Fortran interface?

Slide 16.

Why A/NW’s parallel netCDF was chosen (1).

Expertise, experience, and track record of developers.

PVFS, MPICH, ROMIO.

Parallel netCDF library already in place.

Small number of users; C interface only.

Initial work on Fortran interface completed, but untested.

Interest level in their product seems to be growing rapidly.

Parallel syntax much like the serial syntax.

Slide 17.

Why A/NW’s parallel netCDF was chosen (2).

Level of support that could be expected over the long-term.

Russ Rew recommendation; based on my needs and time frame.

A/NW developers’ level of interest and enthusiasm in working with me.

Belief that A/NW may play a role in the Unidata / NCSA project.

Only practical option currently available?

Slide 18.

A/NW’s parallel netCDF library.

Based on Unidata’s serial netCDF library.

Syntax and use very much like serial netCDF:

nf_ functions become nfmpi_ functions.

Additional arguments necessary for some calls.

– Create / Open require communicator + MPI hint (used MPI_INFO_NULL).

Collective functions are suffixed with _all.

netcdf.inc include file becomes pnetcdf.inc.

-lnetcdf library becomes -lpnetcdf. (See the sketch below.)
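
To make the substitutions concrete, here is a minimal call-mapping sketch (not taken from the study’s test codes), shown in free-form Fortran 90 for readability. The file name, the dimension names and sizes, and the use of integer(kind=MPI_OFFSET_KIND) for dimension lengths are assumptions about the v0.9.0 Fortran interface; error checking is omitted.

! Minimal sketch of the serial -> parallel call mapping described above.
! Illustrative only: 'out.nc', the dimension names/sizes, and the size
! integer kind are assumptions; error checking is omitted.
program pnetcdf_map_sketch
  implicit none
  include 'mpif.h'
  include 'pnetcdf.inc'          ! was netcdf.inc in the serial code

  integer :: ierr, ncid, londim, latdim
  integer(kind=MPI_OFFSET_KIND) :: nlon, nlat

  call MPI_Init(ierr)
  nlon = 180
  nlat = 84

  ! Serial:   ierr = nf_create('out.nc', nf_clobber, ncid)
  ! Parallel: same call, plus a communicator and an MPI hint.
  ierr = nfmpi_create(MPI_COMM_WORLD, 'out.nc', nf_clobber, &
                      MPI_INFO_NULL, ncid)

  ! Definition calls keep their serial form, with the nfmpi_ prefix.
  ierr = nfmpi_def_dim(ncid, 'lon', nlon, londim)
  ierr = nfmpi_def_dim(ncid, 'lat', nlat, latdim)
  ierr = nfmpi_enddef(ncid)

  ! Collective data calls gain an _all suffix, e.g.
  ! nf_put_vara_real(...) becomes nfmpi_put_vara_real_all(...).

  ierr = nfmpi_close(ncid)
  call MPI_Finalize(ierr)
end program pnetcdf_map_sketch

Such a fragment is compiled with -I pointing at the parallel-netcdf include directory and linked with -lpnetcdf plus MPI, as in the compile lines shown on the compilation slide later in this report.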

Slide 19.

A/NW’s parallel netCDF library: v0.9.0.

First version with a fully functional Fortran interface.

Installation procedure made more user-friendly.

Fortran test routines added.

Interacted extensively with the developers on the above items.

Several LLNL F90 test codes became part of the v0.9.0 release.

The end product seems to meet our needs in terms of functionality, ease of use, and portability.

Have been told that the first non-beta release will come soon.

Slide 20.

Platforms used for tests.

mcr (LLNL): Intel / Linux cluster; Pentium 4 Xeon, 2400 MHz; 1152 nodes; 2 CPUs/node; 4 GB memory/node; parallel file system used: Lustre (/p/gm1).

seaborg (NERSC): IBM SP; Power3, 375 MHz; 416 nodes; 16 CPUs/node; 16-64 GB memory/node; parallel file system used: GPFS ($SCRATCH).

tckk (LLNL): Compaq TeraCluster 2000; ES40/EV67, 667 MHz; 128 nodes; 4 CPUs/node; 2 GB memory/node; parallel file system used: CPFS (/cpfs).

Slide 21.

Parallel netCDF v0.9.0 installation (1).

Web site => http://www-unix.mcs.anl.gov/parallel-netcdf

Subscribe to the mailing list.

Download:

parallel-netcdf-0.9.0.tar.gz

Parallel NetCDF API documentation.

Note that the following paper will also be coming out soon:

Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, and Rob Latham, “Parallel netCDF: A Scientific High-Performance I/O Interface”, to appear in the Proceedings of the 15th SuperComputing Conference, November 2003.

Slide 22.

Parallel netCDF v0.9.0 installation (2).

Set the following environment variables (values by machine):

Environment variable    mcr        seaborg    tckk
MPICC                   mpiicc     mpcc_r     mpicc
MPIF77                  mpiifc     mpxlf_r    mpif77
F77                     ifc        xlf        ---
FC                      ifc        xlf        ---
CC                      icc        xlc        ---
CXX                     ---        xlC        ---

Uncompress / untar the tar file, move into the top-level directory, and type:

./configure --prefix=<replace with top-level directory path>
make
make install

Slide 23.

Performance test codes (1).

Test codes written in Fortran 90.

MPI_Wtime used to do the timings.

One large 4D floating point array read or written.

Set up to emulate the basic kind of netCDF I/O that is currently being done in the Impact model.

Use a Master / Slaves paradigm.

Lon x Lat x Levels x Species I/O array dimensions.

Each Slave only has a portion of the first two dimensions.

Slide 24.

Performance test codes (2).

Focused on timing the explicit Read / Write calls, along with any required MPI communication costs.

Typically, Impact files are open for prolonged periods, with large Reads / Writes occurring periodically, then eventually closed.

Not overly concerned with file definition, open, and close costs, but kept an eye on them.

Slide 25.

Serial netCDF performance test code.

Version 3.5 of serial netCDF used.

Slave processors read their own input data.

Slaves use MPI to communicate their output data back to the Master for output.

Communication cost included for Write timings.

Only Master creates / opens output file.

Timed over a single iteration of Read / Write calls in any given run (see the sketch below).
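
For comparison with the parallel version on the next slide, the Master/Slaves write path just described might look roughly like the following hedged sketch (free-form Fortran 90, not the actual LLNL test code): rank 0 is assumed to be the Master, and the message tags, variable names, and one-block-at-a-time receive/write loop are illustrative; error checking is omitted.

! Hedged sketch of the serial-netCDF write path: Slaves send their lon/lat
! blocks to the Master, which writes them with serial netCDF.
subroutine serial_write_sketch(comm, myrank, nslaves, slabsize, slab, &
                               my_start, my_count, ncid, varid)
  implicit none
  include 'mpif.h'
  include 'netcdf.inc'

  integer, intent(in) :: comm, myrank, nslaves, slabsize, ncid, varid
  real,    intent(in) :: slab(slabsize)          ! this Slave's lon/lat block
  integer, intent(in) :: my_start(4), my_count(4)

  real, allocatable :: buf(:)
  integer :: ierr, i, stat(MPI_STATUS_SIZE), strt(4), cnt(4)
  double precision :: t0, t1

  t0 = MPI_Wtime()
  if (myrank /= 0) then
     ! Slaves: ship the block and its global start/count back to the Master.
     call MPI_Send(my_start, 4, MPI_INTEGER, 0, 1, comm, ierr)
     call MPI_Send(my_count, 4, MPI_INTEGER, 0, 2, comm, ierr)
     call MPI_Send(slab, slabsize, MPI_REAL, 0, 3, comm, ierr)
  else
     ! Master: receive one block at a time and write it with serial netCDF;
     ! handling one block at a time keeps the Master's memory use bounded.
     do i = 1, nslaves
        call MPI_Recv(strt, 4, MPI_INTEGER, i, 1, comm, stat, ierr)
        call MPI_Recv(cnt,  4, MPI_INTEGER, i, 2, comm, stat, ierr)
        allocate(buf(product(cnt)))
        call MPI_Recv(buf, size(buf), MPI_REAL, i, 3, comm, stat, ierr)
        ierr = nf_put_vara_real(ncid, varid, strt, cnt, buf)
        deallocate(buf)
     end do
  end if
  t1 = MPI_Wtime()    ! (t1 - t0) covers both the MPI traffic and the write

end subroutine serial_write_sketch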

Slide 26.

Parallel netCDF performance test code.

Version 0.9.0 of A/NW’s parallel netCDF used.

Slave processors do all netCDF I/O (Master idle).

All Slaves create / open output file.

Translation from serial netCDF test code.

Same number of netCDF calls.

Calls are syntactically very similar.

Explicit MPI communications no longer needed for Writes.

Two additional arguments are required for Create / Open:

– Communicator + MPI hint (used MPI_INFO_NULL).

netcdf.inc needs to be changed to pnetcdf.inc.

Timed over 10 iterations of Read / Write calls in any given run (see the sketch below).
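
Under the decomposition described earlier (each Slave owning one lon/lat block of the Lon x Lat x Levels x Species array), the timed collective write might look roughly like this hedged sketch (free-form Fortran 90, not the actual test code). Variable names, the rate calculation, and the use of integer(kind=MPI_OFFSET_KIND) for the start/count arrays are assumptions; error checking is omitted.

! Hedged sketch: each Slave writes its own hyperslab collectively, timed
! over 10 iterations with MPI_Wtime. Illustrative only.
subroutine parallel_write_sketch(ncid, varid, ilon0, ilat0, &
                                 nlon_loc, nlat_loc, nlev, nspc, slab)
  implicit none
  include 'mpif.h'
  include 'pnetcdf.inc'

  integer, intent(in) :: ncid, varid, ilon0, ilat0
  integer, intent(in) :: nlon_loc, nlat_loc, nlev, nspc
  real,    intent(in) :: slab(nlon_loc, nlat_loc, nlev, nspc)

  integer(kind=MPI_OFFSET_KIND) :: strt(4), cnt(4)
  double precision :: t0, t1, mb, rate
  integer :: ierr, iter

  ! This Slave's hyperslab of the global Lon x Lat x Levels x Species array
  ! (1-based global starting indices in the first two dimensions).
  strt = (/ int(ilon0, MPI_OFFSET_KIND), int(ilat0, MPI_OFFSET_KIND), &
            1_MPI_OFFSET_KIND, 1_MPI_OFFSET_KIND /)
  cnt  = (/ int(nlon_loc, MPI_OFFSET_KIND), int(nlat_loc, MPI_OFFSET_KIND), &
            int(nlev, MPI_OFFSET_KIND), int(nspc, MPI_OFFSET_KIND) /)

  t0 = MPI_Wtime()
  do iter = 1, 10
     ! All Slaves participate; no explicit MPI communication is needed.
     ierr = nfmpi_put_vara_real_all(ncid, varid, strt, cnt, slab)
  end do
  t1 = MPI_Wtime()

  ! This Slave's data volume in MB (4-byte reals, MB = 10**6 bytes); the
  ! rates reported in the tables that follow are for the whole file.
  mb   = 10.0d0 * 4.0d0 * dble(nlon_loc) * dble(nlat_loc) &
         * dble(nlev) * dble(nspc) / 1.0d6
  rate = mb / (t1 - t0)

end subroutine parallel_write_sketch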

Slide 27.

Function calls used in netCDF test codes.

Serial netCDF calls:

Nf_Create, Nf_Open, Nf_Close (2X), Nf_Def_Dim (4X), Nf_Def_Var, Nf_Enddef, Nf_Inq_Varid (2X), Nf_Get_Vara_Real, Nf_Put_Vara_Real

Parallel netCDF calls:

Nfmpi_Create, Nfmpi_Open, Nfmpi_Close (2X), Nfmpi_Def_Dim (4X), Nfmpi_Def_Var, Nfmpi_Enddef, Nfmpi_Inq_Varid (2X), Nfmpi_Get_Vara_Real_All, Nfmpi_Put_Vara_Real_All

Slide 28.

Parallel netCDF test code compilation.

On mcr:

mpiifc -O3 -xW -tpp7 -Zp16 -ip -cm -w90 -w95 -extend_source \
    -I$(HOME)/parallel-netcdf-0.9.0/include -c slvwrt.F

On seaborg:

mpxlf90_r -c -d -I$(HOME)/parallel-netcdf-0.9.0/include -O3 \
    -qfixed=132 -qstrict -qarch=auto -qtune=auto -qmaxmem=-1 slvwrt.F

On tckk:

/usr/bin/f90 -arch host -fast -fpe -assume accuracy_sensitive \
    -extend_source -I$(HOME)/parallel-netcdf-0.9.0/include -c slvwrt.F

Then link with -lpnetcdf and other libraries as necessary (mpi, etc.).

Slide 29.

Timing issue.

I/O resources are shared, so getting consistent timings can be problematic.

More so for some machines (seaborg) than others.

Made many runs and took the best time.

Slide 30.

I/O timing variables.

Modes: Serial, Parallel.
Operations: Read, Write.
Platforms: mcr, seaborg, tckk.
Number of processors: 16, 31, 64, 127.
File sizes (MB): 302.4, 907.2, 1814.4.

Number of processors (Lon x Lat Slaves + 1 for Master):

16  = 5 x 3  + 1
31  = 5 x 6  + 1
64  = 9 x 7  + 1
127 = 9 x 14 + 1

File size (MB) vs. array dimensions (Lon x Lat x Levels x Species):

302.4  = 180 x 84  x 50 x 100
907.2  = 270 x 168 x 50 x 100
1814.4 = 360 x 252 x 50 x 100

(4-byte reals; e.g., 180 x 84 x 50 x 100 x 4 bytes = 302.4 MB.)

Slide 31.

Serial / Parallel netCDF performance test results for mcr (plots to follow).

Serial netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               43 / 40     25 / 39     14 / 39        7 / 48
907               44 / 24     26 / 38     14 / 38        7 / 38
1814              51 / 25     29 / 25     15 / 38        7 / 39

Parallel netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               438 / 104   676 / 115   667 / 114    577 / 101
907               488 / 119   771 / 135   949 / 131    863 / 132
1814              520 / 128   820 / 130   1020 / 144   1032 / 136

Slide 32.

Serial / Parallel netCDF performance test results for seaborg (plots to follow).

Serial netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               52 / 25     47 / 25     46 / 23      48 / 21
907               49 / 24     48 / 24     42 / 24      41 / 23
1814              45 / 24     49 / 24     48 / 23      53 / 23

Parallel netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               136 / 159   209 / 264   175 / 198    428 / 235
907               121 / 149   273 / 223   268 / 219    271 / 286
1814              114 / 136   255 / 238   278 / 235    350 / 311

Slide 33.

Serial / Parallel netCDF performance test results for tckk (plots to follow).

Serial netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               91 / 30     43 / 26      9 / 15      ---
907               81 / 21     40 / 22     16 / 21      ---
1814              58 / 20     38 / 19     24 / 20      ---

Parallel netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               187 / 55    299 / 81    345 / 87     ---
907               194 / 47    322 / 85    392 / 117    ---
1814              198 / 56    329 / 85    417 / 118    ---

Slide 34.

Serial / Parallel netCDF Read / Write rates.

[Bar charts comparing serial vs. parallel netCDF rates (MB/s) on mcr, seaborg, and tckk for an 1814 MB file on 64 processors. Read: serial 15 / 48 / 24 vs. parallel 1020 / 278 / 417; Write: serial 38 / 23 / 20 vs. parallel 144 / 235 / 118.]

Slide 35.

Read / Write netCDF Serial / Parallel rates.

[Bar charts contrasting Read vs. Write rates (MB/s) within the serial case and within the parallel case, by platform (mcr, seaborg, tckk), for an 1814 MB file on 64 processors; same data as the previous slide. Note the different y-axis scales in the original plots.]

Slide 36.

Read netCDF Serial / Parallel rates for varying numbers of processors.

[Charts of Read rates (MB/s), serial vs. parallel netCDF, versus number of processors (16, 31, 64, 127) on mcr, seaborg, and tckk for an 1814 MB file; data as in the tables on slides 31-33.]

Slide 37.

Write netCDF Serial / Parallel rates for varying numbers of processors.

[Charts of Write rates (MB/s), serial vs. parallel netCDF, versus number of processors (16, 31, 64, 127) on mcr, seaborg, and tckk for an 1814 MB file; data as in the tables on slides 31-33.]

Slide 38.

Read netCDF Serial / Parallel rates for varying file sizes.

[Charts of Read rates (MB/s), serial vs. parallel netCDF, versus file size (302, 907, 1814 MB) on mcr, seaborg, and tckk with 64 processors; data as in the tables on slides 31-33.]

Slide 39.

Write netCDF Serial / Parallel rates for varying file sizes.

[Charts of Write rates (MB/s), serial vs. parallel netCDF, versus file size (302, 907, 1814 MB) on mcr, seaborg, and tckk with 64 processors; data as in the tables on slides 31-33.]

Slide 40.

Parallel HDF5 performance test code.

Version 1.4.5 of NCSA’s parallel HDF5 used.

Slave processors do all HDF5 I/O (Master idle).

Collaborated with Livermore Computing personnel to convert the parallel netCDF test code to its parallel HDF5 equivalent.

Conversion seemed to take a good deal of effort.

Increase in code complexity over parallel netCDF (see the sketch below).

Great deal of difficulty in getting test code compiled and linked.

Irresolvable problems with parallel HDF5 library on mcr and tckk.

Finally got things working on seaborg.

Made a limited number of timing runs for a “ballpark” comparison.
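
To give a feel for the added setup, here is a hedged sketch of just the MPI-IO property-list plumbing that parallel HDF5 requires before any collective I/O, using call names from the HDF5 Fortran90 interface (the v1.4.5 argument lists may differ slightly). The file name is illustrative and error checking is omitted; dataspace creation, hyperslab selection, and the h5dwrite_f call itself would still follow, as listed on the next slide.

! Hedged sketch of the property-list setup parallel HDF5 needs before any
! collective I/O; names follow the HDF5 Fortran90 interface, illustrative only.
subroutine phdf5_setup_sketch(comm, file_id, xfer_plist)
  use hdf5
  implicit none
  include 'mpif.h'

  integer,        intent(in)  :: comm
  integer(hid_t), intent(out) :: file_id, xfer_plist
  integer(hid_t) :: fapl
  integer        :: hdferr

  call h5open_f(hdferr)                               ! initialize the library

  ! File-access property list carrying the MPI communicator + info hint.
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl, hdferr)
  call h5pset_fapl_mpio_f(fapl, comm, MPI_INFO_NULL, hdferr)
  call h5fcreate_f('out.h5', H5F_ACC_TRUNC_F, file_id, hdferr, &
                   access_prp = fapl)
  call h5pclose_f(fapl, hdferr)

  ! Dataset-transfer property list requesting collective MPI-IO.
  call h5pcreate_f(H5P_DATASET_XFER_F, xfer_plist, hdferr)
  call h5pset_dxpl_mpio_f(xfer_plist, H5FD_MPIO_COLLECTIVE_F, hdferr)

  ! ... h5screate_simple_f / h5dcreate_f / h5sselect_hyperslab_f /
  !     h5dwrite_f(..., xfer_prp = xfer_plist) would follow here.
end subroutine phdf5_setup_sketch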

Slide 41.

Function calls used in parallel HDF5 test code.

Parallel HDF5 calls:

h5open_f, h5close_f

h5fcreate_f, h5fopen_f, h5fclose_f

h5pcreate_f (3X), h5pset_chunk_f, h5pset_dxpl_mpio_f, h5pset_fapl_mpio_f, h5pclose_f (2X)

h5screate_simple_f (2X), h5dget_space_f, h5sselect_hyperslab_f, h5sclose_f (2X)

h5dcreate_f, h5dopen_f, h5dread_f, h5dwrite_f, h5dclose_f

Slide 42.

Parallel HDF5 test code compilation.

On seaborg:

module load hdf5_par

mpxlf90_r -c -d -O3 -qfixed=132 -qstrict -qarch=auto \
    -qtune=auto -qmaxmem=-1 slvwrt_hdf5.F $(HDF5)

Add $(HDF5) at the end of your link line as well.

Slide 43.

Parallel HDF5 / netCDF performance test results for seaborg (plot to follow).

Parallel HDF5 Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs     64 procs      127 procs
302               ---         ---          ---           ---
907               ---         631 / 838    1016 / 1459   ---
1814              ---         ---          1182 / 1465   ---

Parallel netCDF Read / Write rates (MB/s):

File Size (MB)    16 procs    31 procs    64 procs     127 procs
302               136 / 159   209 / 264   175 / 198    428 / 235
907               121 / 149   273 / 223   268 / 219    271 / 286
1814              114 / 136   255 / 238   278 / 235    350 / 311

Slide 44.

Parallel HDF5 / netCDF Read / Write rates.

[Bar charts comparing parallel HDF5 vs. parallel netCDF rates (MB/s) for an 1814 MB file on 64 processors, seaborg only. Read: 1182 vs. 278; Write: 1465 vs. 235.]

Slide 45.

Observations.

Parallel netCDF seems to be a very hot topic right now.

Since A/NW’s parallel netCDF is functionally and syntactically very similar to serial netCDF, code conversion is pretty straightforward.

I/O speeds can vary significantly from machine to machine.

I/O speeds can vary significantly on the same machine, based on the I/O load at any given time.

Slide 46.

Misc. conclusions from plots.

Our current method of doing serial netCDF Slave Reads performed quite poorly in general. Unexpected. Can degrade significantly as the number of processors is increased.

Parallel netCDF Reads are faster than Writes. Magnitude of difference on a given platform can vary dramatically.

mcr marches to its own netCDF drummer. Parallel Reads are quite fast; serial Reads are not. Serial Writes faster than Reads. Parallel Writes scale poorly.

Parallel netCDF I/O tends to get somewhat faster as the file size increases.

Different platforms can behave very differently!

Slide 47.

Overall Conclusions.

Under the specified test conditions:

A/NW’s parallel netCDF (v0.9.0) performed significantly better than serial netCDF (v3.5).

Under the specified test conditions and limited testing:

Parallel HDF5 (v1.4.5) performed significantly better than A/NW’s parallel netCDF (v0.9.0).

– To date, A/NW’s focus has been on functionality, not performance; they believe that there is substantial room for improvement.

– On a different platform and code, A/NW developers have found that parallel netCDF significantly outperforms parallel HDF5.

– It is not a simple matter of one being faster than the other; platform and access patterns may favor one or the other.

Slide 48.

Remaining questions / issues.

What about files larger than 2 GB?

It appears that a general netCDF solution may be forthcoming.

How much will A/NW be able to improve performance?

They are committed to working this issue.

When will the first A/NW non-beta release be?

Maybe early next year, after performance issues are addressed.

What will the outcome of the Unidata / NCSA project be?

What role will A/NW play?

Have any potential show stoppers been missed?

Will we incorporate A/NW’s parallel netCDF capability into our Impact model?