Re: large netCDF file writing

From: Josh Hacker <hacker_at_nyahnyahspammersnyahnyah>
Date: Wed, 18 Oct 2006 10:02:13 -0600


You are pretty much on target (clarified below). Although it is a crazy thing
to try and deal with files this big, I did not see a way around it for
my particular purpose. I have since tried something different that
gives me the efficiency I need.

Of course the trade-off is always computation versus storage, and in
this case I opted for storage in the hope of minimizing future computation.

The ultimate goal is to write "patched" WRF restart files (one per
processor) for thousands of processors. The approach is needed to avoid
the memory footprint associated with a single processor allocating a
huge domain.
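For illustration, here is a minimal sketch in plain Python of the index
arithmetic a patched decomposition involves (this is not WRF's actual
decomposition code; `patch_bounds` and the processor counts are made up
for the example):

```python
# Hypothetical sketch: split one axis of a 2250-point grid across
# processors so each rank only ever allocates its own patch, never
# the full domain.

def patch_bounds(n, nparts, rank):
    """Start/end (end exclusive) indices of patch `rank` along an
    axis of length n split into nparts near-equal pieces."""
    base, extra = divmod(n, nparts)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

# Example: 2250 points over 32 processors in one direction.
bounds = [patch_bounds(2250, 32, r) for r in range(32)]
assert bounds[0] == (0, 71)                       # first patch gets a remainder point
assert sum(e - s for s, e in bounds) == 2250      # patches tile the axis exactly
```

The same function applied in each horizontal direction gives the 2-D
patch layout, so the memory per processor scales down with the
processor count.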

This requires assembling part or all of a variable from one set of patched
files, say 1024, and then writing it back out in a different configuration,
say 4096. The assembly takes quite a while, so I wanted to do it only
once and write to a big file; I can then read from the big file and rip
it up accordingly. Instead, I settled on binary writes/reads to/from
individual files, each containing a single variable array. I should
have thought of this a while ago but for whatever reason...
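A minimal plain-Python sketch of that workaround (the file name and the
tiny 6x4 field are made up; the real fields are 2250x2250 doubles): each
variable lives in its own flat binary file, so any row range can be
recovered with a seek instead of re-assembling patches.

```python
# One flat binary file per variable: write the assembled field once,
# then read arbitrary row ranges back with a seek.
import array
import os
import tempfile

NY, NX = 6, 4          # toy stand-ins for the 2250 x 2250 field
ITEM = 8               # bytes per 64-bit float

# An assembled field, row-major: row j holds values j*NX .. j*NX+NX-1.
field = array.array("d", (float(j * NX + i) for j in range(NY) for i in range(NX)))

path = os.path.join(tempfile.mkdtemp(), "U.bin")   # hypothetical variable file
with open(path, "wb") as f:
    field.tofile(f)

def read_rows(path, j0, j1):
    """Read rows j0..j1-1 without loading the whole field."""
    out = array.array("d")
    with open(path, "rb") as f:
        f.seek(j0 * NX * ITEM)                     # jump straight to row j0
        out.fromfile(f, (j1 - j0) * NX)
    return out
```

Ripping the field up into a different patch configuration is then just a
loop of such seek-and-read calls, one slab per output patch.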

The netCDF operator ncks has a possible bug on 64-bit machines that
results in a seg fault when a hyperslab is requested, so I couldn't use
it here.

Ultimately, parallel HDF5 is the solution to the problem, because then
WRF can deal with only one file while not using too much memory, but it
is not yet ported to the platform I am trying to use.

If it is still useful, see below for clarification...

Dennis Shea wrote:
>>I want to write arrays to a large file without waiting until next week.
> ===
> Of course, as the Unidata people themselves say,
> "netCDF software is designed for robustness and flexibility,
> not efficiency"
> ===
>>Although this may be a crazy thing to do, I have a good reason to try.
>>The file size is about 46G, and I am writing 2250x2250 2D arrays and
>>2250x2250x100 3D arrays.
>>The file has already been created with ncgen. The dimensions and
>>variables are thus pre-defined. No attributes are written, e.g. (/*/).
>>Although I didn't believe most of these would work, I have tried the
>>following with no noticeable difference:
>>1. Reversing the order of my variable list, which I am looping through
>>as I write each variable.
> ===
> If the file is "predefined", then the underlying Unidata
> software is [I speculate] using 'fseek' to go to a specific
> predefined file location. It seems to me the write order
> should not matter [much].
> ===
>>2. Setting the file option "PreFill" to false.
> ===
> Yes, this should speed the file creation process.
> However, hasn't this already been done as part of the "ncgen -x" step?
> ===
>>3. Copying my output array, which is a subset of an internal NCL
>>array, to a temporary variable for write.
>>Thoughts, ideas?
>>versions and platform:
>>NCAR Command Language Version 4.2.0.a033
>>netCDF 3.6.0
>>uname -m: x86_64
> I am sure I am not understanding this correctly. Pls clarify.
> My understanding is:
> [1] You have used "ncgen" to create a file template based
> on a CDL file. Using "ncgen -x" [ie: no prefill] results
> in much faster file creation.
> [a] Is this done independent of NCL? Eg, Invoking ncgen
> from the command line?
> ncgen -x .....

Command line.

> [b] You are invoking ncgen from within an NCL script
> via the 'system' command. Hence, components
> of the ascii CDL file used by ncgen are generated within
> an NCL script.
> [2] You are then opening the file template created in [1] in NCL
> with the "w" option on addfile
> fout = addfile("", "w")
> I speculate this will take time to open.
> NCL is just invoking the underlying Unidata software.

Yes, although the open does not seem to take long.

> [3] Now you want to write to the file:
> fout->A(nl,:,:) = (/ a /) ; a(2250,2250)
> or maybe
> fout->B = (/ b /) ; b(100,2250,2250)

You got it.
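A toy illustration in plain Python (not the actual netCDF library
internals; variable names and shapes are made up) of the fseek behaviour
Dennis speculates about: once the file layout is fixed in advance, every
variable has a known byte offset, so each write is just a seek plus a
block write and write order should not matter much.

```python
# Pre-sized file with fixed per-variable offsets: writes land at
# computed positions regardless of the order they are issued in.
import os
import struct
import tempfile

ITEM = 8                                   # bytes per 64-bit float
shapes = {"A": (3, 4), "B": (2, 3, 4)}     # toy stand-ins for 2250x2250 etc.

def count(shape):
    n = 1
    for d in shape:
        n *= d
    return n

# "Define mode": fix each variable's byte offset up front, the way an
# ncgen-created template fixes the netCDF header and data layout.
offsets, end = {}, 0
for name, shp in shapes.items():
    offsets[name] = end
    end += count(shp) * ITEM

path = os.path.join(tempfile.mkdtemp(), "toy.dat")
with open(path, "wb") as f:
    f.truncate(end)                        # pre-size the file; no prefill pass
    for name in ("B", "A"):                # write out of declaration order on purpose
        n = count(shapes[name])
        f.seek(offsets[name])              # jump to the variable's fixed slot
        f.write(struct.pack(f"{n}d", *range(n)))
```

With this layout the cost of each write is dominated by the data volume
itself, which is consistent with reordering the variable list making no
noticeable difference.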

As always, thanks for the reply.


Joshua Hacker
Research Applications Laboratory
voice: 303-497-8188
fax: 303-497-8401
ncl-talk mailing list
