Re: different size netCDF files generated from NCL on different machines

From: Dave Allured - NOAA Affiliate <dave.allured_at_nyahnyahspammersnyahnyah>
Date: Thu Jun 19 2014 - 16:58:02 MDT

Marc,

You should also look into the possibility of a bad fit of the default chunk
sizes. I am suspicious because of the excessive discrepancy, 5.3 GB versus
9.0 GB. Simple "tuning" should not create a difference this large.

What are your actual time, lat, and lon dimension sizes in this file?

--Dave


On Thu, Jun 19, 2014 at 4:22 PM, Wei Huang <huangwei@ucar.edu> wrote:

> Marc,
>
> As you mentioned in your email, filechunkdimdef is how to define
> chunk sizes file-wise; see:
> http://www.ncl.ucar.edu/Document/Functions/Built-in/filechunkdimdef.shtml
> filevarchunkdef defines chunk sizes for a single variable; see:
> http://www.ncl.ucar.edu/Document/Functions/Built-in/filevarchunkdef.shtml
>
> Another big factor is compression. For variable compression, see:
>
> http://www.ncl.ucar.edu/Document/Functions/Built-in/filevarcompressleveldef.shtml
> For file-wise compression, shuffle can play a role as well; see:
> http://www.ncl.ucar.edu/Document/Functions/Built-in/setfileoption.shtml
> (go to CompressionLevel and Shuffle)
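[Editor's note: a minimal NCL sketch of the options named above. The file name, dimension names, and chunk sizes here are hypothetical placeholders, not values from this thread; the function signatures follow the linked documentation.]

```ncl
; Set file-wise options before creating the NetCDF-4 file.
setfileoption("nc", "Format", "NetCDF4Classic")
setfileoption("nc", "CompressionLevel", 5)   ; file-wise deflate level
setfileoption("nc", "Shuffle", True)         ; byte-shuffle filter before deflate

fout = addfile("out.nc", "c")                ; "out.nc" is a placeholder

; File-wise chunking: one chunk size per dimension.
filechunkdimdef(fout, (/"time", "lat", "lon"/), \
                (/100, 50, 50/), (/False, False, False/))

; Or per-variable chunking, after the variable is defined:
; filevarchunkdef(fout, "T2", (/100, 50, 50/))
```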
>
> Wei
>
>
> On Thu, Jun 19, 2014 at 2:55 PM, Marcella, Marc <
> MMarcella@air-worldwide.com> wrote:
>
>> Thanks Wei. I can't seem to get specifying the actual chunk sizes to
>> work via filevarchunkdef or filechunkdimdef. My file dimensions are
>> time, lat, lon at 8760, 106, 123.
>>
>> For NCL 6.0 this makes a 5.3GB file at chunk sizes 957, 21, 12, whereas on
>> NCL 6.1.2 this makes a 9.0GB file at chunk sizes 1752, 18, 25.
>>
>> I can't really seem to find a "scale factor" that is consistent across
>> dimensions or NCL versions that converts from the dimension sizes to the
>> chunk sizes. Only 8760 to 1752 seems to have a neat value of (1/5).
>>
>> Is there somewhere in the NCL code or libraries that I can specify/set
>> the chunk sizes so I may see if this is indeed what is causing the
>> differences between the two file sizes?
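[Editor's note: one way to take the version-dependent defaults out of the picture is to pin the chunk layout explicitly when the file is created. A sketch, assuming the filechunkdimdef route works as documented; the file and variable names are hypothetical, and only the dimension and chunk sizes come from this thread.]

```ncl
; Pin the chunk layout so it no longer depends on the NCL version's
; default heuristic. "t2.nc" and the write step are placeholders.
setfileoption("nc", "Format", "NetCDF4Classic")
setfileoption("nc", "CompressionLevel", 5)
setfileoption("nc", "Shuffle", True)

fout = addfile("t2.nc", "c")
filedimdef(fout, (/"Time", "south_north", "west_east"/), \
           (/8760, 106, 123/), (/False, False, False/))

; Force the NCL 6.0 layout (957, 21, 12) on every variable in the file.
filechunkdimdef(fout, (/"Time", "south_north", "west_east"/), \
                (/957, 21, 12/), (/False, False, False/))

filevardef(fout, "T2", "float", (/"Time", "south_north", "west_east"/))
; fout->T2 = (/T2data/)   ; write the data after the definitions
```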
>>
>>
>>
>> Thanks,
>>
>> -Marc
>>
>>
>>
>> *From:* Wei Huang [mailto:huangwei@ucar.edu]
>> *Sent:* Thursday, June 19, 2014 4:48 PM
>> *To:* Dave Allured - NOAA Affiliate
>> *Cc:* Marcella, Marc; ncl-talk@ucar.edu
>> *Subject:* Re: [ncl-talk] different size netCDF files generated from NCL
>> on different machines
>>
>>
>>
>> Chunk size will definitely impact the file size, and so will compression.
>>
>> If the chunk size is bigger than the real data size, the file will end up
>> at the chunk size, because enough space has to be allocated for each full
>> chunk. If the chunk size is too small, the real data will be stored in
>> many small chunks, which makes the file larger because of the overhead of
>> storing the chunk info.
>>
>> This case helps all of us understand chunking better.
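[Editor's note: the two default layouts reported in this thread tile the array very differently, which illustrates the per-chunk overhead point. A small NCL sketch that counts chunks for each layout; edge chunks are partial, so each dimension rounds up.]

```ncl
; Chunk-count comparison for dimensions 8760 x 106 x 123,
; using the default chunk sizes each NCL version chose.
dims    = (/8760, 106, 123/)
chunk60 = (/ 957,  21,  12/)   ; NCL 6.0 default
chunk61 = (/1752,  18,  25/)   ; NCL 6.1.2 default

n60 = product(ceil(1.0 * dims / chunk60))
n61 = product(ceil(1.0 * dims / chunk61))
print("6.0 layout:   " + n60 + " chunks")
print("6.1.2 layout: " + n61 + " chunks")
```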
>>
>>
>>
>> Regards,
>>
>>
>>
>> Wei
>>
>>
>>
>> On Thu, Jun 19, 2014 at 1:47 PM, Dave Allured - NOAA Affiliate <
>> dave.allured@noaa.gov> wrote:
>>
>> Marc,
>>
>>
>>
>> Please include the user list in all replies.
>>
>>
>> I am glad you found the discrepancy. When first creating a NetCDF-4 file
>> in NCL, the user may optionally set chunk sizes with the function
>> filevarchunkdef.
>>
>> If you don't set your own chunk sizes, I think NCL has a built-in method
>> to compute default chunk sizes. That might explain your current results.
>> Perhaps NCL support could explain this part.
>>
>>
>>
>> --Dave
>>
>> On Thu, Jun 19, 2014 at 12:18 PM, Marcella, Marc <
>> MMarcella@air-worldwide.com> wrote:
>>
>> Hi Dave,
>>
>>
>> Thank you for the email back. I was about to reply back when I did find,
>> thanks to your help, a difference after ncdump -sh ... the chunk sizes.
>>
>>
>>
>> On the machine with the smaller file size all of the variables read like
>> this example one, T2:
>>
>> float T2(Time, south_north, west_east) ;
>>     T2:_FillValue = 9.96921e+36f ;
>>     T2:units = "K" ;
>>     T2:description = "Temperature at 2 m" ;
>>     T2:_Storage = "chunked" ;
>>     T2:_ChunkSizes = 957, 21, 12 ;
>>     T2:_DeflateLevel = 5 ;
>>     T2:_Shuffle = "true" ;
>>
>>
>>
>> But on the machine with the larger file size, the same variable (and all
>> others) has larger chunk sizes (957, 21, 12 vs. 1752, 18, 25):
>>
>> float T2(Time, south_north, west_east) ;
>>     T2:_FillValue = 9.96921e+36f ;
>>     T2:units = "K" ;
>>     T2:description = "Temperature at 2 m" ;
>>     T2:_Storage = "chunked" ;
>>     T2:_ChunkSizes = 1752, 18, 25 ;
>>     T2:_DeflateLevel = 5 ;
>>     T2:_Shuffle = "true" ;
>>
>>
>>
>> I'm assuming this is the problem. I started to read up on chunk sizes but
>> thought I would email you back to see if you think this would in effect be
>> causing the difference. And it seems like I can simply specify the chunk
>> sizes somewhere (guessing it's in the netCDF libraries). Anyhow, wanted
>> to get your input on it. Thanks again for your help with this; please do
>> let me know!
>>
>>
>>
>> -Marc
>>
>>
>>
>>
>>
>> Marc,
>>
>>
>>
>> An important clue is where you said different results when run on command
>> line versus shell script. You might be launching a different NCL version
>> through the shell script. Try ncl -V (display version number) on command
>> line and in script.
>>
>>
>>
>> Look for other differences. First compare ncdump -k between differing
>> files, just in case your format assumption is wrong. Then compare ncdump
>> -hs, line by line. In particular, compare dimension sizes and compression
>> parameters. Also look for missing variables.
>>
>>
>>
>> Check for unexpected program abort in the process that makes the smaller
>> file. It is possible that you have a partially written file that is still
>> valid NetCDF-4/HDF5, but with partially empty data. Such a file could
>> possibly match in terms of variables and dimension sizes, but be
>> physically smaller.
>>
>>
>>
>> --Dave
>>
>>
>>
>> On Wed, Jun 18, 2014 at 2:45 PM, Marcella, Marc <
>> MMarcella@air-worldwide.com> wrote:
>>
>> Hi all,
>>
>>
>>
>> I am finding a peculiar instance where an identical netCDF file written
>> via NCL (NetCDF-4 Classic, compression level 5) on one machine is 5.6GB,
>> but on the other computer its size is 9GB. To the best of my knowledge,
>> the files are identical when opened (type, values, etc.). What is odder
>> is that if I execute the NCL script (and ensure they are using the same
>> version of NCL) from the command line instead of from the shell script,
>> the file sizes are identical.
>>
>>
>>
>> Are there certain checks I am missing that I should try? I believe both
>> machines are using the same netCDF libraries, but could that be the issue?
>> Anyhow, I am somewhat stumped and believe it must be a machine environment
>> difference when both machines execute the shell script that calls the NCL
>> script to generate the netCDF file. Any help you could lend would be
>> greatly appreciated…thanks!
>>
>>
>>
>> -Marc
>>
>>
>>
>>
>> _______________________________________________
>> ncl-talk mailing list
>> List instructions, subscriber options, unsubscribe:
>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>>
>>
>>
>
>

Received on Thu Jun 19 10:58:14 2014

This archive was generated by hypermail 2.1.8 : Wed Jul 23 2014 - 15:33:46 MDT