Re: different size netCDF files generated from NCL on different machines

From: Wei Huang <huangwei_at_nyahnyahspammersnyahnyah>
Date: Thu Jun 19 2014 - 14:48:27 MDT

Chunk size definitely affects file size, and so does compression.
If the chunk size is larger than the actual data size, the file ends up
sized by the chunk size, because the library has to allocate space for
each full chunk. If the chunk size is too small, the data is stored in
many small chunks, and the overhead of tracking all those chunks makes
the file larger.

This case helps all of us understand chunking better.
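
For anyone who wants to take NCL's machine-dependent default chunking out of
the picture, a minimal sketch of writing the file with an explicit chunk
layout might look like the following (the output name, dimension sizes, and
data array here are placeholders, not taken from Marc's script):

  setfileoption("nc", "Format", "NetCDF4Classic")   ; same format Marc is using
  setfileoption("nc", "CompressionLevel", 5)        ; deflate level 5, matching the dumps

  fout = addfile("T2_out.nc", "c")                  ; "c" = create a new file

  ntim = 1752          ; placeholder sizes -- use the real dimension lengths
  ny   = 210
  nx   = 250
  filedimdef(fout, (/"Time", "south_north", "west_east"/), \
                   (/ntim, ny, nx/), (/True, False, False/))
  filevardef(fout, "T2", "float", (/"Time", "south_north", "west_east"/))

  ; pin the chunk layout explicitly so every machine writes the same one
  filevarchunkdef(fout, "T2", (/1, ny, nx/))

  fout->T2 = t2        ; t2 = the (Time, south_north, west_east) data array

With the chunk sizes pinned this way, ncdump -hs on the two outputs should
report identical _ChunkSizes, and the file sizes should agree regardless of
which machine (or shell environment) runs the script.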

Regards,

Wei


On Thu, Jun 19, 2014 at 1:47 PM, Dave Allured - NOAA Affiliate <
dave.allured@noaa.gov> wrote:

> Marc,
>
> Please include the user list in all replies.
>
> I am glad you found the discrepancy. When first creating a Netcdf-4 file
> in NCL, the user may optionally set chunk sizes with the function
> filevarchunkdef.
>
> If you don't set your own chunk sizes, I think NCL has a built in method
> to compute default chunk sizes. That might explain your current results.
> Perhaps NCL support could explain this part.
>
> --Dave
>
> On Thu, Jun 19, 2014 at 12:18 PM, Marcella, Marc <
> MMarcella@air-worldwide.com> wrote:
>
>> Hi Dave,
>>
>>
>> Thank you for the email back. I was about to reply when, thanks to your
>> help, I found a difference after running ncdump -sh: the chunk sizes.
>>
>>
>>
>> On the machine with the smaller file size all of the variables read like
>> this example one, T2:
>>
>> float T2(Time, south_north, west_east) ;
>>
>> T2:_FillValue = 9.96921e+36f ;
>>
>> T2:units = "K" ;
>>
>> T2:description = "Temperature at 2 m" ;
>>
>> T2:_Storage = "chunked" ;
>>
>> T2:_ChunkSizes = 957, 21, 12 ;
>>
>> T2:_DeflateLevel = 5 ;
>>
>> T2:_Shuffle = "true" ;
>>
>>
>>
>> But on the machine with the larger file size, the same variable (and all
>> others) has larger chunk sizes (957, 21, 12 vs. 1752, 18, 25):
>>
>> float T2(Time, south_north, west_east) ;
>>
>> T2:_FillValue = 9.96921e+36f ;
>>
>> T2:units = "K" ;
>>
>> T2:description = "Temperature at 2 m" ;
>>
>> T2:_Storage = "chunked" ;
>>
>> T2:_ChunkSizes = 1752, 18, 25 ;
>>
>> T2:_DeflateLevel = 5 ;
>>
>> T2:_Shuffle = "true" ;
>>
>>
>>
>> I'm assuming this is the problem. I started to read up on chunk sizes but
>> thought I would email you back to see whether you think this could be
>> causing the difference. Also, it seems like I can simply specify the
>> chunk sizes somewhere (I'm guessing in the netCDF libraries). Anyhow, I
>> wanted to get your input on it. Thanks again for your help with this,
>> please do let me know!
>>
>>
>>
>> -Marc
>>
>>
>>
>>
>>
>> Marc,
>>
>>
>>
>> An important clue is where you said you get different results when run on
>> the command line versus the shell script. You might be launching a
>> different NCL version through the shell script. Try ncl -V (display
>> version number) on the command line and in the script.
>>
>>
>>
>> Look for other differences. First compare ncdump -k between differing
>> files, just in case your format assumption is wrong. Then compare ncdump
>> -hs, line by line. In particular, compare dimension sizes and compression
>> parameters. Also look for missing variables.
>>
>>
>>
>> Check for an unexpected program abort in the process that makes the smaller
>> file. It is possible that you have a partially written file that is still
>> valid NetCDF-4/HDF5 but has partially empty data. Such a file could possibly
>> match in terms of variables and dimension sizes, but be physically smaller.
>>
>>
>>
>> --Dave
>>
>>
>>
>> On Wed, Jun 18, 2014 at 2:45 PM, Marcella, Marc <
>> MMarcella@air-worldwide.com> wrote:
>>
>> Hi all,
>>
>>
>>
>> I am seeing a peculiar case where an identical netCDF file written via NCL
>> (NetCDF-4 Classic, compression level 5) comes out at 5.6 GB on one machine
>> but 9 GB on the other. To the best of my knowledge, the files are identical
>> when opened (type, values, etc.). What is odder is that if I execute the
>> NCL script from the command line instead of from the shell script (and
>> ensure both use the same version of NCL), the file sizes are identical.
>>
>>
>>
>> Are there certain checks I am missing that I should try? I believe both
>> machines are using the same netCDF libraries, but could that be the issue?
>> Anyhow, I am somewhat stumped and believe it must be a difference in the
>> machine environments when each executes the shell script that calls the
>> NCL script to generate the netCDF file. Any help you could lend would be
>> greatly appreciated…thanks!
>>
>>
>>
>> -Marc
>>
>>
>>
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
