Re: different size netCDF files generated from NCL on different machines

From: Marcella, Marc <MMarcella_at_nyahnyahspammersnyahnyah>
Date: Thu Jun 19 2014 - 14:55:44 MDT

Thanks Wei. I can't seem to get specifying explicit chunk sizes to work via filevarchunkdef or filechunkdimdef. My file dimensions are time, lat, lon at 8760, 106, 123.
For NCL 6.0 this makes a 5.3GB file at chunk sizes 957, 21, 12, whereas on NCL 6.1.2 it makes a 9.0GB file at chunk sizes 1752, 18, 25.
I can't really find a "scale factor" that is consistent across dimensions or NCL versions and that maps the dimension sizes to the chunk sizes. Only 8760 to 1752 has a neat ratio (1/5).
Is there somewhere in the NCL code or libraries where I can set the chunk sizes, so I can check whether this is indeed what is causing the difference between the two file sizes?
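
For reference, here is roughly the kind of script I have been trying; the output file name and the T2 setup are just placeholders:

  ; create a NetCDF-4 Classic file with deflate level 5
  setfileoption("nc", "Format", "NetCDF4Classic")
  setfileoption("nc", "CompressionLevel", 5)

  fout = addfile("chunk_test.nc", "c")    ; placeholder file name

  dim_names = (/ "Time", "south_north", "west_east" /)
  dim_sizes = (/ 8760, 106, 123 /)
  dim_unlim = (/ False, False, False /)
  filedimdef(fout, dim_names, dim_sizes, dim_unlim)

  ; file-wide chunk sizes for these dimensions...
  filechunkdimdef(fout, dim_names, (/ 957, 21, 12 /), dim_unlim)

  ; ...and/or per-variable chunk sizes, set after the variable is defined
  filevardef(fout, "T2", "float", dim_names)
  filevarchunkdef(fout, "T2", (/ 957, 21, 12 /))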

Thanks,
-Marc


From: Wei Huang [mailto:huangwei@ucar.edu]
Sent: Thursday, June 19, 2014 4:48 PM
To: Dave Allured - NOAA Affiliate
Cc: Marcella, Marc; ncl-talk@ucar.edu
Subject: Re: [ncl-talk] different size netCDF files generated from NCL on different machines

Chunk size will definitely impact the file size, and so will compression.
If the chunk size is bigger than the real data size, the file will end up
at the chunk size, because the file system has to allocate enough space
for the specified chunk. If the chunk size is too small, the real data will
be stored in many small chunks, which makes the file larger because of
the overhead of storing the chunk information.
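
For Marc's two layouts, rough pre-compression arithmetic (edge chunks are padded out to full chunks, hence the rounding up):

  data:  8760 x 106 x 123                    = 114,212,880 values
  chunks 1752 x 18 x 25  ->  5 x 6 x  5 grid = 150 chunks = 118,260,000 values allocated
  chunks  957 x 21 x 12  -> 10 x 6 x 11 grid = 660 chunks = 159,168,240 values allocated

Since deflate is applied per chunk and padded fill compresses to almost nothing, the remaining size difference most likely reflects how well each chunk shape compresses, rather than the padding itself.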

This case helps all of us understand chunking better.

Regards,

Wei

On Thu, Jun 19, 2014 at 1:47 PM, Dave Allured - NOAA Affiliate <dave.allured@noaa.gov> wrote:
Marc,

Please include the user list in all replies.

I am glad you found the discrepancy. When first creating a NetCDF-4 file in NCL, the user may optionally set chunk sizes with filevarchunkdef.

If you don't set your own chunk sizes, I think NCL has a built-in method to compute default chunk sizes. That might explain your current results. Perhaps NCL support could explain this part.

--Dave
On Thu, Jun 19, 2014 at 12:18 PM, Marcella, Marc <MMarcella@air-worldwide.com> wrote:
Hi Dave,

Thank you for the email back. I was about to reply when I did find, thanks to your help, a difference after running ncdump -sh: the chunk sizes.

On the machine with the smaller file size, all of the variables look like this example, T2:
        float T2(Time, south_north, west_east) ;
                T2:_FillValue = 9.96921e+36f ;
                T2:units = "K" ;
                T2:description = "Temperature at 2 m" ;
                T2:_Storage = "chunked" ;
                T2:_ChunkSizes = 957, 21, 12 ;
                T2:_DeflateLevel = 5 ;
                T2:_Shuffle = "true" ;

But on the machine with the larger file size, the same variable (and all others) has larger chunk sizes (1752, 18, 25 vs. 957, 21, 12):
        float T2(Time, south_north, west_east) ;
                T2:_FillValue = 9.96921e+36f ;
                T2:units = "K" ;
                T2:description = "Temperature at 2 m" ;
                T2:_Storage = "chunked" ;
                T2:_ChunkSizes = 1752, 18, 25 ;
                T2:_DeflateLevel = 5 ;
                T2:_Shuffle = "true" ;

I'm assuming this is the problem. I started to read up on chunk sizes, but thought I would email you back to see whether you think this would in fact be causing the difference. Also, it seems like I can simply specify the chunk size somewhere (I'm guessing in the netCDF libraries). Anyhow, I wanted to get your input on it. Thanks again for your help with this; please do let me know!

-Marc


Marc,

An important clue is where you said you get different results when running from the command line versus from a shell script. You might be launching a different NCL version through the shell script. Try ncl -V (display version number) on the command line and in the script.

Look for other differences. First compare ncdump -k between differing files, just in case your format assumption is wrong. Then compare ncdump -hs, line by line. In particular, compare dimension sizes and compression parameters. Also look for missing variables.
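
Concretely, something like this (file names are placeholders):

  ncl -V
  ncdump -k  file_a.nc
  ncdump -k  file_b.nc
  ncdump -hs file_a.nc > a.cdl
  ncdump -hs file_b.nc > b.cdl
  diff a.cdl b.cdl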

Check for an unexpected program abort in the process that makes the smaller file. It is possible to have a partially written file that is still valid NetCDF-4/HDF5 but contains partially empty data. Such a file could match in variables and dimension sizes, yet be physically smaller.

--Dave

On Wed, Jun 18, 2014 at 2:45 PM, Marcella, Marc <MMarcella@air-worldwide.com> wrote:
Hi all,

I am finding a peculiar problem: when an identical netCDF file is written via NCL (NetCDF-4 Classic, compression level 5) on one machine the file is 5.6GB, but on the other machine it is 9GB. To the best of my knowledge, the files are identical when opened (type, values, etc.). What is odder is that if I execute the NCL script from the command line instead of from the shell script (and ensure both machines use the same version of NCL), the file sizes are identical.

Are there certain checks I am missing that I should try? I believe both machines are using the same netCDF libraries, but could that be the issue? Anyhow, I am somewhat stumped, and believe it must be a difference in the machine environments when each machine runs the shell script that calls the NCL script. Any help you could lend would be greatly appreciated... thanks!

-Marc


_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk



