Re: different size netCDF files generated from NCL on different machines

From: Wei Huang <huangwei@ucar.edu>
Date: Thu Jun 19 2014 - 16:22:22 MDT

Marc,

As you mentioned in your email, filechunkdimdef is how to define chunk sizes
file-wise; see:
http://www.ncl.ucar.edu/Document/Functions/Built-in/filechunkdimdef.shtml
filevarchunkdef defines chunk sizes for an individual variable; see:
http://www.ncl.ucar.edu/Document/Functions/Built-in/filevarchunkdef.shtml
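
For a minimal, untested sketch of both (the output file name and the fo
variable are placeholders; the dimension names, sizes, and chunk sizes are
the ones from your messages below):

  setfileoption("nc", "Format", "NetCDF4Classic")
  fo = addfile("chunked_out.nc", "c")    ; "c" = create a new file

  dim_names = (/ "Time", "south_north", "west_east" /)
  dim_sizes = (/ 8760, 106, 123 /)
  dim_unlim = (/ False, False, False /)
  filedimdef(fo, dim_names, dim_sizes, dim_unlim)

  ; file-wise chunk sizes, applied to variables defined on these dimensions
  filechunkdimdef(fo, dim_names, (/ 957, 21, 12 /), dim_unlim)

  ; define a variable, then (optionally) override its chunking explicitly
  filevardef(fo, "T2", "float", dim_names)
  filevarchunkdef(fo, "T2", (/ 957, 21, 12 /))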

Another big factor is compression. For per-variable compression, see:
http://www.ncl.ucar.edu/Document/Functions/Built-in/filevarcompressleveldef.shtml
For file-wise compression (Shuffle can play a role as well), see:
http://www.ncl.ucar.edu/Document/Functions/Built-in/setfileoption.shtml
(look for the CompressionLevel and Shuffle options)
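
A similarly rough sketch for the compression side (untested; I am assuming
Shuffle takes a True/False value):

  ; file-wise: applies to NetCDF-4 files created after these calls
  setfileoption("nc", "Format", "NetCDF4Classic")
  setfileoption("nc", "CompressionLevel", 5)
  setfileoption("nc", "Shuffle", True)

  fo = addfile("compressed_out.nc", "c")

  ; or per variable, e.g. deflate level 5 for T2 only
  filevarcompressleveldef(fo, "T2", 5)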

Wei




On Thu, Jun 19, 2014 at 2:55 PM, Marcella, Marc <MMarcella@air-worldwide.com> wrote:

> Thanks Wei. I can't seem to get specifying the actual chunk sizes to work
> via filevarchunkdef or filechunkdimdef. My file dimensions are time,
> lat, lon at 8760, 106, 123.
>
> For NCL 6.0 this makes a 5.3 GB file at chunk sizes 957, 21, 12, whereas on
> NCL 6.1.2 this makes a 9.0 GB file at chunk sizes 1752, 18, 25.
>
> I can't really seem to find a "scale factor" that is consistent across
> dimensions or NCL versions that converts from the dimension sizes to the
> chunk sizes. Only the 8760 to 1752 seems to have a neat value of (1/5).
>
> Is there somewhere in the NCL code or libraries that I can specify/set the
> chunk sizes so I may see if this is indeed what is causing the differences
> between the two file sizes?
>
>
>
> Thanks,
>
> -Marc
>
>
>
>
>
> *From:* Wei Huang [mailto:huangwei@ucar.edu]
> *Sent:* Thursday, June 19, 2014 4:48 PM
> *To:* Dave Allured - NOAA Affiliate
> *Cc:* Marcella, Marc; ncl-talk@ucar.edu
> *Subject:* Re: [ncl-talk] different size netCDF files generated from NCL
> on different machines
>
>
>
> Chunk size will definitely impact the file size, and so will compression.
> If the chunk size is bigger than the real data size, the file will end up
> at the chunk size, because the file system has to allocate enough space
> for the specified chunk. If the chunk size is too small, the real data
> will be stored in many small chunks, which makes the file larger because
> of the overhead of storing the chunk information.
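>
> As a rough illustration with Marc's numbers (if I understand the HDF5
> chunked layout correctly): with dimensions 8760 x 106 x 123 and chunks of
> 1752 x 18 x 25, the data is split into 5 x 6 x 5 = 150 chunks, and the
> partly filled edge chunks along the last two dimensions are still written
> at the full chunk size, padded with the fill value, before compression.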
>
>
>
> This case helps all of us understand chunking better.
>
>
>
> Regards,
>
>
>
> Wei
>
>
>
> On Thu, Jun 19, 2014 at 1:47 PM, Dave Allured - NOAA Affiliate <
> dave.allured@noaa.gov> wrote:
>
> Marc,
>
>
>
> Please include the user list in all replies.
>
>
> I am glad you found the discrepancy. When first creating a NetCDF-4 file
> in NCL, the user may optionally set chunk sizes with the function
> filevarchunkdef.
>
>
>
> If you don't set your own chunk sizes, I think NCL has a built-in method
> to compute default chunk sizes. That might explain your current results.
> Perhaps NCL support could explain this part.
>
>
>
> --Dave
>
> On Thu, Jun 19, 2014 at 12:18 PM, Marcella, Marc <
> MMarcella@air-worldwide.com> wrote:
>
> Hi Dave,
>
>
> Thank you for the email back. I was about to reply back when I did find,
> thanks to your help, a difference after ncdump -sh: the chunk sizes.
>
>
>
> On the machine with the smaller file size, all of the variables read like
> this example, T2:
>
> float T2(Time, south_north, west_east) ;
>
> T2:_FillValue = 9.96921e+36f ;
>
> T2:units = "K" ;
>
> T2:description = "Temperature at 2 m" ;
>
> T2:_Storage = "chunked" ;
>
> T2:_ChunkSizes = 957, 21, 12 ;
>
> T2:_DeflateLevel = 5 ;
>
> T2:_Shuffle = "true" ;
>
>
>
> But on the machine with the larger file size, the same variable (and all
> others) has larger chunk sizes (957, 21, 12 vs. 1752, 18, 25):
>
> float T2(Time, south_north, west_east) ;
>
> T2:_FillValue = 9.96921e+36f ;
>
> T2:units = "K" ;
>
> T2:description = "Temperature at 2 m" ;
>
> T2:_Storage = "chunked" ;
>
> T2:_ChunkSizes = 1752, 18, 25 ;
>
> T2:_DeflateLevel = 5 ;
>
> T2:_Shuffle = "true" ;
>
>
>
> I'm assuming this is the problem. I started to read up on chunk sizes but
> thought I would email you back to see if you think this would in effect be
> causing the difference. Also, it seems like I can simply specify the
> chunk sizes somewhere (guessing it's in the netCDF libraries). Anyhow, I
> wanted to get your input on it. Thanks again for your help with this;
> please do let me know!
>
>
>
> -Marc
>
>
>
>
>
> Marc,
>
>
>
> An important clue is where you said different results when run on command
> line versus shell script. You might be launching a different NCL version
> through the shell script. Try ncl -V (display version number) on command
> line and in script.
>
>
>
> Look for other differences. First compare ncdump -k between differing
> files, just in case your format assumption is wrong. Then compare ncdump
> -hs, line by line. In particular, compare dimension sizes and compression
> parameters. Also look for missing variables.
>
>
>
> Check for an unexpected program abort in the process that makes the smaller
> file. It is possible that you have a partially written file that is still
> valid NetCDF-4/HDF5, but with partially empty data. Such a file could
> possibly match in terms of variables and dimension sizes, but be physically
> smaller.
>
>
>
> --Dave
>
>
>
> On Wed, Jun 18, 2014 at 2:45 PM, Marcella, Marc <
> MMarcella@air-worldwide.com> wrote:
>
> Hi all,
>
>
>
> I am finding a peculiar case in which, when an identical netCDF file is
> written via NCL (NetCDF 4 Classic, compression level 5) on one machine, the
> file is 5.6 GB, but on the other computer its size is 9 GB. To the best of
> my knowledge, the files are identical when opened (type, values, etc.).
> What is odder is that if I execute the NCL script (and ensure they are
> using the same version of NCL) from the command line instead of from the
> shell script, the file sizes are identical.
>
>
>
> Are there certain checks I am missing that I should try? I believe both
> machines are using the same netCDF libraries, but could that be the issue?
> Anyhow, I am somewhat stumped and believe it must be a difference in the
> machine environments when the two machines execute the shell script that
> calls the NCL script to generate the netCDF file. Any help you could lend
> would be greatly appreciated. Thanks!
>
>
>
> -Marc
>
>
>
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>
>
