Re: the addfile command is very slow

From: Dennis Shea <shea_at_nyahnyahspammersnyahnyah>
Date: Tue Oct 19 2010 - 09:27:27 MDT

Again, I'm sure a core developer will reply later today.

re:
A "ncdump -h" command from
the same machine does not take more than a split second.

===
A netCDF file is written with all the dimension names, sizes,
variable names, metadata, etc. at the very 'top of the file'.
"ncdump -h" just reads that header information, which is
trivial; that is exactly what you see when you run it.
'ncdump -h' does not read any of the values associated
with the variables.
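That header layout is easy to verify directly: only the first few bytes of the file need to be read to identify the format, which is why header-only operations stay fast no matter how large the file is. A minimal Python sketch (illustrative only; `netcdf_format` is a hypothetical helper, and the magic-byte values come from the netCDF and HDF5 format specifications):

```python
import os
import tempfile

def netcdf_format(path):
    """Peek at the first bytes of a file to identify its netCDF flavor.

    Classic netCDF files begin with the magic bytes b'CDF' followed by a
    version byte (1 = classic, 2 = 64-bit offset); netCDF-4/HDF5 files
    begin with b'\\x89HDF'. Only four bytes are read, so this is fast
    regardless of file size -- the same reason "ncdump -h" is fast.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic[:3] == b"CDF":
        return "classic" if magic[3] == 1 else "64-bit offset"
    if magic == b"\x89HDF":
        return "netCDF-4/HDF5"
    return "unknown"

# Demonstration with a stand-in file containing only the magic bytes.
tmp = tempfile.NamedTemporaryFile(suffix=".nc", delete=False)
tmp.write(b"CDF\x01")           # classic-format magic
tmp.close()
print(netcdf_format(tmp.name))  # -> classic
os.unlink(tmp.name)
```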

On 10/19/10 9:17 AM, Bjoern Maronga wrote:
> Thanks for the reply. I'm working on a supercomputer node; the data is located
> on its data server. So I am indeed on a multi-user server, but the problem
> does not arise when requesting memory! It comes directly from
> "addfile", which should not request any memory, right?
>
> I added the wallClockElapseTime commands to my script, but I got the weird
> message:
>
> (0) wallClockElapseTime: something wrong: no printed value
>
> This also happens for data that addfile opens within a second, so it seems
> unrelated to the problem.
>
> However, I added two systemfunc("date") calls before and after "addfile" as
> well as around the statements that use the pointer to load data into
> arrays. Some code snippets follow:
>
> wcStrt = systemfunc("date")
> print(wcStrt)
> cdf_file = addfile(full_filename,"r")
> wcStrt2 = systemfunc("date")
> print(wcStrt2)
> wallClockElapseTime(wcStrt, "addfile", 0)
>
> [...]
>
> wcStrt = systemfunc("date")
> print(wcStrt)
> field_ts = cdf_file->$struc_pars$(ts:te,0,ys:ye,xs:xe)
> printVarSummary(field_ts)
> wcStrt2 = systemfunc("date")
> print(wcStrt2)
> wallClockElapseTime(wcStrt, "load1", 0)
>
> delete(field_ts)
>
> wcStrt = systemfunc("date")
> print(wcStrt)
> field_ts = cdf_file->$struc_pars$(ts:te,1,ys:ye,xs:xe)
> printVarSummary(field_ts)
> wcStrt2 = systemfunc("date")
> print(wcStrt2)
> wallClockElapseTime(wcStrt, "load2", 0)
>
>
> The resulting messages are:
>
> (0) Tue Oct 19 16:55:01 CEST 2010
> (0) Tue Oct 19 16:56:36 CEST 2010
> (0) wallClockElapseTime: something wrong: no printed value
> (0) Tue Oct 19 16:56:36 CEST 2010
> Variable: field_ts
> Type: float
> Total Size: 47239200 bytes
> 11809800 values
> Number of Dimensions: 3
> Dimensions and sizes: [time | 1800] x [y | 81] x [x | 81]
> Coordinates:
> time: [1801..3600]
> y: [ 2.5..1202.5]
> x: [ 2.5..1202.5]
> Number Of Attributes: 3
> zu_3d : -2.5
> long_name : pt
> units : K
> (0) Tue Oct 19 16:57:53 CEST 2010
> (0) wallClockElapseTime: something wrong: no printed value
> (0) Tue Oct 19 16:58:45 CEST 2010
> Variable: field_ts
> Type: float
> Total Size: 47239200 bytes
> 11809800 values
> Number of Dimensions: 3
> Dimensions and sizes: [time | 1800] x [y | 81] x [x | 81]
> Coordinates:
> time: [1801..3600]
> y: [ 2.5..1202.5]
> x: [ 2.5..1202.5]
> Number Of Attributes: 3
> units : K
> long_name : pt
> zu_3d : 7.5
> (0) Tue Oct 19 16:58:50 CEST 2010
> (0) wallClockElapseTime: something wrong: no printed value
>
>
> In this case, "addfile" was comparatively fast (~1.5 min). Nevertheless, that
> is not an adequate amount of time for reading only the metadata. An
> "ncdump -h" command on the same machine takes no more than a split second.
> As you can also see, loading the data itself is faster (1 min 17 s and 5 s).
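Since wallClockElapseTime is misbehaving here, the timing can also be done with a plain wall-clock wrapper; the pattern is language-neutral (sketched below in Python purely for illustration; `timed` and the stand-in workload are hypothetical, not part of NCL):

```python
import time

def timed(label, fn):
    """Run fn(), print the elapsed wall-clock time, and return fn's result."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print("%s took %.2f s" % (label, elapsed))
    return result

# Stand-in workload; in the real script this would wrap the file open
# (the addfile step) and each variable read separately, so the slow
# step can be isolated.
total = timed("load1", lambda: sum(range(1_000_000)))
```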
>
> I talked to the supercomputer support staff, but they said there were no
> file-system problems at the moment. That is why I thought it might be an
> NCL problem. But if the "addfile" command only reads metadata, I'm at a
> loss.
>
> Regards,
> Björn
>
>
>> I am sure one of the NCL core developers will respond
>> later today.
>>
>> --
>> I ran a quick and dirty test on a ** 17.9 GB ** netCDF file.
>> Test script attached. I ran this several times on a multi-user
>> system.
>>
>> %> uname -a
>> Linux tramhill.cgd.ucar.edu 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21
>> 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>>
>> The f = addfile("..", "r") was essentially 'instantaneous'.
>>
>> NOTE: *no* data is read by "addfile". This creates a
>> reference (pointer) to the file.
>>
>> The taux=f->TAUX ; (1,2400,3600) was 'instantaneous'
>>
>> The temp=f->TEMP ; (1,42,2400,3600) took 2, 2, 3, 3 seconds
>>
>> Several things could affect your data input:
>> (1)
>> Are you on a multi-user system? When NCL is allocating memory
>> for the input array, are other users also 'competing' for memory?
>>
>> (2)
>> Is the data file on a local file system or on a (say) NFS-
>> mounted file system? The latter case can affect the input
>> data stream significantly. Some time ago I ran tests
>> on an NFS-mounted file system. Around midnight, there was
>> no timing difference between importing data from a locally
>> mounted file and from the NFS-mounted file. However, in the
>> middle of the day, the timings were very different.
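A quick way to check whether the mount is the bottleneck is to measure raw sequential-read throughput on the same file, independent of NCL. A small Python sketch (the helper name and the 4 MB demo file are illustrative; point `path` at the actual data file on the NFS mount):

```python
import os
import tempfile
import time

def read_throughput(path, chunk=1 << 20):
    """Read a file sequentially in 1 MB chunks and return MB/s.

    This is a crude I/O benchmark: comparing the result for a local
    copy versus the NFS-mounted original shows how much of the delay
    is the file system rather than the reading tool.
    """
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    elapsed = time.perf_counter() - start
    return (size / 1e6) / elapsed if elapsed > 0 else float("inf")

# Demo on a small temporary file.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"\0" * (4 << 20))    # 4 MB of zeros
tmp.close()
print("%.1f MB/s" % read_throughput(tmp.name))
os.unlink(tmp.name)
```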
>>
>> ===
>> Ultimately, (almost) all tools (NCL, IDL, Matlab, NCO, CDO,...)
>> that read netCDF are using the standard Unidata software.
>>
>> Cheers!
>>
>> On 10/19/10 4:06 AM, Bjoern Maronga wrote:
>>> Hello,
>>>
>>> I have a problem using the NCL command "addfile" to open netCDF
>>> files that are about 4 GB in size (or larger). Executing this command
>>> takes about 5 minutes (I verified that performance is better for
>>> smaller datasets). In contrast, the subsequent commands for loading
>>> data into arrays are fast.
>>>
>>> To me it looks like "addfile" actually loads the whole dataset into
>>> memory, even though only a small part will be read into arrays
>>> afterwards. As I understand it, addfile only needs to read the
>>> metadata, much like an "ncdump -h" command, which should take no more
>>> time for these files than for smaller ones. I am very surprised by
>>> this finding and wonder whether it makes sense at all. I have been
>>> working with NCL for about two years now and never noticed this
>>> behavior before. Has anything changed in the addfile command?
>>> Currently, I am using version 5.2.0.
>>>
>>> Best regards,
>>> Björn Maronga
>
>
>
_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk
Received on Tue Oct 19 09:27:34 2010

This archive was generated by hypermail 2.1.8 : Tue Oct 19 2010 - 14:38:00 MDT