Re: Memory leak with if statement and logical array

From: Dave Allured <dave.allured_at_nyahnyahspammersnyahnyah>
Date: Thu, 26 Apr 2007 15:49:25 -0600

Dave,

Thank you very much for taking care of the problem with the if
statements. I look forward to the next release.

Would it be possible to make NCL discard strings that are used only
as intermediate expressions, not named variables? If so, then this
could be a workaround in some cases. Intermediate expressions could
be used to hide iterative string values such as file names. For
example:

   f = addfile (part1 + cyr + cmon + cday + chour + "_01.nc", "r")

cyr might have only 100 unique values, cmon only 12, cday only 31,
and chour only 24, thereby reducing the spewage of unique strings
from 100 * 12 * 31 * 24 to 100 + 12 + 31 + 24. That's 892800 vs.
167, a huge improvement for the hash table. What do you think?

--Dave A.
CU/CIRES Climate Diagnostics Center (CDC)
NOAA/ESRL/PSD, Climate Analysis Branch (CAB)

David Ian Brown wrote:
> Susan, Dave, et al.,
>
> Here's what we have found related to problems of "memory leaks" that
> have been
> reported recently.
>
> The good news is that the problem Dave Allured reported has been
> solved. There
> was indeed a memory leak that occurred when using a logical variable as
> opposed
> to a logical expression in an if test, i.e.:
> if (logical_var) then
> ...
> as opposed to
> if (var .eq. something) then
> ...
>
> The fix will be in the next version of NCL, still scheduled for release
> by the end of the
> month.
>
> The bad news is that Susan O'Neill's problem is not something that can
> be fixed
> easily, given some design decisions that were made very early on in the
> development of NCL.
>
> It seems that the problem boils down to the fact that NCL permanently
> (by design) stores every unique
> string it encounters in a hash table and assigns the string a unique
> integer id, which
> is passed around instead of a pointer to the string itself. This
> simplifies programming for
> tasks such as string comparison, and in the usual case for NCL, has
> worked pretty well.
>
> Unfortunately Susan's script (extractFlx_Montana.ncl) is a perfect
> illustration of the drawbacks of this approach. There is
> an inner loop that is executed 2838240 times for each year of data.
> Inside the loop strings are built from small components.
> The built-up strings tend to share a lot of characters at the beginning
> before there is a character that distinguishes them.
> While the memory used might be a problem on a small system, it is not
> all that large compared to the memory required to
> hold the large numerical arrays that are commonly processed using NCL.
> On my 1GB Mac laptop with a version
> of extractFlx_Montana.ncl modified to run for a year with file I/O and
> "system" calls removed (all the real 'work' of the script),
> I found that the virtual memory required grew to just over 100 MB. On
> this system the slowdown came mainly from the fact
> that as the hash table grew it took more and more processing time to
> distinguish each string as unique or not. Using
> performance monitoring tools I found that when the program first
> started it was spending about 25% of its time locating
> strings in the hash table. That had grown to 85% of the time during the
> 12th month of processing.
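>
> To make the pattern concrete, the costly construct is roughly of this
> shape (the names and sizes below are illustrative, not the actual code
> in extractFlx_Montana.ncl):
>
>    ; illustrative only -- every pass through the inner loop builds a
>    ; brand-new string, and each one is interned permanently in NCL's
>    ; string hash table
>    ntimes = 2838240
>    spd    = random_uniform(0., 25., ntimes)   ; stand-in for real data
>    stnid  = "MT001"                           ; stand-in identifier
>    do it = 0, ntimes - 1
>      rec = stnid + "_" + sprinti("%0.7i", it) + "_" + sprintf("%7.2f", spd(it))
>    end do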
>
> We will need to give some thought to what we can do, if anything, to fix
> this problem.
>
> In the meantime, Susan, it might make sense to eliminate the inner loop
> by creating an intermediate file in NetCDF
> format that contains all the timesteps and grid points for the period
> you want to analyze. Given such a file it would be very easy
> to run your wind_mdb tool at each gridpoint. I think that approach
> could make your code much more scalable.
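>
> A sketch of that approach (the dimension sizes and variable names here
> are just placeholders, not your actual variables):
>
>    ; sketch only -- sizes and names are placeholders
>    ntim = 8760                        ; e.g. hourly values for one year
>    nlat = 64
>    nlon = 128
>    u    = new((/ntim, nlat, nlon/), float)
>    u!0  = "time"
>    u!1  = "lat"
>    u!2  = "lon"
>    ; ... fill u from the original files ...
>    system("rm -f allwind.nc")         ; start from a fresh intermediate file
>    fout    = addfile("allwind.nc", "c")
>    fout->U = u
>    delete(fout)                       ; close the intermediate file
>    ; later (e.g. in the analysis script), read the whole time series
>    ; at one gridpoint in a single call
>    fin     = addfile("allwind.nc", "r")
>    useries = fin->U(:, 10, 20)
>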
> -dave
_______________________________________________
ncl-talk mailing list
ncl-talk_at_ucar.edu
http://mailman.ucar.edu/mailman/listinfo/ncl-talk