Re: Memory leak with if statement and logical array

From: David Ian Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Wed, 25 Apr 2007 18:58:00 -0600

Susan, Dave, et al.,

Here's what we have found regarding the "memory leak" problems that
have been reported recently.

The good news is that the problem Dave Allured reported has been
solved. There was indeed a memory leak that occurred when a logical
variable, as opposed to a logical expression, was used in an if test,
i.e.:
     if (logical_var) then
        ...
as opposed to
     if (var .eq. something) then
        ...

The fix will be in the next version of NCL, still scheduled for release
by the end of the month.

The bad news is that Susan O'Neill's problem is not something that can
be fixed easily, given some design decisions that were made very early
on in the development of NCL.

The problem seems to boil down to the fact that NCL permanently (by
design) stores every unique string it encounters in a hash table and
assigns the string a unique integer id, which is passed around instead
of a pointer reference to the string itself. This simplifies
programming for tasks such as string comparison and, in the usual case
for NCL, has worked pretty well.
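
To make the scheme concrete, here is a minimal Python sketch of a string
table of this general kind. It is only an illustration under stated
assumptions, not NCL's actual implementation: the class name, method
names, and the example strings are all hypothetical.

```python
class StringTable:
    """Toy intern table: each unique string gets a permanent integer id.

    Hypothetical illustration only; NCL's real table is not shown here.
    """

    def __init__(self):
        self._id_of = {}     # string -> id
        self._strings = []   # id -> string; entries are never removed

    def intern(self, s):
        """Return the id for s, storing s if it has not been seen yet."""
        sid = self._id_of.get(s)
        if sid is None:
            sid = len(self._strings)
            self._strings.append(s)
            self._id_of[s] = sid
        return sid

    def lookup(self, sid):
        """Return the string that belongs to an id."""
        return self._strings[sid]

    def __len__(self):
        return len(self._strings)


table = StringTable()
a = table.intern("flx.2004010100")   # made-up string
b = table.intern("flx.2004010100")
c = table.intern("flx.2004010101")
same = (a == b)   # string equality is now an integer comparison

# Building a fresh string on every pass of a loop keeps adding entries,
# because nothing is ever removed from the table.
for hour in range(1000):
    table.intern("point_%04d" % hour)
```

The convenience is that comparing two strings reduces to comparing two
integers. The cost is that a loop which keeps building new strings makes
the table grow for the life of the process.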

Unfortunately Susan's script (extractFlx_Montana.ncl) is a perfect
illustration of the drawbacks of this approach. There is an inner loop
that is executed 2838240 times for each year of data. Inside the loop,
strings are built from small components. The built-up strings tend to
share a lot of characters at the beginning before there is a character
that distinguishes them.

While the memory used might be a problem on a small system, it is not
all that large compared to the memory required to hold the large
numerical arrays that are commonly processed using NCL. On my 1GB Mac
laptop, with a version of extractFlx_Montana.ncl modified to run for a
year with file I/O and "system" calls removed (all the real 'work' of
the script), I found that the virtual memory required grew to just over
100 MB. On this system the slowdown came mainly from the fact that, as
the hash table grew, it took more and more processing time to determine
whether each string was unique. Using performance monitoring tools, I
found that when the program first started it was spending about 25% of
its time locating strings in the hash table. That had grown to 85% of
the time during the 12th month of processing.
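
One plausible way for lookups to slow down like this can be sketched in
Python. Everything here is an assumption made for illustration: the
fixed bucket count, the use of `zlib.crc32` as the hash function, and
the made-up path strings. NCL's real hash table may behave differently.
The sketch counts how many characters must be examined to decide
whether a string is already stored, when the strings share a long
common prefix.

```python
import zlib


def compare(a, b):
    """Compare two strings; return (equal, characters examined)."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return False, examined
    return len(a) == len(b), examined


class ChainedTable:
    """Toy intern table with a fixed bucket count (an assumption)."""

    def __init__(self, n_buckets=64):
        self.buckets = [[] for _ in range(n_buckets)]
        self.n_strings = 0
        self.chars_examined = 0   # running cost of all lookups so far

    def intern(self, s):
        index = zlib.crc32(s.encode("ascii")) % len(self.buckets)
        chain = self.buckets[index]
        # Walk the chain; each miss re-reads the shared prefix.
        for sid, stored in chain:
            equal, examined = compare(s, stored)
            self.chars_examined += examined
            if equal:
                return sid
        sid = self.n_strings
        chain.append((sid, s))
        self.n_strings += 1
        return sid


def run_batch(table, start, count):
    """Intern `count` new strings; return the characters examined."""
    before = table.chars_examined
    for i in range(start, start + count):
        # Hypothetical name: long common prefix, distinguishing tail.
        table.intern("/hypothetical/flx/montana/point_%06d" % i)
    return table.chars_examined - before


table = ChainedTable()
early_cost = run_batch(table, 0, 2000)
late_cost = run_batch(table, 2000, 2000)
print(early_cost, late_cost)
```

In this toy model the second batch of 2000 strings costs more than the
first, because each new string is compared against a longer chain and
every failed comparison has to read through the shared prefix before
reaching the character that differs.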

We will need to give some thought to what we can do, if anything, to
fix this problem.

In the meantime, Susan, it might make sense to eliminate the inner loop
by creating an intermediate file in NetCDF format that contains all the
timesteps and grid points for the period you want to analyze. Given
such a file, it would be very easy to run your wind_mdb tool at each
gridpoint. I think that approach could make your code much more
scalable.
  -dave

On Apr 24, 2007, at 10:31 AM, O'Neill, Susan - Portland, OR wrote:

> Hi Dave/everyone,
>
> I'm running NCL version 4.2.0.a034 (released 9/26/2006) on a small
> LINUX box with redhat ES.
>
> Thanks!
>
> Susan
>
>
> -----Original Message-----
> From: ncl-talk-bounces_at_ucar.edu [mailto:ncl-talk-bounces_at_ucar.edu] On
> Behalf Of Dave Allured
> Sent: Monday, April 23, 2007 6:14 PM
> To: ncl-talk_at_ucar.edu
> Subject: Re: Memory leak with if statement and logical array
>
> All,
>
> Attached is a modified version of Susan's tst_memLeak script. I
> deactivated all statements involving file I/O, leaving only simple
> NCL statements and calls to NCL built-in functions. I also
> commented out all if statements.
>
> This version t3.ncl also shows a memory leak on Mac OS, though not
> as severe as the one Susan describes. However, this one still has a
> severe problem, running slower and slower with each iteration, which
> is a killer for real processing applications.
>
> My original post was about a memory leak with if statements, so this
> is probably something different.
>
> Susan, what kind of system are you running? Also, which NCL
> version? It would be helpful to know whether these leaks occur on
> platforms other than Mac OS.
>
> --Dave A.
> CU/CIRES Climate Diagnostics Center (CDC)
> NOAA/ESRL/PSD, Climate Analysis Branch (CAB)
>
> O'Neill, Susan - Portland, OR wrote:
>>
>> Hi Dave & everyone,
>>
>> I too have been struggling with a memory leak and was very excited to
>> see your message. Unfortunately, it did not fix things for me.
>> Attached is a very simple NCL script that loops through 3 yrs of flx
>> files (tst_memLeak.ncl). I removed all program logic except for loop
>> controls, system calls to un-tar & clean-up files, and opening the flx
>> files. As the program runs it steadily consumes more and more memory,
>> even after adding the explicit "True" to my logical if statements. It
>> starts out using approx 2% of memory and by the time it's run 24 months
>> it's using 27% of memory.
>>
>> When I have actual code in the script (see attached
>> extractFlx_Montana.ncl), by the time 2 yrs have run 85% of memory is
>> being used and it takes hours to process through one flx file.
>>
>> I can work around this by creating a wrapper script that only runs
>> NCL a month at a time using ENV variables to set loop controls, but would
>> really like to find a solution to this because I may want more data in
>> memory to do graphing or other data manipulations in the future.
>>
>> Any insights/help/comments would be very appreciated! I am new to NCL
>> so could quite possibly be missing something too, thank you!
>>
>> Susan
>
> _______________________________________________
> ncl-talk mailing list
> ncl-talk_at_ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Wed Apr 25 2007 - 18:58:00 MDT