RE: [ncl-talk] Memory leak with if statement and logical array

From: O'Neill, Susan - Portland, OR <susan.oneill_at_nyahnyahspammersnyahnyah>
Date: Thu, 26 Apr 2007 14:38:30 -0600

Hi Dave,

Thank you for a great explanation of what is going on! I'll look at
implementing your netCDF intermediate file suggestion and meanwhile am
also splitting my processing into smaller chunks and scripting to merge
things back together.

Thanks!

        Susan

--------------------------------------------
Susan M. O'Neill, Ph.D.
Air Quality Engineer
USDA Natural Resources Conservation Service
Air Quality and Atmospheric Change Team
1201 NE Lloyd Blvd., Suite 1000
Portland, Oregon 97232-1202
503-273-2438 (work)
503-273-2401 (fax)
susan.oneill_at_por.usda.gov
--------------------------------------------
-----Original Message-----
From: David Ian Brown [mailto:dbrown_at_ucar.edu]
Sent: Wednesday, April 25, 2007 5:58 PM
To: O'Neill, Susan - Portland, OR
Cc: Dave Allured; ncl-talk_at_ucar.edu
Subject: Re: Memory leak with if statement and logical array

Susan, Dave, et al.,

Here's what we have found related to the "memory leak" problems that have
been reported recently.

The good news is that the problem Dave Allured reported has been solved.
There was indeed a memory leak that occurred when using a logical variable,
as opposed to a logical expression, in an if test, i.e.:
     if (logical_var) then
        ...
as opposed to
     if (var .eq. something) then
        ...

The fix will be in the next version of NCL, still scheduled for release by
the end of the month.

The bad news is that Susan O'Neill's problem is not something that can be
fixed easily, given some design decisions that were made very early on in
the development of NCL.

It seems that the problem boils down to the fact that NCL permanently (by
design) stores every unique string it encounters in a hash table and
assigns the string a unique integer id, which is passed around instead of a
pointer reference to the string itself. This simplifies programming for
tasks such as string comparison and, in the usual case for NCL, has worked
pretty well.
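
The scheme described above can be sketched roughly as follows. This is only
an illustration written in Python, not NCL's actual implementation; the
names InternTable, intern, and lookup are made up for the example.

```python
class InternTable:
    """Toy model of a permanent string table: one integer id per unique string."""

    def __init__(self):
        self._ids = {}      # string -> integer id
        self._strings = []  # integer id -> string; entries are never removed

    def intern(self, s):
        """Return the id for s, assigning a new one the first time s is seen."""
        existing = self._ids.get(s)
        if existing is not None:
            return existing
        new_id = len(self._strings)
        self._ids[s] = new_id
        self._strings.append(s)
        return new_id

    def lookup(self, string_id):
        """Recover the original string from its id."""
        return self._strings[string_id]

    def __len__(self):
        return len(self._strings)


table = InternTable()
a = table.intern("temperature")
b = table.intern("temperature")
c = table.intern("pressure")

# Comparing two strings reduces to comparing two integers.
same = (a == b)
different = (a != c)

# Strings built inside a loop are all retained, even if each is used once.
for step in range(1000):
    table.intern("flx_point_" + str(step))
entries = len(table)
```

Because nothing is ever removed from the table, a loop that keeps building
new strings makes it grow for the life of the process.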

Unfortunately Susan's script (extractFlx_Montana.ncl) is a perfect
illustration of the drawbacks of this approach. There is an inner loop that
is executed 2838240 times for each year of data. Inside the loop strings
are built from small components. The built-up strings tend to share a lot
of characters at the beginning before there is a character that
distinguishes them. While the memory used might be a problem on a small
system, it is not all that large compared to the memory required to hold
the large numerical arrays that are commonly processed using NCL. On my
1GB Mac laptop, with a version of extractFlx_Montana.ncl modified to run
for a year with file I/O and "system" calls removed (all the real 'work' of
the script), I found that the virtual memory required grew to just over
100 MB. On this system the slowdown came mainly from the fact that as the
hash table grew it took more and more processing time to distinguish each
string as unique or not. Using performance monitoring tools I found that
when the program first started it was spending about 25% of its time
locating strings in the hash table; that had grown to 85% of the time
during the 12th month of processing.
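
One way such a slowdown could arise is sketched below in Python. It assumes
a table with a fixed number of buckets and linear chains; that is an
assumption made purely for illustration, since nothing above says how
NCL's table is organized internally. NUM_BUCKETS, bucket_of, and
insert_and_count are hypothetical names, and the string prefix is invented.

```python
import zlib

NUM_BUCKETS = 64  # assumed fixed size; kept small so the effect shows quickly


def bucket_of(s):
    # Deterministic hash (Python's built-in hash() of str is salted per run).
    return zlib.crc32(s.encode("ascii")) % NUM_BUCKETS


def insert_and_count(buckets, s):
    """Insert s if absent; return how many stored strings it was compared with."""
    chain = buckets[bucket_of(s)]
    comparisons = 0
    for stored in chain:
        comparisons += 1
        if stored == s:
            return comparisons
    chain.append(s)
    return comparisons


buckets = [[] for _ in range(NUM_BUCKETS)]
costs = []
for i in range(20000):
    # Long shared prefix; the distinguishing characters come only at the end,
    # so each string comparison must scan past the prefix first.
    costs.append(insert_and_count(buckets, "montana_flx_2004_gridpoint_" + str(i)))

# Average comparisons per insert, early in the run versus late in the run.
early = sum(costs[:1000]) / 1000
late = sum(costs[-1000:]) / 1000
```

In this toy model every new string has to be checked against the whole
chain in its bucket before it can be declared unique, so the cost per
string keeps rising as more unique strings accumulate.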

We will need to give some thought to what we can do, if anything, to fix
this problem.

In the meantime, Susan, it might make sense to eliminate the inner loop by
creating an intermediate file in NetCDF format that contains all the
timesteps and grid points for the period you want to analyze. Given such a
file it would be very easy to run your wind_mdb tool at each gridpoint. I
think that approach could make your code much more scalable.
  -dave

On Apr 24, 2007, at 10:31 AM, O'Neill, Susan - Portland, OR wrote:

> Hi Dave/everyone,
>
> I'm running NCL version 4.2.0.a034 (released 9/26/2006) on a small LINUX
> box with redhat ES.
>
> Thanks!
>
> Susan
>
>
> -----Original Message-----
> From: ncl-talk-bounces_at_ucar.edu [mailto:ncl-talk-bounces_at_ucar.edu] On
> Behalf Of Dave Allured
> Sent: Monday, April 23, 2007 6:14 PM
> To: ncl-talk_at_ucar.edu
> Subject: Re: Memory leak with if statement and logical array
>
> All,
>
> Attached is a modified version of Susan's tst_memLeak script. I
> deactivated all statements involving file I/O, leaving only simple
> NCL statements and calls to NCL built-in functions. I also
> commented out all if statements.
>
> This version t3.ncl also shows a memory leak on Mac OS, though not
> as severe as the one Susan describes. However, this one still has a
> severe problem, running slower and slower with each iteration, which
> is a killer for real processing applications.
>
> My original post was about a memory leak with if statements, so this
> is probably something different.
>
> Susan, what kind of system are you running? Also, which NCL
> version? It would be helpful to know whether these leaks occur on
> platforms other than Mac OS.
>
> --Dave A.
> CU/CIRES Climate Diagnostics Center (CDC)
> NOAA/ESRL/PSD, Climate Analysis Branch (CAB)
>
> O'Neill, Susan - Portland, OR wrote:
>>
>> Hi Dave & everyone,
>>
>> I too have been struggling with a memory leak and was very excited to
>> see your message. Unfortunately, it did not fix things for me.
>> Attached is a very simple NCL script that loops through 3 yrs of flx
>> files (tst_memLeak.ncl). I removed all program logic except for loop
>> controls, system calls to un-tar & clean-up files, and opening the flx
>> files. As the program runs it steadily consumes more and more memory,
>> even after adding the explicit "True" to my logical if statements. It
>> starts out using approx 2% of memory and by the time it's run 24 months
>> it's using 27% of memory.
>>
>> When I have actual code in the script (see attached
>> extractFlx_Montana.ncl), by the time 2 yrs have run 85% of memory is
>> being used and it takes hours to process through one flx file.
>>
>> I can work around this by creating a wrapper script that only runs NCL a
>> month at a time using ENV variables to set loop controls, but would
>> really like to find a solution to this because I may want more data in
>> memory to do graphing or other data manipulations in the future.
>>
>> Any insights/help/comments would be very appreciated! I am new to NCL
>> so could quite possibly be missing something too, thank you!
>>
>> Susan
>
> _______________________________________________
> ncl-talk mailing list
> ncl-talk_at_ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Thu Apr 26 2007 - 14:38:30 MDT

This archive was generated by hypermail 2.2.0 : Tue May 01 2007 - 08:38:54 MDT