Re: Memory leak with if statement and logical array

From: David Ian Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Thu, 26 Apr 2007 18:16:20 -0600

Hi Susan,

OK, I would like to add that even if NCL didn't have the problem I
described with strings, I think you could greatly improve the performance
of your application by eliminating the inner loop. It's a fundamental
precept of programming with an array-based scripting language like NCL
that array-based operations usually give orders of magnitude better
performance than operating on individual elements in a nested loop.
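The earlier message quoted below describes the string problem in detail. NCL permanently stores every unique string in a hash table and passes around an integer id instead of the string itself. The following is a minimal Python sketch of that general idea, not NCL's actual implementation. The `StringTable` class and the string names built in the loop are hypothetical. It shows why equal strings reduce to an integer comparison, and why a loop that builds a new string on every iteration grows the table without bound. It does not model the lookup slowdown described below, which depends on NCL's own hash table.

```python
class StringTable:
    """Hypothetical sketch of a permanent string table: each unique string
    gets an integer id, and entries are never removed."""

    def __init__(self):
        self._ids = {}      # string -> id (hash table lookup)
        self._strings = []  # id -> string

    def intern(self, s):
        """Return the id for s, adding s to the table if it is new."""
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        return sid

    def lookup(self, sid):
        """Return the string stored under id sid."""
        return self._strings[sid]

    def __len__(self):
        return len(self._strings)


table = StringTable()
a = table.intern("wind_speed")
b = table.intern("wind_" + "speed")  # built from pieces, same contents
assert a == b                        # string equality is an integer compare

# An inner loop that builds a distinct (hypothetical) name for every
# (time step, grid point) pair adds a permanent entry on each iteration.
for t in range(24):
    for point in range(100):
        table.intern("flx_t" + str(t) + "_pt" + str(point))

print(len(table))  # 2401: one entry per built name, plus "wind_speed"
```

Nothing is ever removed from the table. Building strings inside a loop that runs millions of times therefore leaves millions of permanent entries.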

As a demo, I have recoded the modified version of your script to write
to a temporary NetCDF file as I suggested. The second part of the script
writes out the ".dat" files (one for each grid point) just before it
would call your wind_mdb code (if it were available).
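The recoded script itself is not included in this archive. The following is only a hedged Python sketch of the restructuring idea, not that NCL code. It assumes all time steps for all grid points have already been read into an in-memory array indexed `data[time][point]` (for example, from the intermediate NetCDF file). The function name `write_point_files`, the file-naming scheme, and the one-value-per-line .dat layout are hypothetical. Each point's time series is pulled out in one pass and written once. That builds one file-name string per grid point, instead of one string per (time step, grid point) pair.

```python
import os
import tempfile


def write_point_files(data, lats, lons, outdir):
    """Write one .dat file per grid point (hypothetical format).

    data is indexed data[time][point]; each output file holds that
    point's full time series, one value per line.
    """
    paths = []
    # zip(*data) transposes the time-major rows into one tuple per grid
    # point, so each point's series is extracted without an inner loop
    # over time steps in this function.
    for point, series in enumerate(zip(*data)):
        # Hypothetical naming scheme, built once per grid point.
        name = "pt_%.2f_%.2f.dat" % (lats[point], lons[point])
        path = os.path.join(outdir, name)
        with open(path, "w") as f:
            f.write("\n".join("%.3f" % v for v in series) + "\n")
        paths.append(path)
    return paths


# Dummy data: 4 time steps x 3 grid points.
data = [[float(t * 10 + p) for p in range(3)] for t in range(4)]
lats = [45.0, 45.5, 46.0]
lons = [-110.0, -110.0, -110.0]
outdir = tempfile.mkdtemp()
paths = write_point_files(data, lats, lons, outdir)
```

In real use, the array would come from the NetCDF file through a NetCDF library, and each .dat file would then be passed to wind_mdb. Neither step is shown here.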

The original version of the modified script, which runs for one year's
worth of data, took about half an hour to run (and didn't actually write
out the .dat files). The new version, which does write the .dat files
(using dummy data), runs in less than 2 minutes. There could be a lot
more bells and whistles, but this gives the basic idea. Hopefully it will
help.
  -dave

On Apr 26, 2007, at 2:38 PM, O'Neill, Susan - Portland, OR wrote:

> Hi Dave,
>
> Thank you for a great explanation of what is going on! I'll look at
> implementing your netCDF intermediate file suggestion and meanwhile am
> also splitting my processing into smaller chunks and scripting to merge
> things back together.
>
> Thanks!
>
> Susan
>
>
> --------------------------------------------
> Susan M. O'Neill, Ph.D.
> Air Quality Engineer
> USDA Natural Resources Conservation Service
> Air Quality and Atmospheric Change Team
> 1201 NE Lloyd Blvd., Suite 1000
> Portland, Oregon 97232-1202
> 503-273-2438 (work)
> 503-273-2401 (fax)
> susan.oneill_at_por.usda.gov
> --------------------------------------------
> -----Original Message-----
> From: David Ian Brown [mailto:dbrown_at_ucar.edu]
> Sent: Wednesday, April 25, 2007 5:58 PM
> To: O'Neill, Susan - Portland, OR
> Cc: Dave Allured; ncl-talk_at_ucar.edu
> Subject: Re: Memory leak with if statement and logical array
>
> Susan, Dave, et al.,
>
> Here's what we have found related to problems of "memory leaks" that
> have been reported recently.
>
> The good news is that the problem Dave Allured reported has been
> solved. There was indeed a memory leak that occurred when using a
> logical variable, as opposed to a logical expression, in an if test,
> i.e.:
>
>     if (logical_var) then
>       ...
>     end if
>
> as opposed to
>
>     if (var .eq. something) then
>       ...
>     end if
>
> The fix will be in the next version of NCL, still scheduled for release
> by the end of the month.
>
> The bad news is that Susan O'Neill's problem is not something that can
> be fixed easily, given some design decisions that were made very early
> on in the development of NCL.
>
> It seems that the problem boils down to the fact that NCL permanently
> (by design) stores every unique string it encounters in a hash table
> and assigns the string a unique integer id, which is passed around
> instead of a pointer reference to the string itself. This simplifies
> programming for tasks such as string comparison and, in the usual case
> for NCL, has worked pretty well.
>
> Unfortunately Susan's script (extractFlx_Montana.ncl) is a perfect
> illustration of the drawbacks of this approach. There is an inner loop
> that is executed 2838240 times for each year of data. Inside the loop,
> strings are built from small components. The built-up strings tend to
> share a lot of characters at the beginning before there is a character
> that distinguishes them.
> While the memory used might be a problem on a small system, it is not
> all that large compared to the memory required to hold the large
> numerical arrays that are commonly processed using NCL. On my 1GB Mac
> laptop, with a version of extractFlx_Montana.ncl modified to run for a
> year with file I/O and "system" calls removed (all the real 'work' of
> the script), I found that the virtual memory required grew to just over
> 100 MB. On this system the slowdown came mainly from the fact that as
> the hash table grew, it took more and more processing time to determine
> whether each string was unique. Using performance monitoring tools, I
> found that when the program first started it was spending about 25% of
> its time locating strings in the hash table; by the 12th month of
> processing that had grown to 85%.
>
> We will need to give some thought to what we can do, if anything, to
> fix this problem.
>
> In the meantime, Susan, it might make sense to eliminate the inner loop
> by creating an intermediate file in NetCDF format that contains all the
> timesteps and grid points for the period you want to analyze. Given
> such a file, it would be very easy to run your wind_mdb tool at each
> grid point. I think that approach could make your code much more
> scalable.
> -dave
>
> On Apr 24, 2007, at 10:31 AM, O'Neill, Susan - Portland, OR wrote:
>
>> Hi Dave/everyone,
>>
>> I'm running NCL version 4.2.0.a034 (released 9/26/2006) on a small
>> Linux box with Red Hat ES.
>>
>> Thanks!
>>
>> Susan
>>
>>
>> -----Original Message-----
>> From: ncl-talk-bounces_at_ucar.edu [mailto:ncl-talk-bounces_at_ucar.edu] On Behalf Of Dave Allured
>> Sent: Monday, April 23, 2007 6:14 PM
>> To: ncl-talk_at_ucar.edu
>> Subject: Re: Memory leak with if statement and logical array
>>
>> All,
>>
>> Attached is a modified version of Susan's tst_memLeak script. I
>> deactivated all statements involving file I/O, leaving only simple
>> NCL statements and calls to NCL built-in functions. I also
>> commented out all if statements.
>>
>> This version t3.ncl also shows a memory leak on Mac OS, though not
>> as severe as the one Susan describes. However, this one still has a
>> severe problem, running slower and slower with each iteration, which
>> is a killer for real processing applications.
>>
>> My original post was about a memory leak with if statements, so this
>> is probably something different.
>>
>> Susan, what kind of system are you running? Also, which NCL
>> version? It would be helpful to know whether these leaks occur on
>> platforms other than Mac OS.
>>
>> --Dave A.
>> CU/CIRES Climate Diagnostics Center (CDC)
>> NOAA/ESRL/PSD, Climate Analysis Branch (CAB)
>>
>> O'Neill, Susan - Portland, OR wrote:
>>>
>>> Hi Dave & everyone,
>>>
>>> I too have been struggling with a memory leak and was very excited to
>>> see your message. Unfortunately, it did not fix things for me.
>>> Attached is a very simple NCL script that loops through 3 yrs of flx
>>> files (tst_memLeak.ncl). I removed all program logic except for loop
>>> controls, system calls to un-tar & clean up files, and opening the flx
>>> files. As the program runs it steadily consumes more and more memory,
>>> even after adding the explicit "True" to my logical if statements. It
>>> starts out using approx 2% of memory, and by the time it's run 24
>>> months it's using 27% of memory.
>>>
>>> When I have actual code in the script (see attached
>>> extractFlx_Montana.ncl), by the time 2 yrs have run 85% of memory is
>>> being used and it takes hours to process through one flx file.
>>>
>>> I can work around this by creating a wrapper script that only runs NCL
>>> a month at a time, using ENV variables to set loop controls, but would
>>> really like to find a solution to this because I may want more data in
>>> memory to do graphing or other data manipulations in the future.
>>>
>>> Any insights/help/comments would be very appreciated! I am new to NCL
>>> so could quite possibly be missing something too, thank you!
>>>
>>> Susan
>>
>> _______________________________________________
>> ncl-talk mailing list
>> ncl-talk_at_ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>
>


Received on Thu Apr 26 2007 - 18:16:20 MDT
