Re: Can't Create Child Processes

From: David Brown <dbrown@ucar.edu>
Date: Fri Nov 12 2010 - 16:30:38 MST

Hi Bill,

The error message "fatal:systemfunc: cannot create child process:[errno=12]" comes directly from NCL's systemfunc code. It occurs when the Unix 'fork' call fails to create a child process. A long time ago (2003 -- 4.2.0.something) NCL had a bug in systemfunc that caused it to use up file descriptors, but it no longer does, so the message does not indicate that you have run out of file descriptors.

Error numbers are somewhat system-specific, but errno 12 is usually defined as:

ENOMEM 12 /* Out of memory */

So my best guess, without having the data files and being able to run the script myself, is that memory is the problem.

You say the files are large, so presumably the allocations for all of your output variables:

   et      = new( (/ times, south_north, west_east /), float )
   pet     = new( (/ times, south_north, west_east /), float )
   cape    = new( (/ times, south_north, west_east /), float )
   cin     = new( (/ times, south_north, west_east /), float )
   t2m     = new( (/ times, south_north, west_east /), float )
   td2m    = new( (/ times, south_north, west_east /), float )
   z850mb  = new( (/ times, south_north, west_east /), float )
   z500mb  = new( (/ times, south_north, west_east /), float )
   z700mb  = new( (/ times, south_north, west_east /), float )
   t850mb  = new( (/ times, south_north, west_east /), float )
   td850mb = new( (/ times, south_north, west_east /), float )
   tswc    = new( (/ times, south_north, west_east /), float )
   sfroff  = new( (/ times, south_north, west_east /), float )
   udroff  = new( (/ times, south_north, west_east /), float )
   mslp    = new( (/ times, south_north, west_east /), float )

occupy quite a bit of memory. I think it would be better to avoid allocating the complete contents of all these variables in memory. You could reorganize your script so that you create the output file, with all its metadata and variables defined, before the loop over the input data files. Then, inside the loop, you read a single timestep of each variable and assign it directly to the corresponding file variable. That way you never need more than one timestep's worth of memory for the data, and you still preserve your method of unzipping and re-zipping each input file as it is processed.
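Something along these lines might work for the setup before the loop (just an untested sketch -- the output file name, the dimension names, and the omission of attributes are placeholders you would adapt to your combine2.ncl; times, south_north, and west_east are assumed to be already set, as in your allocations above):

   setfileoption("nc", "Format", "LargeFile")
   o = addfile("combined.nc", "c")          ; create the output file once

   dim_names = (/ "Time", "south_north", "west_east" /)
   dim_sizes = (/ times, south_north, west_east /)
   filedimdef(o, dim_names, dim_sizes, (/ False, False, False /))

   ; define each output variable on the file instead of allocating it with new()
   var_names = (/ "et", "pet", "cape", "cin", "t2m", "td2m", "z850mb", \
                  "z500mb", "z700mb", "t850mb", "td850mb", "tswc", \
                  "sfroff", "udroff", "mslp" /)
   do v = 0, dimsizes(var_names) - 1
     filevardef(o, var_names(v), "float", dim_names)
   end do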

Then the loop over the input files would look something like:

do i = 0, nt-1
  ....
  t3d = wrf_user_getvar(f, "tk", 0)
  p3d = wrf_user_getvar(f, "p", 0)
  t850mb = wrf_user_intrp3d(t3d, p3d, "h", 85000., 0., False)

  o->t850mb(i,:,:) = (/ t850mb /)
  ....
end do
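And inside that same loop body, the per-file handling you already have stays roughly as it is (hypothetical names -- fnames(i) and the exact gzip/gunzip commands stand in for whatever combine2.ncl actually does):

  system("gunzip " + fnames(i) + ".gz")   ; uncompress this input file
  f = addfile(fnames(i), "r")             ; open it read-only
  ; ... read and write the single timestep as above ...
  delete(f)                               ; close the file and free its descriptor
  system("gzip " + fnames(i))             ; re-compress it

The delete(f) is what actually closes the input file and frees its descriptor, as discussed earlier in this thread.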

Hope this helps.
 -dave

On Nov 11, 2010, at 2:15 PM, Capehart, William J wrote:

> Hi David
>
> Here is the code,
>
> The file is still giving me the child process error on multiple rigs...
>
> Any ideas? I am not sure why it's choosing to fail now, since it's worked
> in the past.
>
> Bill
>
>
> On 11/10/10 12:32 MST, "David Brown" <dbrown@ucar.edu> wrote:
>
>> Hi Bill,
>> Can you send me a copy of the script you are using? And just to be clear,
>> you are still getting an error about too many open files?
>> -dave
>>
>>
>> On Nov 10, 2010, at 8:47 AM, Capehart, William J wrote:
>>
>>> OK, after a hiatus on other things, I came back to this problem and it's
>>> still there.
>>>
>>> I have added these lines up top after the begin...
>>>
>>> setfileoption("nc","Format", "LargeFile")
>>> setfileoption("nc","SuppressClose",False)
>>>
>>>
>>>
>>> After I'm done with each file ("f" via addfile) I "delete" them
>>>
>>> delete(f)
>>>
>>> I even rebooted the rig on which it's running (ulimit -n = 1024).
>>>
>>> And I am still getting the error.
>>>
>>> Am I doing the suppress function wrong?
>>>
>>> Bill
>>>
>>>
>>>
>>>
>>> Hi Bill,
>>> There is no explicit close function corresponding to addfiles or
>>> addfile. However, if you are using NetCDF data, by default, each file
>>> stays open (and thus consumes a file descriptor) until you delete or
>>> reassign the variable that refers to it. If using addfiles the same
>>> applies to each file in the list of files. Deleting or reassigning the
>>> list variable closes each file in the list. You can change this
>>> behavior and make NCL open and close the file for each access by
>>> setting the setfileoption "SuppressClose" to False. See
>>> http://www.ncl.ucar.edu/Document/Functions/Built-in/setfileoption.shtml.
>>> This usually has little performance impact when files are opened for
>>> reading only.
>>> -dave
>>>
>>>
>>>
>>>
>>> On 9/27/10 14:32 MDT, "Capehart, William J" <William.Capehart@sdsmt.edu>
>>> wrote:
>>>
>>>> Hi Daryl:
>>>>
>>>> My ulimit -n is 256.
>>>>
>>>> I AM working with netcdf data but using addfile to open them. But the
>>>> system and systemfunc failures are happening with gzip and gunzip.
>>>>
>>>> How exactly do you "close" addfiles? And how will this help with the
>>>> zipping and unzipping?
>>>>
>>>> Thanks Much
>>>> Bill
>>>>
>>>>
>>>>
>>>> On 9/27/10 14:25 MDT, "Daryl Harzmann" <akrherz@iastate.edu> wrote:
>>>>
>>>>> On Mon, 27 Sep 2010, Capehart, William J wrote:
>>>>>
>>>>>> I'm doing a goodly amount of file management in one of my
>>>>>> post-processing scripts where I am gunzipping and re-gzipping data
>>>>>> before and behind me as I march through a lot of model output.
>>>>>>
>>>>>> Recently, I've been hitting these errors when I use the function
>>>>>> "systemfunc":
>>>>>>
>>>>>> fatal:systemfunc: cannot create child process:[errno=12]
>>>>>>
>>>>>> Ideas? Note that the system() subroutine yields no such error; instead,
>>>>>> it fails to execute the requested shell command and keeps going
>>>>>> drunkenly onward until it tries to open a file that hasn't been
>>>>>> gunzipped, or crashes in some similar way.
>>>>>
>>>>> This sounds like you are running out of file descriptors. You are
>>>>> probably
>>>>> hitting 1024 open files (ulimit -n will display the limit). Are you
>>>>> certain you are closing files and cleaning up after yourself as you
>>>>> march
>>>>> along? Is this NetCDF data?
>>>>>
>>>>> daryl
>>>>
>>>
>>
>
> <combine2.ncl>

_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk