Re: ncl_convert2nc and tmp file problem

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Tue Mar 06 2012 - 13:05:42 MST

Hi David,
I have checked in a new version of ncl_convert2nc that uses the 'date' and 'printf' commands to create a unique name based on the process id + the time in seconds since the Unix epoch as a hexadecimal number. I checked to see that both these commands are available on bluefire.
 -dave
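The naming scheme described above can be sketched in C, the language of the helper program later in this thread. This is a minimal sketch, not the code that was checked in. The actual script builds the name in csh with `date` and `printf`. The `make_tmp_name` helper and its `tmp<pid>-<hex seconds>.ncl` layout are hypothetical.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write "<dir>/tmp<pid>-<hex seconds>.ncl" into buf,
 * combining the process id with the current time in seconds since the
 * Unix epoch, printed in hexadecimal.  Returns 0 on success, -1 if the
 * name does not fit in buf. */
int make_tmp_name(char *buf, size_t len, const char *dir)
{
    long pid = (long)getpid();
    unsigned long secs = (unsigned long)time(NULL);
    int n = snprintf(buf, len, "%s/tmp%ld-%lx.ncl", dir, pid, secs);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

The time component narrows the collision window rather than closing it. Two processes that share a pid and start within the same second would still get the same name. This can happen, for example, on different nodes writing into one shared scratch directory.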

On Mar 6, 2012, at 10:57 AM, David B. Reusch wrote:

> Hi,
>
> The good folks at the CISL Help Desk provided the following for me to test earlier. Not sure it's ideal (e.g., perhaps mkstemp would be a better fn call?) but it seems to be working fine. They said they'd add it to bluefire if it worked for me.
>
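> #include <stdio.h>
> #include <stdlib.h>   /* mktemp() */
>
> int main ( int argc, char* argv[] )
> {
>     if( argc == 2 ) {
>         /* mktemp() rewrites argv[1] in place; print the result */
>         printf( "%s\n", mktemp( argv[1] ) );
>     }
>     return 0;
> }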
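Both Daves point to mkstemp as the safer call. Here is a minimal sketch of the same idea built on mkstemp, assuming a POSIX system. mkstemp picks the name and creates the file in one atomic step. That closes the gap between testing a name and opening it, which is the race the mktemp(3) man page warns about. The `reserve_tmp_name` function is a hypothetical name for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a unique file from a template whose last
 * six characters are "XXXXXX".  mkstemp() rewrites the template in place
 * with the chosen name and opens the file with O_CREAT|O_EXCL, so the
 * same name cannot be handed to another process.  Returns 0 on success,
 * -1 on failure. */
int reserve_tmp_name(char *tmpl)
{
    int fd = mkstemp(tmpl);
    if (fd == -1)
        return -1;
    close(fd);   /* only the name is needed; the empty file stays behind */
    return 0;
}
```

A command-line wrapper like the program above would call this on `argv[1]` and print the rewritten string. The caller is then responsible for removing the file when done.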
>
> Good catch on the man page!
>
> For what it's worth, I get a version of mktemp from fink on my Macs and I find it in /bin on RH5 and RH6 boxes, so maybe it's just missing on bluefire (though obviously I wouldn't want to rely on that!).
>
> Thanks,
> Dave
>
> David Brown said the following on 3/6/12 10:37 AM:
>>
>> Thanks for the info about mktemp. It will definitely not be a good solution if it is not universally available. And I agree it does seem a little weird that a process id is getting reused in time to cause a problem with your multiple process scripting. I will look into this further.
>> -dave
>>
>> On Mar 5, 2012, at 8:57 PM, David B. Reusch wrote:
>>
>>> Hi Dave,
>>>
>>> Thanks for the quick response. I came across the mktemp option after hitting send so this fits with what I was thinking about. Unfortunately, I seem to not have this on my path (on bluefire) so my locally modified ncl_convert2nc fails. Didn't see any obvious modules to load either, so I'm temporarily stumped on locating it.
>>>
>>> I also came across some cautions about using mktemp while searching for more info (for example, at http://linux.die.net/man/3/mktemp they recommend mkstemp instead).
>>>
>>> While I agree that the evidence all points this way, it still seems odd that the use of the process number (i.e., the $$) in the tmp name could fail. Still, no reason not to make it (more) bulletproof.
>>>
>>> Thanks,
>>> Dave
>>>
>>> David Brown said the following on 3/5/12 5:38 PM:
>>>> Hi David,
>>>>
>>>> We ran across this same problem just last week and I agree that the problem is almost certainly related to the name of the temp file not being "unique enough". It looks like the solution is the Unix 'mktemp' command. If you want to try an immediate fix, you could look into the ncl_convert2nc script at around line 860 (if you are using 6.0.0) or a bit further down in the latest development version, and change the lines:
>>>>
>>>> set tmp_nclf = "$tmpdir/tmp$$.ncl"
>>>> /bin/rm $tmp_nclf>& /dev/null
>>>> cat<< 'EOF_NCL'>! $tmp_nclf
>>>>
>>>> to
>>>>
>>>> set tmpfile = `mktemp $tmpdir/tmp$$-XXXXXXXX`
>>>> set tmp_nclf = "$tmpfile.ncl"
>>>> # mktemp creates $tmpfile itself, in $tmpdir; remove it along with $tmp_nclf.
>>>> # echo $tmp_nclf # optional line -- uncomment for debugging file name.
>>>> /bin/rm $tmp_nclf >& /dev/null
>>>> cat << 'EOF_NCL' >! $tmp_nclf
>>>>
>>>> Each 'X' in the argument to mktemp gets converted to a random letter or digit. The '$$' is the process id of the running process: in this case ncl_convert2nc.
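The following toy C illustration shows what happens to the trailing X's. It is only a sketch under stated assumptions. The `fill_trailing_xs` name is hypothetical, and the function uses `rand()`. Unlike the real mktemp command, it neither guarantees uniqueness nor creates the file, so it is not a substitute for mktemp or mkstemp.

```c
#include <stdlib.h>
#include <string.h>

/* Illustration only: replace every trailing 'X' in name with a random
 * letter or digit.  rand() is never seeded here, so every run produces
 * the same characters; real tools use a proper random source and also
 * create the file so the name is reserved. */
void fill_trailing_xs(char *name)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    size_t i = strlen(name);
    while (i > 0 && name[i - 1] == 'X') {
        /* walk leftwards, overwriting one X per step */
        name[--i] = alphabet[rand() % (sizeof alphabet - 1)];
    }
}
```

With a template such as `tmp12345-XXXXXXXX`, the `tmp12345-` prefix is kept and only the eight X's change.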
>>>>
>>>> I just checked in essentially this same code.
>>>> -dave
>>>>
>>>>
>>>> On Mar 5, 2012, at 2:07 PM, David B. Reusch wrote:
>>>>
>>>>> I seem to be seeing a problem with ncl_convert2nc where a
>>>>> script-generated tmp file can't be opened.
>>>>>
>>>>> The big picture:
>>>>> I'm running ncl_convert2nc "in parallel" on bluefire at NCAR by submitting
>>>>> an LSF job with 32 calls of a script that contains a call to
>>>>> ncl_convert2nc within it. Each call of my script runs in the
>>>>> background, so there may be up to 32 ncl_convert2nc's running
>>>>> simultaneously on the execution node (each doing a different input
>>>>> file). At the same time, I may have up to 12 of these LSF jobs running
>>>>> at once (I'm trying to process some 3800+ files, so I end up with lots
>>>>> of jobs). It does not look like more than one LSF job is running on the
>>>>> same node at the same time (in case that might be contributing to the
>>>>> problem).
>>>>>
>>>>> The problem:
>>>>> I've been using this approach for less than a week, but out of 6 job
>>>>> sessions (each 36-40 of my scripts, each with 32 scripts running
>>>>> ncl_convert2nc...), I've seen at least one failure with a message like
>>>>> "fatal:Could not open (/glade/scratch/dbr/ncl246414.ncl)". I then have
>>>>> to figure out which file didn't get converted and try it again. The
>>>>> rerun jobs, with just one or a few datasets being processed, have worked
>>>>> fine.
>>>>>
>>>>> This looks like maybe the tmp file name being used by ncl_convert2nc
>>>>> isn't quite unique enough? That would explain the error if two
>>>>> instances are getting the same name and the file gets deleted by one
>>>>> while the other still needs it.
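One way to rule out this kind of race is to create the temp file with exclusive-create semantics. With `open()` and `O_CREAT|O_EXCL`, a name another instance already owns makes the call fail with EEXIST, instead of silently reusing that file. Below is a minimal C sketch, assuming a filesystem that honors O_EXCL. The `open_unique` helper and its `tmp<pid>-<n>.ncl` naming are hypothetical. Process ids are unique only within one host. The exclusive create therefore also catches the case of instances on different nodes sharing one scratch directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: claim "<dir>/tmp<pid>-<n>.ncl" with O_CREAT|O_EXCL,
 * bumping n until no other process owns the name.  On success the chosen
 * path is left in buf and an open descriptor is returned; -1 on failure. */
int open_unique(char *buf, size_t len, const char *dir)
{
    for (unsigned n = 0; n < 1000; n++) {
        int w = snprintf(buf, len, "%s/tmp%ld-%u.ncl", dir, (long)getpid(), n);
        if (w < 0 || (size_t)w >= len)
            return -1;                      /* buffer too small */
        int fd = open(buf, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;                      /* this process now owns the name */
        if (errno != EEXIST)
            return -1;                      /* a real error, not a collision */
    }
    return -1;                              /* gave up after 1000 attempts */
}
```

Logging the chosen path from each instance, in the spirit of the optional `echo` line in the script, would also show directly whether two instances ever try the same name.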
>>>>>
>>>>> This isn't necessarily the prettiest way to do all this file processing,
>>>>> but it's getting the job done for me so I'd like to get it working
>>>>> 100%. Is there anything I can do to try to debug the ncl_convert2nc
>>>>> processing and confirm/deny my suspicions about the file name race
>>>>> condition?
>>>>>
>>>>> Thanks,
>>>>> Dave
>>>>>
>>>>> --
>>>>> Associate Research Professor of Climatology
>>>>> Dept of Earth and Environmental Science
>>>>> MSEC 304; 801 Leroy Place
>>>>> New Mexico Tech
>>>>> Socorro, NM 87801
>>>>>
>>>>> _______________________________________________
>>>>> ncl-talk mailing list
>>>>> List instructions, subscriber options, unsubscribe:
>>>>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>>>
>>
>

Received on Tue Mar 6 10:06:06 2012
