Re: ncl_convert2nc and tmp file problem

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Mon Mar 05 2012 - 17:38:31 MST

Hi David,

We ran across this same problem just last week, and I agree that it is almost certainly caused by the name of the temp file not being "unique enough". It looks like the solution is the Unix 'mktemp' command. If you want an immediate fix, look in the ncl_convert2nc script at around line 860 (if you are using 6.0.0; a bit further down in the latest development version) and change the lines:

set tmp_nclf = "$tmpdir/tmp$$.ncl"
/bin/rm $tmp_nclf >& /dev/null
cat << 'EOF_NCL' >! $tmp_nclf

to

set tmpfile = `mktemp tmp$$-XXXXXXXX`
set tmp_nclf = "$tmpdir/$tmpfile.ncl"
# echo $tmp_nclf # optional line -- uncomment for debugging file name.
/bin/rm $tmp_nclf >& /dev/null
cat << 'EOF_NCL' >! $tmp_nclf

Each 'X' in the argument to mktemp is replaced with a random letter or digit. The '$$' is the process ID of the running script: in this case ncl_convert2nc.
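
If you want to convince yourself of this before editing the script, a quick stand-alone test along these lines should print two different names even though both calls come from the same process (purely illustrative; the /tmp paths and variable names are just examples, not anything taken from ncl_convert2nc):

#!/bin/csh
# Illustrative only: two mktemp calls from the same process (same $$)
# still yield distinct names, because the X's are replaced with random
# letters/digits.
set t1 = `mktemp /tmp/tmp$$-XXXXXXXX`
set t2 = `mktemp /tmp/tmp$$-XXXXXXXX`
echo "first : $t1"
echo "second: $t2"
# mktemp actually creates the files, so clean them up afterwards.
/bin/rm -f $t1 $t2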
 
I just checked in essentially this same code.
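
For what it's worth, here is a small illustration of why the old tmp$$ naming can collide in a setup like the one described in the quoted message below, where many nodes write into one shared scratch directory: PIDs are only unique within a single node (and get reused over time), so two simultaneous ncl_convert2nc's can end up pointing at the same path. The values below are hypothetical, not taken from the script or the logs.

#!/bin/csh
# Hypothetical illustration: processes on two *different* nodes can
# happen to have the same PID, so a name built from $$ alone resolves
# both to the same file on the shared filesystem.
set tmpdir = /glade/scratch/dbr    # shared scratch directory from the report below
set pid_node_a = 24680             # PID of the process on node A (made up)
set pid_node_b = 24680             # a different process on node B, same PID (made up)
echo "node A writes: $tmpdir/tmp$pid_node_a.ncl"
echo "node B writes: $tmpdir/tmp$pid_node_b.ncl"
# Identical paths: whichever process removes or overwrites the file
# first triggers the other's fatal "Could not open" error.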
 -dave

On Mar 5, 2012, at 2:07 PM, David B. Reusch wrote:

> I seem to be seeing a problem with ncl_convert2nc where a
> script-generated tmp file can't be opened.
>
> The big picture:
> I'm running ncl_convert2nc "in parallel" on bluefire at NCAR by
> submitting an LSF job that makes 32 calls to a script which in turn
> calls ncl_convert2nc. Each call of my script runs in the background,
> so up to 32 ncl_convert2nc's may be running simultaneously on the
> execution node (each working on a different input file). At the same
> time, I may have up to 12 of these LSF jobs running at once (I'm
> trying to process some 3800+ files, so I end up with lots of jobs).
> It does not look like more than one LSF job runs on the same node at
> the same time (in case that might be contributing to the problem).
>
> The problem:
> I've been using this approach for less than a week, but out of 6 job
> sessions (each consisting of 36-40 of my scripts, each of which runs
> 32 copies of ncl_convert2nc), I've seen at least one failure with a
> message like "fatal:Could not open (/glade/scratch/dbr/ncl246414.ncl)".
> I then have to figure out which file didn't get converted and try it
> again. The rerun jobs, with just one or a few datasets being
> processed, have worked fine.
>
> This looks like maybe the tmp file name being used by ncl_convert2nc
> isn't quite unique enough? That would explain the error if two
> instances are getting the same name and the file gets deleted by one
> while the other still needs it.
>
> This isn't necessarily the prettiest way to do all this file processing,
> but it's getting the job done for me so I'd like to get it working
> 100%. Is there anything I can do to try to debug the ncl_convert2nc
> processing and confirm/deny my suspicions about the file name race
> condition?
>
> Thanks,
> Dave
>
> --
> Associate Research Professor of Climatology
> Dept of Earth and Environmental Science
> MSEC 304; 801 Leroy Place
> New Mexico Tech
> Socorro, NM 87801
>

_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk