What's the most performant way to write both dimension and data values
of a datavar to CSV? Why I ask:
I have a large (> 1 GB) netCDF file. I'm only interested in one
datavar in the file, so I used NCL to "prune" it down to one datavar,
and write that "pruned output" to a new netCDF file: see
The pruned datavar is much more tractable: 13 MB, 3.3 Mtuples, see
However, I want to further operate on that data in R. While I could
work directly with the netCDF, I find CSV easier to pull into R ...
but my current code for doing that (~100 lines starting @ line 219 of
prune_IOAPI.ncl) is
* very unvectorized: it's a for-loop writing tuples to a 1D vector of
  (text) lines. The tuples are values of the pruned datavar
  sg_datavar_grid(t,l,r,c), and each line has the form

    sprinti("%i", r) + "," + \
    sprinti("%i", c) + "," + \
    sprinti("%i", l) + "," + \
    sprinti("%i", t) + "," + \
    sprintf("%f", sg_datavar_grid(t,l,r,c))

* not very performant: on an HPC cluster node (admittedly running in
  home, not as a job) writing the 3.3 Mtuples (66 MB) to a string
  variable

    lines = new( (/ 3293784 /), string)

  (i.e., no file I/O, just in-memory) requires 19.22 min.
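
For concreteness, the loop described above has roughly the following
shape. This is a minimal sketch, not the actual code from
prune_IOAPI.ncl: the small dummy array standing in for the real
datavar, the n_* size names, the counter i, and the use of 0-based
indices as the dimension values are all assumptions made for
illustration.

```ncl
; Sketch only; NOT the code from prune_IOAPI.ncl.
; Assumption: a small dummy (t,l,r,c) array stands in for the real
; pruned datavar, so the fragment can run on its own.
sg_datavar_grid = random_uniform(0., 1., (/ 2, 3, 4, 5 /))

; Dimension sizes, in (t,l,r,c) order. The n_* names are hypothetical.
dims = dimsizes(sg_datavar_grid)
n_t  = dims(0)
n_l  = dims(1)
n_r  = dims(2)
n_c  = dims(3)

; One text line per tuple, held in memory (no file I/O).
lines = new( (/ n_t * n_l * n_r * n_c /), string)

; Scalar-at-a-time loop: every iteration formats five values and
; concatenates them into one CSV line.
i = 0
do t = 0, n_t - 1
  do l = 0, n_l - 1
    do r = 0, n_r - 1
      do c = 0, n_c - 1
        lines(i) = sprinti("%i", r) + "," + \
                   sprinti("%i", c) + "," + \
                   sprinti("%i", l) + "," + \
                   sprinti("%i", t) + "," + \
                   sprintf("%f", sg_datavar_grid(t,l,r,c))
        i = i + 1
      end do
    end do
  end do
end do

; Show the first few lines as a sanity check.
print(lines(0:2))
```

With the real datavar, the innermost statement runs once per tuple,
i.e., about 3.3 million times.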
So I suspect my code can be improved. Is there a better way to do this?
TIA, Tom Roche <Tom_Roche@pobox.com>
Received on Tue Feb 5 12:08:30 2013