Re: take long time to load a matrix using readAsciiTable

From: Mary Haley <haley_at_nyahnyahspammersnyahnyah>
Date: Wed Nov 17 2010 - 14:59:48 MST

[Xiaoming provided me with the file offline, which has 258951 rows and 100 columns of data.]

I looked at readAsciiTable to see why it's taking so long. In order to parse the data into rows and columns and skip the first line, readAsciiTable has to read the whole file in as strings first. That step alone is what's slow, and it's actually done twice (once for the header and once for the values).

My recommendation is to stay away from readAsciiTable if you are trying to read a file with thousands of lines of data.

Instead, if you can make some intelligent guesses about what will be in your header lines, then you can simply use "asciiread" to read the data in as floats, and toss away the values you don't need.

In your case, your header line has two values in it that asciiread will treat as real float values, so these have to be tossed.
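If you'd rather not hard-code how many header values to toss, here is a rough sketch of one way to guess it. It is not what readAsciiTable does. It assumes the number of columns is known, that every data row is complete, and that the header contributes fewer numeric values than there are columns. The small array is a made-up stand-in for what asciiread would return:

```ncl
;---Hypothetical stand-in for asciiread output:
;   2 header values followed by 3 rows x 4 columns of data.
data  = (/13., 112., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12./)
ncols = 4

;---Values left over after filling whole rows are assumed to be header values.
nhead = dimsizes(data) % ncols
print("header values to skip = " + nhead)

;---Keep everything after the header values.
vals = data(nhead:)
print("data values kept = " + dimsizes(vals))
```

If the header happens to contain a multiple of ncols numeric values, this guess comes out wrong, so it's still worth looking at the header by eye.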

With this information, here's a script that reads your data in about 9.6 wall clock seconds:

load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

;---Read the data into a giant 1D array of floats.
data = asciiread("AerosoldataCombine.txt",-1,"float")

;
; Count the number of values. The first two float values are
; part of the header, so don't count these.
;
nvals = dimsizes(data(2:))
print("nvals = " + nvals)

;---Get # of rows and columns, so we can generate a 2D array.
ncols = 112-13+1
nrows = nvals/ncols
print("nrows (" + nrows + ") x ncols (" + ncols + ") = " + (nrows*ncols))

;---Convert to 2D
data2d = onedtond(data(2:),(/nrows,ncols/))
print("min/max data2d = " + min(data2d) + "/" + max(data2d))
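One thing the script above doesn't check is whether the value count divides evenly by the number of columns. Since nvals/ncols is integer division, a wrong guess about the header would quietly give you the wrong number of rows. Here's a minimal sketch of a guard you could add before the reshape, using a small made-up array in place of the real asciiread result (nhead and the array contents are just for illustration):

```ncl
;---Hypothetical stand-in for asciiread output:
;   2 header values followed by 3 rows x 4 columns of data.
data      = new(14,float)
data(0:1) = (/13., 112./)
data(2:)  = fspan(1., 12., 12)

nhead = 2                          ; header values to toss (assumed known)
ncols = 4
nvals = dimsizes(data) - nhead

;---Stop if the values don't fill whole rows; the header guess is probably off.
if (nvals % ncols .ne. 0) then
  print("nvals (" + nvals + ") is not a multiple of ncols (" + ncols + ")")
  exit
end if

;---Safe to reshape now.
nrows  = nvals/ncols
data2d = onedtond(data(nhead:),(/nrows,ncols/))
print("nrows = " + nrows + ", last value = " + data2d(nrows-1,ncols-1))
```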

I'll update the readAsciiTable/readAsciiHeader documentation to indicate that they can be slow for data files with thousands of lines.

--Mary

On Nov 17, 2010, at 9:31 AM, xiaoming Hu wrote:

> Hello
>
> readAsciiTable took a very long time to load a 130M matrix.
> See my script
> "
> Aerosol_contain = readAsciiTable("AerosoldataCombine.txt",112-13+1,"float",1)
> print("finish read")
> "
>
> Is there any way to speed up the loading?
>
> Thanks
>
> Xiaoming
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Wed Nov 17 14:59:58 2010

This archive was generated by hypermail 2.1.8 : Fri Nov 19 2010 - 11:51:06 MST