Reading ASCII: Multi-columnar data
Reading data that is all numeric
Data file:
1994 -0.5 -0.1 -2.2 -2.9 -1.7 -1.5 -2.9 -2.9 -3.0 -2.6
1995 -1.0 -0.8  0.4 -1.8 -1.2 -0.4  0.6 -0.1  0.5 -0.5
1996  1.7 -0.2  1.1  1.1  0.2  1.6  1.0  0.7  1.0  0.7
Script to read data file:
;---------------------------------------------------------------
begin
;
; Dimensions of the file. These values are an assumption taken from
; the three-row, eleven-column excerpt shown above; adjust them to
; match the actual file.
;
  nyrs = 3
  ncol = 11
;
; Read the ascii file.
;
  ksoi  = asciiread("./soi_ket",(/nyrs,ncol/),"float")
  time  = ksoi(:,0)        ; vector containing the years
  m_soi = ksoi(:,3)        ; Southern Oscillation Index for March
  print(time)
;
; Create coordinate variable time.
;
  time!0         = "time"
  time&time      = time
  time@long_name = "years"
;
; Name m_soi dimension and assign CV.
;
  m_soi!0    = "time"
  m_soi&time = time
end
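The dimensions passed to asciiread have to match the file exactly. As a hedged sketch, the row and column counts could instead be queried from the file itself. This assumes that the contributed.ncl functions numAsciiRow and numAsciiCol are available in the installed NCL version, that the file has no header lines, and that every row has the same number of columns. The file name is the one used in the script above.

```ncl
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
begin
  fili = "./soi_ket"           ; same file as in the script above
  nyrs = numAsciiRow(fili)     ; number of rows (one per year); assumes no header lines
  ncol = numAsciiCol(fili)     ; number of columns; assumes every row has the same count
  ksoi = asciiread(fili, (/nyrs,ncol/), "float")
  printVarSummary(ksoi)        ; confirm the array is nyrs x ncol
end
```

Querying the sizes this way avoids editing the script when the record grows by another year.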
Reading data with headers that are all numeric
Data file:
1 1900
-32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
-32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
....
-32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
-32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
2 1900
-32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
-32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
....
Each block begins with a two-value header (month and year), followed by 216 rows of 12 columns, i.e. 2592 values. Each block is really a grid of dimension size (72,36), longitude x latitude. Latitude varies fastest in the ascii file. Because the rightmost dimension varies fastest in NCL, latitude becomes the rightmost dimension when the data are read into NCL. Missing values are indicated by -32768. Latitude starts at -87.5, longitude starts at -177.5, and time runs from Jan 1900 to Dec 2005. The data must be scaled by 0.01.
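To illustrate the ordering rule, here is a small sketch (not part of the original script) that reshapes six values with onedtond. The values and sizes are made up for illustration. The point is only that consecutive input values fill the rightmost dimension first.

```ncl
begin
  v = ispan(0,5,1)               ; six made-up values: 0,1,2,3,4,5
  x = onedtond(v, (/2,3/))       ; reshape to 2 x 3; rightmost dimension fills first
  print(x(0,:))                  ; first row holds 0,1,2
  print(x(1,:))                  ; second row holds 3,4,5
end
```

Applied to a (/72,36/) grid, the same rule means that the first 36 values of a block are the 36 latitudes at the first longitude.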
The following script can be used to read in the file. The key functions that are used are asciiread and onedtond. Two contributed.ncl functions, latGlobeFo and lonGlobeFo, are also used to assign coordinate variables.
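The key to the parsing that is done in the do loop below is that the counter (nStrt) advances on every pass by the number of data points plus the two header values, so it always points at the header of the next timestep. The data for that timestep are then read starting two values past nStrt, which skips the header.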
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
begin
nmos = 12
yrStrt = 1900
yrLast = 2005
nlat = 36
mlon = 72
nyrs = yrLast-yrStrt+1
ntim = nyrs*nmos
npts = nlat*mlon ; number of pts in each block
fili = "grid_prcp_1900-2006.dat"
data = asciiread (fili, -1, "float") ; read all values
data@_FillValue = -32768
printVarSummary(data)
prcp = new ( (/ntim,mlon,nlat/), "float", data@_FillValue)
time = new ( ntim, "integer", "No_FillValue")
nhd = 2 ; number of values in 'header' [month year]
nStrt = 0
do nt=0,ntim-1
mon = toint(data(nStrt)) ; header values are read as float; convert for the integer time array
year = toint(data(nStrt+1))
time(nt) = year*100 + mon ; yyyymm, used for coordinate variable assignment below
nLast= nStrt+npts+1
prcp(nt,:,:) = onedtond( data(nStrt+nhd:nLast), (/mlon,nlat/))
nStrt = nStrt+npts+nhd
end do
prcp = prcp*0.01 ; apply scaling factor
time!0 = "time"
time@units = "yyyymm"
lat = latGlobeFo (nlat, "lat", "latitude", "degrees_north") ; in contributed.ncl
lon = lonGlobeFo (mlon, "lon", "longitude","degrees_east") ; in contributed.ncl
lon = lon - 180
prcp!0 = "time"
prcp!1 = "lon"
prcp!2 = "lat"
prcp&time = time
prcp&lon = lon
prcp&lat = lat
prcp@long_name = "PRCP"
prcp2 = prcp(time|:,lat|:,lon|:) ; reorder dimensions to time x lat x lon
delete(prcp)
end
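Because every block has the same length, the start index that the loop accumulates can also be written in closed form. The sketch below is an illustration only. It uses the sizes from the script above and no input file, and the variable nFirst is introduced here for clarity. It prints where the header and data of the first few blocks sit in the one-dimensional array, and the total number of values the file should contain.

```ncl
begin
  nlat = 36
  mlon = 72
  npts = nlat*mlon               ; data values per block
  nhd  = 2                       ; header values per block [month year]
  ntim = (2005-1900+1)*12        ; number of monthly blocks

  do nt=0,2                      ; first three blocks only
     nStrt  = nt*(npts+nhd)      ; index of the month value in the header
     nFirst = nStrt+nhd          ; index of the first data value
     nLast  = nStrt+nhd+npts-1   ; index of the last data value
     print("block "+nt+": header at "+nStrt+", data "+nFirst+" to "+nLast)
  end do

  print("expected number of values in file: "+ntim*(npts+nhd))
end
```

Comparing the last number with dimsizes(data) after the asciiread call is a quick way to catch a truncated file or a wrong grid size before the loop runs.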
Reading data that is a mix of alpha and numeric characters
Data file:
ID        LAT      LON       PW
LMNO     36.69   -97.48    47.10
LTHM     39.58   -94.17    40.40
MEDF     36.79   -97.75    46.70
NDS1     37.30   -95.60    39.90
...
Note that the station IDs in the first column of the above file can contain digits (NDS1, for example). We need to parse out this first column so these numeric characters don't get mixed in with our real data.
Script to read data file:
data = asciiread("./pw.dat", -1, "string") ; -1 means read all rows.
cdata = stringtochar(data)
;
; The first row is just header information, so we can discard this.
; That is, we can start with the second row, which is represented
; by index '1'.
;
; The latitude values fall in columns 6-12 (indices 7:13)
; The longitude values fall in columns 13-21 (indices 14:22)
; The pwv data values fall in columns 22-31 (indices 23:)
;
lat = stringtofloat(charactertostring(cdata(1:,7:13)))
lon = stringtofloat(charactertostring(cdata(1:,14:22)))
pwv = stringtofloat(charactertostring(cdata(1:,23:)))
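As an alternative to fixed character positions, the fields can be split on whitespace. This is a sketch under the assumption that the columns in pw.dat are separated by one or more blanks and that str_get_field and tofloat are available in the installed NCL version. Field numbers start at 1, and the variable names rows and id are introduced here for illustration.

```ncl
begin
  data = asciiread("./pw.dat", -1, "string")       ; one string per line of the file
  rows = data(1:)                                  ; skip the header line (index 0)
  id   = str_get_field(rows, 1, " ")               ; station ID, kept as a string
  lat  = tofloat(str_get_field(rows, 2, " "))      ; latitude
  lon  = tofloat(str_get_field(rows, 3, " "))      ; longitude
  pwv  = tofloat(str_get_field(rows, 4, " "))      ; PW value
  print(id+"  "+lat+"  "+lon+"  "+pwv)
end
```

This approach does not depend on exact column positions, so it tolerates changes in spacing, but it does require that no field in a row is blank.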