Reading ASCII data
This document shows how to read various types of ASCII files.
For examples of reading CSV files, see
the Reading CSV files page.
- asciiread - reads a file that contains ASCII
representations of basic data types.
- readAsciiTable - reads an
ASCII file given the number of lines at the beginning
and end of the file to ignore, and the number of columns to read. Can
be very slow if you have thousands of lines of data.
- readAsciiHead - reads an
ASCII file and returns just the header.
- numAsciiCol - returns the
number of columns in an ASCII file.
- numAsciiRow - returns the
number of rows in an ASCII file.
- str_fields_count - counts the
number of fields in a string, given a delimiter.
- str_get_cols - retrieves a particular
column in a string, given a start and end index.
- str_get_field - retrieves a particular
field in a string, given a delimiter.
- str_sub_str - replaces a substring with
another substring.
- Unix "cut"
command - allows you to easily extract sections from a file.
asc1.txt - a file with
14 integers, one per line.
; Read data into a one-dimensional int array of length 14:
data = asciiread("asc1.txt",14,"integer")
npts = dimsizes(data) ; should be 14
print(data) ; Print the values
If you don't know how many data values you have, you can use the
special "-1" value for the dimension size. When you use -1, data
values will be read from left-to-right, top-to-bottom, into a 1D
array, until there are no values left.
; Read data into a one-dimensional array of unknown length:
data = asciiread("asc1.txt",-1,"integer")
npts = dimsizes(data) ; should be 14
string1.txt - a file
with no numerical data, just lines from a poem.
Use the special -1 value again, and a type of "string" to read in each
line. When you read strings, each line in the file will be considered
one string, regardless of whether it contains spaces, tabs, or any other
kind of white space.
; Read poem into a one-dimensional string array of unknown length:
filename = "string1.txt"
poem = asciiread(filename,-1,"string")
nlines = dimsizes(poem)
print("The poem in '" + filename + "' has " + nlines + " lines.")
print("This includes the title and the author.")
print(poem) ; Print the lines
asc2.txt - a file with a header line,
followed by 2 columns of integer and floating point data.
Even though this file contains multiple columns of data, when you use
the special "-1" value as a dimension size, the values will be read
into a one-dimensional array. The values will be read from left to
right, top to bottom.
In this file, the header line will be ignored because it doesn't
contain any numerical data.
data = asciiread("asc2.txt",-1,"float")
print(data) ; Print the values
To read this data into a 2D array dimensioned 17 x 2 (17 rows by
2 columns), use:
data = asciiread("asc2.txt",(/17,2/),"float")
print(data) ; Print the values
stn_latlon.dat - a file with 980
rows and 10 columns of floating point data.
The first two methods show how to read this file if you know the exact
number of rows and columns, and the third method shows how to read
this file if you don't.
Method 1
; Read data into a 980 x 10 float array.
nrows = 980
ncols = 10
data = asciiread("stn_latlon.dat",(/nrows,ncols/),"float")
printVarSummary(data) ; Print a summary of the variable only (no values).
; Two ways to print the data.
print(data) ; Print data, one value per line
write_matrix(data,ncols + "f7.2",False) ; Formatted output
Method 2
This file is actually a file of latitude and longitude values, each
dimensioned 70 x 70. The latitude values are written first in the
file, followed by the longitude values, so the 980 x 10 = 9800 values
can also be viewed as 2 x 70 x 70. Given this information, here's
another way to read in this file:
nlat = 70
nlon = 70
latlon2d = asciiread("stn_latlon.dat",(/2,nlat,nlon/),"float") ; 2 x 70 x 70
lat2d = latlon2d(0,:,:) ; 70 x 70
lon2d = latlon2d(1,:,:) ; 70 x 70
Method 3
Use the contributed
functions numAsciiCol and
readAsciiTable to first
calculate the number of columns, and then read the data into an
array dimensioned nrows x ncols.
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
filename = "stn_latlon.dat"
; Calculate the number of columns.
ncols = numAsciiCol(filename)
; Given the # of columns, we can use readAsciiTable to read this file.
data = readAsciiTable(filename,ncols,"float",0)
nrows = dimsizes(data(:,0)) ; calculate # of rows
print("'" + filename + "' has " + nrows + " rows and " + ncols + \
" columns of data.")
pw.dat - a file with a header line
and four columns of lined-up numeric and non-numeric data. The
"ID" column is non-numeric, but it does contain numbers as part of
the ID names.
We need to parse out this first column so these numeric values don't
get mixed in with our real data.
Note that as of version
5.1.1, this kind of thing is much easier using
the str_get_field function, which we'll demonstrate
first.
New method, version 5.1.1 and later
; Read data into a big 1D string array
fname = "Data/asc/pw.dat"
data = asciiread(fname,-1,"string")
; Count the number of fields, just to show it can be done.
nfields = str_fields_count(data(1)," ")
print("number of fields = " + nfields)
;
; Skip first row of "data" because it's just a header line.
;
; Use a space (" ") as a delimiter in str_get_field. The first
; field is field=1 (unlike str_get_cols, in which the first column
; is column=0).
;
lat = stringtofloat(str_get_field(data(1::), 2," "))
lon = stringtofloat(str_get_field(data(1::), 3," "))
pwv = stringtofloat(str_get_field(data(1::), 4," "))
Old method, before version 5.1.1
The following example will only work if your columns are lined up nicely.
; Read data into a big 1D string array, and convert to a character array.
data = asciiread("./pw.dat", -1, "string")
cdata = stringtochar(data)
;
; The first row is just a header, so we can discard this.
; The data starts in the second row, which is represented
; by index 1.
;
; The latitude values fall in columns 6-12 (indices 7:13)
; The longitude values fall in columns 13-21 (indices 14:22)
; The pwv data values fall in columns 22-31 (indices 23:end)
;
; The "1:" means start with the second row, and include all
; values to the end.
;
lat = stringtofloat(charactertostring(cdata(1:,7:13)))
lon = stringtofloat(charactertostring(cdata(1:,14:22)))
pwv = stringtofloat(charactertostring(cdata(1:,23:)))
This file can also be read by using a combination of the NCL
systemfunc function, and the Unix "cut"
command. Again, however, the data must be lined up nicely. With
"cut", the first character is considered to be column 1 (and not 0).
Another old method, before version 5.1.1
fname = "pw.dat"
clat = systemfunc("cut -c7-13 " + fname)
clon = systemfunc("cut -c14-22 " + fname)
cpw = systemfunc("cut -c23-31 " + fname)
; Ignore the first value, since this is just a header.
lat = stringtofloat(clat(1:))
lon = stringtofloat(clon(1:))
pwv = stringtofloat(cpw(1:))
asc3.txt - a file with several
columns of integer, float, and string data.
The first column contains date values like "200306130209", which we
want to parse into separate year, month, day, hour, and minute arrays.
We also want to read the third-from-the-last column, which contains the
station names. We will again use the Unix "cut" command in order
to do this kind of parsing.
Note that as of version
5.1.1, this kind of thing is much easier using
the str_get_cols function, which we'll demonstrate
first.
New method, version 5.1.1 and later
fname = "asc3.txt"
data = asciiread(fname,-1,"string")
; str_get_cols counts columns from 0, so characters 1-4 are columns 0-3.
year = stringtofloat(str_get_cols(data,0,3))
month = stringtofloat(str_get_cols(data,4,5))
day = stringtofloat(str_get_cols(data,6,7))
hour = stringtofloat(str_get_cols(data,8,9))
minute = stringtofloat(str_get_cols(data,10,11))
sta = str_get_cols(data,99,101)
Old method, before version 5.1.1
fname = "asc3.txt"
year = stringtofloat(systemfunc("cut -c1-4 " + fname))
month = stringtofloat(systemfunc("cut -c5-6 " + fname))
day = stringtofloat(systemfunc("cut -c7-8 " + fname))
hour = stringtofloat(systemfunc("cut -c9-10 " + fname))
minute = stringtofloat(systemfunc("cut -c11-12 " + fname))
sta = systemfunc("cut -c100-102 " + fname)
Note: you cannot use
stringtointeger to convert
numbers like "09" to "9", because the preceding "0" causes NCL to
treat the number as an octal value and "9" is not a valid octal
value.
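One possible workaround, sketched here with made-up values, is to convert the zero-padded strings to float first and then truncate to integer. The array smonth is hypothetical and stands in for the strings returned by str_get_cols or "cut".

```ncl
; Hypothetical zero-padded strings, as they would come out of
; str_get_cols or "cut".
smonth = (/"01","09","12"/)

; Convert to float first, then truncate to integer, so the leading "0"
; is never interpreted as an octal prefix.
month = floattointeger(stringtofloat(smonth))
print(month)
```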
istasyontablosu_son.txt -
a mix of numeric and non-numeric data in columns that are not lined up
nicely.
This file is pretty easy to read, because the non-numeric columns
don't have a mix of alpha and numeric characters. Here's a script to
read the first, fifth, and sixth numeric columns (station numbers,
latitude, and longitude) into separate variables:
stationfile="istasyontablosu_son.txt"
; Read all data into a one-dimensional variable.
dummy = asciiread(stationfile,-1,"float")
ncol = 6 ; # of columns
npts = dimsizes(dummy)/ncol ; # of points
stationdata = onedtond(dummy,(/npts,ncol/)) ; npts x ncol
stn = stationdata(:,0) ; station numbers
lat = stationdata(:,4) ; latitude values
lon = stationdata(:,5) ; longitude values
; Print the mins/maxs just to verify the data looks correct.
print("min/max stn = " + min(stn) + "/" + max(stn))
print("min/max lat = " + min(lat) + "/" + max(lat))
print("min/max lon = " + min(lon) + "/" + max(lon))
As of version 5.1.1, you can
read fields from this file using str_get_field.
; Read all data into a one-dimensional variable.
stationfile = "istasyontablosu_son.txt"
data = asciiread(stationfile,-1,"string")
; Count the number of fields, just to show it can be done.
nfields = str_fields_count(data(0)," ")
print("number of fields = " + nfields)
stn = stringtofloat(str_get_field(data,1," ")) ; station numbers
lat = stringtofloat(str_get_field(data,6," ")) ; latitude values
lon = stringtofloat(str_get_field(data,7," ")) ; longitude values
; Print the mins/maxs just to verify the data looks correct.
print("min/max stn = " + min(stn) + "/" + max(stn))
print("min/max lat = " + min(lat) + "/" + max(lat))
print("min/max lon = " + min(lon) + "/" + max(lon))
L3_aiavg_n7t_197901.txt
- a file with a mix of text, integers, and floats and no
delimiters
This file
(ftp://toms.gsfc.nasa.gov/pub/nimbus7/)
came from the Nimbus-7/TOMS instrument launched in October 1978. For
more information, see
the 1README.txt
file in the same directory.
This is a complicated file to read given the lack of structure and
delimiters. It took a combination of "do" loops
and str_get_cols to parse the data.
The L3_read.ncl script reads in all the
values, and optionally writes them to a netCDF file and/or generates a
contour plot.
asc4.txt - a file with some header and
footer lines, and a mix of numeric data. The headers contain some
numbers, and some of the numeric data contain commas. The columns
are separated by tabs.
See what happens when you read this data using
asciiread and the special -1 value:
data = asciiread("asc4.txt",-1,"float")
print(data)
Notice that the number "15" in the header becomes the first data
value read in. The number "2008" from "October 2008" becomes the
second value, and so on. Also notice what happens to values
with commas, like "1,321". This becomes two separate numbers,
"1" and "321".
In version 5.1.1,
we added a suite of string functions that make reading this
file much easier. You can use str_sub_str
to replace the commas with an empty string,
and str_get_field to read the desired fields,
using a tab as the delimiter.
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
begin
;
; Read population data into an array of strings, removing the
; first 4 lines and the last 2 lines (header and footer).
;
data = readAsciiTable("asc4.txt",1,"string",(/4,2/))
; Replace commas with an empty string.
data = str_sub_str(data,",","")
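; The columns are tab-separated, so the delimiter must be a tab character.
; str_get_tab() is assumed to be available in your NCL version; if not,
; put a literal tab between the quotes instead.
tab = str_get_tab()
country = str_get_field(data,1,tab)
population = stringtointeger(str_get_field(data,2,tab))
percentage = stringtofloat(str_get_field(data,3,tab))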
print(country + ": population: " + population + " (" + percentage + "%)")
end
Before version 5.1.1, it was not trivial to read this file. You had to first
remove the commas, write a new data file, and then read this
data with asciiread. Download the read_asc4.ncl script for an example
of how to accomplish this.
asc5.txt - a data file where the first
row contains the name of each field separated by a delimiter, and the
rest of the file contains the values of each field separated by the
same delimiter.
You can download the NCL script below to read in the "asc5.txt" and
write it out to a netCDF file, using the field names as variable names
on the netCDF file.
The script is rather lengthy; this is because it requires string
parsing which is not one of NCL's strong suits. Also, there's a bit of
checking involved to allow multiple types to be read in.
In version 5.1.1 and later,
this kind of data file is easier to read in. Scripts for both
will be provided.
In order to write fields to a netCDF file, the netCDF field
(variable) names cannot contain any tabs or spaces. Hence this script
removes white spaces from the beginning and end of any field names and
converts other white space to underscores ('_'). String or character
values for the fields themselves are not modified.
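The clean-up described above can be sketched with two string functions. This is only an illustration, not the code from the downloadable script; the names array is made up, and only spaces (not tabs) inside a name are converted.

```ncl
; Hypothetical field names as they might appear in a header line.
names = (/"  station id ","mean temp","lat"/)

; Remove leading and trailing white space, then turn the remaining
; spaces into underscores.
clean = str_sub_str(str_strip(names)," ","_")
print(clean)
```

With these sample names, the result should be "station_id", "mean_temp", and "lat".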
Note: it is not generally recommended to read in complex ASCII files
with NCL, but this example shows that it can be done.
If you want to use this script for your own purposes, you will need to
modify the script to indicate 1) the input ASCII file name, 2) the
number of fields, 3) the delimiter, 4) the type of each field,
and 5) whether the field contains missing values.
To modify either one for your own data file, first search for the
lines:
;============================================================
; Main code
;============================================================
The lines you need to modify follow shortly:
filename = "asc5.txt" ; ASCII file to read.
nfields = 6 ; # of fields
delimiter = "," ; field delimiter
var_types = new(nfields,string)
var_msg = new(nfields,string)
var_strlens = new(nfields,integer) ; var to hold string lengths,
; just in case.
.
.
.
var_msg = "" ; Default to no missing
var_msg(3) = "-999" ; Corresponds to field #4
var_types = "integer" ; Default to integer
var_types(1:2) = "float" ; Second and third fields
var_types(4) = "character" ; Corresponds to field #5
Change "var_types" to whatever the types of your fields are, and
"var_msg" to what the missing value should be (an empty string
indicates no missing value).
The above code is defaulting all variable types to "integer", and then
changing the 2nd and 3rd fields to type "float" and the fifth field to
type "character" (which in this case is being used as a character
array). The only field that will contain a missing value
is the fourth field.
The allowable variable types are "integer", "float", "double",
"string", or "character". Note that if you read in a variable as a
string, it won't get written to the netCDF file because only character
arrays can be written to a netCDF file.
asc6.txt - a data file with
a header, and three columns of floating point data (lat, lon, temp).
The temperature data on this file is dimensioned nlat x nlon (89 x
240), and has a lat,lon value for each data value. The lat and lon
data on this file are repetitious. That is, for each of the nlat (89)
latitude values, you have the same nlon (240) longitude values. Hence
you have 21360 (89 x 240) rows of data, but lat and lon values are repeated.
Download the read_asc6.ncl script
for an example of how to read this file, discard the repetitious data,
and create a variable "temp2D" with one-dimensional latitude and
longitude coordinate arrays.
Here's a quick look at the part of the code that reads in the data:
nlat = 89
nlon = 240
data = asciiread("asc6.txt",(/nlat*nlon,3/),"float")
lat1d = data(::nlon,0)
lon1d = data(0:nlon-1,1)
temp1D = data(:,2) ; 1st create a 1d array
temp2D = onedtond(temp1D,(/nlat,nlon/)) ; convert 1D array to a 2D array
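As a rough sketch of the remaining step (attaching the one-dimensional latitude and longitude arrays as coordinates), the snippet below uses a small synthetic grid instead of "asc6.txt". The sizes, values, and units are assumptions for illustration; this is not the read_asc6.ncl script itself.

```ncl
; Small synthetic grid standing in for the 89 x 240 data.
nlat = 3
nlon = 4
lat1d = (/-10.,0.,10./)
lon1d = (/0.,90.,180.,270./)
lat1d@units = "degrees_north"
lon1d@units = "degrees_east"

temp1D = fspan(270.,281.,nlat*nlon)       ; made-up temperature values
temp2D = onedtond(temp1D,(/nlat,nlon/))   ; convert 1D array to a 2D array

; Name the dimensions and attach the coordinate arrays.
temp2D!0 = "lat"
temp2D!1 = "lon"
temp2D&lat = lat1d
temp2D&lon = lon1d
printVarSummary(temp2D)
```

Coordinate arrays must be monotonic, which holds for a regular lat/lon grid like the one described above.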
How to read multiple ASCII files into one variable in NCL.
This example assumes the files contain the same number of columns,
but not necessarily the same number of rows.
dasc = "./" ; input directory for ascii files
fasc = "2009*asc" ; a unique identifier for files
;;fasc = "*asc"
DASC = "./" ; output dir
FASC = "BIG.asc" ; output file name
system ("/bin/rm -f "+DASC+FASC) ; rm any pre-existing file
; Use UNIX "cat" to concatenate the files into one file.
system ("cd "+ dasc+" ; cat "+fasc+" > "+DASC+FASC)
; You can now read the file via "asciiread".
nrows = numAsciiRow(DASC+FASC) ; contributed.ncl
ncols = numAsciiCol(DASC+FASC)
data = asciiread(DASC+FASC,(/nrows,ncols/),"float")
print(data)
;;system ("/bin/rm "+DASC+FASC) ; rm the created file
How to read a very large (thousands of lines) ASCII file of numeric
data that contains header and/or footer lines.
Ideally, you would
use readAsciiTable to read the
data, stripping off the undesired header and/or footer
lines. However, this function can be very slow, as it has to read the
data in as an array of strings (possibly multiple times) in order to
parse it correctly.
The fastest way to read in numeric data is to
use asciiread. Since this function reads in every
single value in a file, this means that any numbers that are in your
header or footer lines will get read in as real values.
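If you know how many numeric values the header contributes, one way to work around this is to read everything with asciiread and then skip those values. The sketch below assumes a hypothetical file "big.txt" whose header lines contain exactly nhead numbers, with no numbers in any footer; the file name, nhead, and ncols are all placeholders.

```ncl
fname = "big.txt"   ; hypothetical file name
nhead = 2           ; assumed count of numeric values in the header lines
ncols = 3           ; assumed number of data columns

; Read every number in the file, header values included.
vals  = asciiread(fname,-1,"float")
nvals = dimsizes(vals)
nrows = (nvals - nhead)/ncols   ; integer division

; Skip the header values and reshape the rest to nrows x ncols.
data = onedtond(vals(nhead:nhead+nrows*ncols-1),(/nrows,ncols/))
printVarSummary(data)
```

Numbers in footer lines could be dropped the same way by shortening the index range at the end.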