
addfiles

Creates a reference that spans multiple data files.

Prototype

	function addfiles (
		file_path [*] : string,  
		status        : string   
	)

	return_val [1] :  list

Arguments

file_path

A one-dimensional array of strings containing the full or relative path of the data files to be referenced.

status

A single string that specifies whether the files should be opened as read-only ("r"), opened as read-write ("w"), or created ("c").

Description

The addfiles function provides the user with the ability to access data spanning multiple files. The function returns a single variable of type list containing a list of references to the files pointed to by the file_path argument. Files pointed to by the file_path strings must be in a supported file format and have a supported file extension at the end of each file name. The extension is required even though it need not be part of the actual filename. The currently supported formats, valid status values, and accepted extensions are:

NetCDF ("r", "w", "c")
".nc", ".cdf", ".netcdf"
GRIB versions 1 and 2 ("r") (GRIB2 support is available in version 4.3.0 or later.)
".gr", ".gr1", ".grb", ".grib", ".grb1", ".grib1", ".gr2", ".grb2", ".grib2"
HDF ("r", "w", "c")
".hdf", ".hd"
HDFEOS ("r")
".hdfeos", ".he2", ".he4"
CCM ("r")
".ccm"

addfiles handles these extensions in a case-insensitive manner: ".grib", ".GRIB", and ".Grib" all indicate a GRIB file.

If status is "c", each file is created if it does not exist; if it already exists, an error message is printed and the default file missing value is returned. If status is "w", the files must all exist and have permissions that allow reading and writing; they are then opened for reading and writing. If either condition fails, an error message is printed and the default file missing value is returned. Similarly, if status is "r", the files must exist and the user must have read permission on them; otherwise, an error message is printed and the default file missing value is returned. See the ismissing function for how to detect the returned missing value in a program.
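
For the "r" case, one way to avoid the error path is to confirm that every path exists before calling addfiles. The following is a minimal sketch, not part of the original page; it assumes NCL 6.2.1 or later (for the fileexists function), and the file names are hypothetical:

   fils = (/ "run1.nc", "run2.nc", "run3.nc" /)   ; hypothetical file names
   ok   = fileexists (fils)                       ; logical array, one entry per path

   if (all(ok)) then
     f = addfiles (fils, "r")                     ; all files present: open read-only
   else
     print ("missing input file: " + fils(ind(.not. ok)))
     exit
   end if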

The addfiles function differs from the addfile function in several ways:

  1. addfile creates only one file reference while addfiles provides for multiple file references.

  2. The variable returned by addfiles cannot be used as input to the "getfilexxxx" suite of functions. Instead, you can only input one element of this variable; for example:
      files = systemfunc("ls *.nc")  
      f = addfiles(files,"r")
      dsizes = getfiledimsizes(f[0])
    

  3. When a variable is read via a reference generated by addfile, all values and, if present, all attributes and coordinate variables are read. A variable read via the reference generated by addfiles contains values only; no attributes or coordinate variables are read. It is the user's responsibility to attach this metadata, or the addfiles_GetVar function can be used to do so.

  4. Data input via addfiles may be combined in one of two ways, "join" or "cat" (the default), as specified by the ListSetType procedure.

The variable produced by the addfiles function must be indexed with "[" and "]" to access all or specific files. For example, "[:]" means all files, while "[0]" means the first file only.
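
As a short illustration of the bracket syntax, the sketch below reads a variable from all files and from the first file only. The file pattern and the variable name T are assumptions chosen to match the examples further down:

   fils = systemfunc ("ls *.nc")    ; hypothetical file pattern
   f    = addfiles (fils, "r")

   T_all   = f[:]->T                ; T from every file, combined per the list type
   T_first = f[0]->T                ; T from the first file only
   lat     = f[0]->lat              ; coordinate taken from a single file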

When should the "cat" (default) and "join" options be used? Generally speaking, if the leftmost dimension of a variable is a "record" dimension (say, "time"), then the "cat" option is best. If, however, there is no record dimension (e.g. [lev,lat,lon]), then the "join" option is appropriate. An exception to the general rule of using "cat" when there is a record dimension is outlined in example 3.

Note that if you use the "join" option and a command like systemfunc ("ls *.nc") to get a list of the netCDF files, then you need to make sure that the "ls" command gives you the files in the correct order that you want them joined.
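
One simple way to check the order is to print the file list before joining, or to write the paths out explicitly when the order matters. This is only a sketch; the file names and pattern are hypothetical:

   fils = systemfunc ("ls case*.nc")              ; hypothetical file pattern
   print (fils)                                   ; inspect the order "ls" produced

   ; alternatively, state the desired order explicitly
   fils = (/ "caseA.nc", "caseB.nc", "caseC.nc" /)

   f = addfiles (fils, "r")
   ListSetType (f, "join")                        ; new leftmost dimension follows the order of fils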

See Also

addfiles_GetVar, addfile, ListSetType, ListGetType

Examples

Example 1

Read in a series of netCDF files (here, 5 files each with 12 time steps), and read into memory the four dimensional variable T(ntim,klvl,nlat,mlon), where ntim=12, klvl=5, nlat=48, mlon=96:

   diri = "/fs/cgd/data0/casguest/CLASS/"   ; input directory
   fils = systemfunc ("ls "+diri+"ann*.nc") ; file paths

   f    = addfiles (fils, "r")   

   ListSetType (f, "cat")        ; concatenate (=default)
   T    = f[:]->T                ; read T from all files
   printVarSummary (T)
The printVarSummary procedure yields:
   Variable: T
   Type: float
   Total Size: 5529600 bytes
               1382400 values
   Number of Dimensions: 4
   Dimensions and sizes:   [60] x [5] x [48] x [96]
   Coordinates: 
The size of the time dimension is now 60 (=5*12), while the other dimensions remain the same. Note also that no metadata has been copied. If this information is desired, the user must attach it. For example:
   T!0  = "time"
   T!1  = "lev"
   T!2  = "lat"
   T!3  = "lon" 
   T&time = f[:]->time           ; time coord variable
   T&lev  = f[0]->lev            ; get lev from the 1st file
   T&lat  = f[0]->lat            ; get lat from the 1st file
   T&lon  = f[0]->lon            ; get lon from the 1st file
   T@long_name = "temperature"
   T@units     = "K"
   printVarSummary (T)
The printVarSummary procedure yields:
   Variable: T
   Type: float
   Total Size: 5529600 bytes
               1382400 values
   Number of Dimensions: 4
   Dimensions and sizes:   [time | 60] x [lev | 5] x [lat | 48] x [lon | 96]
   Coordinates: 
            time: [2349..4143]
            lev: [850000..250]
            lat: [-87.15909..87.15909]
            lon: [ 0..356.25]
   Number Of Attributes: 2
     units :       K
     long_name :   temperature
Example 2

The "XXX" files have no record dimension. All records are 5 (levels) x 48 (latitudes) x 96 (longitudes). Here we use the "join" option. This adds an extra dimension.

   diri = "/fs/cgd/data0/casguest/CLASS/"   ; input directory
   fils = systemfunc ("ls "+diri+"XXX*.nc") ; file paths

   f    = addfiles (fils, "r")   ; note the "s" of addfile

   ListSetType (f, "join")       
   T    = f[:]->T                ; read T from all files
   printVarSummary (T)
The printVarSummary procedure yields:
   Variable: T
   Type: float
   Total Size:  460800 bytes
                115200 values
   Number of Dimensions: 4
   Dimensions and sizes:   [5] x [5] x [48] x [96]
   Coordinates: 
The user can add metadata explicitly. For example:
   T!0  = "case"                 ; arbitrary name
   T!1  = "lev"
   T!2  = "lat"
   T!3  = "lon" 
   T&lev  = f[0]->lev            ; get lev from the 1st file
   T&lat  = f[0]->lat            ; get lat from the 1st file
   T&lon  = f[0]->lon            ; get lon from the 1st file
   T@long_name = "temperature"
   T@units     = "K"
   printVarSummary (T)
yields:
   Variable: T
   Type: float
   Total Size:  460800 bytes
                115200 values
   Number of Dimensions: 4
   Dimensions and sizes:   [case | 5] x [lev | 5] x [lat | 48] x [lon | 96]
   Coordinates: 
            lev: [850000..250.]
            lat: [-87.15909..87.15909]
            lon: [ 0..356.25]
   Number Of Attributes: 2
     units :       K
     long_name :   temperature
Example 3

Generally, when there is a record dimension one uses the "cat" option. In this example, assume that five different runs were made for a particular year, each using, say, a different boundary layer parameterization. Here the time variable is the same for each file and we want to compare the five different cases. The appropriate choice for this case is "join":

   diri = "/fs/cgd/data0/casguest/CLASS/"   ; input directory
   fils = systemfunc ("ls "+diri+"Bound*.nc") ; file paths

   f    = addfiles (fils, "r")   ; note the "s" of addfile

   ListSetType (f, "join")      
   T    = f[:]->T                ; read T from all files
   printVarSummary (T)
The printVarSummary procedure yields:
   Variable: T
   Type: float
   Total Size: 5529600 bytes
               1382400 values
   Number of Dimensions: 5
   Dimensions and sizes:   [5] x [12] x [5] x [48] x [96]
   Coordinates: 
The user can add metadata explicitly. For example:
   T!0  = "case"                 ; arbitrary name
   T!1  = "time"
   T!2  = "lev"
   T!3  = "lat"
   T!4  = "lon" 
   T&time = f[0]->time           ; time coord variable
   T&lev  = f[0]->lev            ; get lev from the 1st file
   T&lat  = f[0]->lat            ; get lat from the 1st file
   T&lon  = f[0]->lon            ; get lon from the 1st file
   T@long_name = "temperature"
   T@units     = "K"
   printVarSummary (T)
yields:
   Variable: T
   Type: float
   Total Size: 5529600 bytes
               1382400 values
   Number of Dimensions: 5
   Dimensions and sizes:   [case | 5] x [time | 12] x [lev | 5] x [lat | 48] x [lon | 96]
   Coordinates: 
            time: [2349..2683]
            lev: [850..250.]
            lat: [-87.15909..87.15909]
            lon: [ 0..356.25]
   Number Of Attributes: 2
     units :       K
     long_name :   temperature
Example 4

As noted, addfiles does not result in metadata being attached to the variable read from the files. However, there is a function called addfiles_GetVar that will automatically attach metadata. It can result in much cleaner code, especially when many variables have to be read. The following concatenates the records:

load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl" 
begin                              
  diri = "/fs/cgd/data0/casguest/CLASS/"
  fils = systemfunc ("ls "+diri+"annual*")
    
  f    = addfiles (fils+".nc", "r")   ; note the "s" of addfile

  T = addfiles_GetVar (f, fils, "T")
  printVarSummary (T)
end 
The output from printVarSummary is:

Variable: T

Type: float
Total Size: 5529600 bytes
            1382400 values
Number of Dimensions: 4
Dimensions and sizes:   [time | 60] x [lev | 5] x [lat | 48] x [lon | 96]
Coordinates: 
            time: [2349..4143]
            lev: [850000..-72361.58]
            lat: [-87.15909..87.15909]
            lon: [ 0..356.25]
Number Of Attributes: 3
  missing_value :       1e+36
  units :       
  long_name :   temperature
Example 5

This example is similar to example 4, but here the list type is set to "join" using the ListSetType procedure. The addfiles_GetVar function in "contributed.ncl" will name the extra dimension "case":

load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl" 
begin                              
  diri = "/fs/cgd/data0/casguest/CLASS/"
  fils = systemfunc ("ls "+diri+"ANNUAL*")
            
  f    = addfiles (fils+".nc", "r")   ; note the "s" of addfile

  ListType = "join"
  ListSetType (f, ListType ) 
            
  T = addfiles_GetVar (f, fils, "T")
  printVarSummary (T)
end 
The output yields:
Variable: T
Type: float
Total Size: 5529600 bytes
            1382400 values
Number of Dimensions: 5
Dimensions and sizes:   [case | 5] x [time | 12] x [lev | 5] x [lat | 48] x [lon | 96]
Coordinates: 
            case: [0..4]
            time: [2349..2683]
            lev: [850000..-72361.58]
            lat: [-87.15909..87.15909]
            lon: [ 0..356.25]
Number Of Attributes: 3
  missing_value :       1e+36
  units :       
  long_name :   temperature