NCL: HDF: Hierarchical Data Format
HDF-SDS: Scientific Data Set
HDF-EOS: Earth Observing System

HDF and HDF-EOS

In 1993, NASA chose the Hierarchical Data Format Version 4 (HDF4) to be the official format for all data products derived by the Earth Observing System (EOS). It is commonly used for satellite-based data sets. There are several 'models' of HDF4, which can be a bit confusing; each model addresses different needs. One model is the Scientific Data Set (SDS), which is similar to netCDF-3 (Classic netCDF). It supports multi-dimensional gridded data with metadata, including support for an unlimited dimension.

To better serve the portion of the user community that needs geolocated data, a new format (convention), HDF-EOS2, was developed. HDF-EOS supports three geospatial data types: grid, point, and swath.

Limitations of the HDF4 format, the need for improved data compression, and changing computing paradigms led to the introduction of a new HDF format (HDF5) in 1998. The HDF4 and HDF5 appellations might imply some level of compatibility. Unfortunately, there is none: these are completely independent formats, with different calling interfaces and underlying storage formats.

Unlike the netCDF community, which has a well-established history of conventions (e.g., the Climate and Forecast (CF) convention), the HDF satellite community seems to lack commonly accepted conventions. For example, the "units" of geographical coordinates under the CF convention must be recognized by the udunits package. On HDF files the latitude and longitude variables often have the units "degrees", while netCDF's CF convention would require "degrees_north" and "degrees_east" or some other recognized units.

Some experiments (e.g., AURA) have developed "guidelines" for data. However, perhaps because the spectrum of experiments is so large, the HDF community does not have a "culture" in which broadly accepted conventions are commonly used.

NASA Data Processing Levels

NASA data products are processed at various Data Processing Levels, ranging from Level 0 to Level 4. Level 0 products are raw data at full instrument resolution. At higher levels, the data are converted into more useful parameters and formats. Often the levels are included in the file name: e.g., L0, L1A, L1B, ..., L3, L4.

NCL General Comments

NCL recognizes and supports multiple data formats, including HDF4, HDF5, HDF-EOS2 and HDF-EOS5. The following HDF-related file extensions are recognized: "hdf", "hdfeos", "he2", "he4", and "he5".

The first rule of 'data processing' is to look at the data. The command line utility ncl_filedump can be used to examine a file's contents. Information such as a variable's type, size, and shape can help users develop optimal code for processing.
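
For example (the file name here is hypothetical):

     ncl_filedump MYD04_L2.A2010031.1430.hdf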

The stat_dispersion and pdfx functions can be used to examine a variable's distribution. It is not uncommon for outliers to be present. If so, it is best to manually specify the contour limits and spacing to maximize the information content of the plots.
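
As a minimal sketch, the distribution of an imported variable might be examined as follows; the file and variable names are hypothetical:

     load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

     f   = addfile("MYD04_L2.sample.hdf", "r")   ; hypothetical file name
     x   = f->Optical_Depth_Land_And_Ocean       ; hypothetical variable name
     opt = True
     opt@PrintStat = True
     stat = stat_dispersion(x, opt)              ; print standard and robust dispersion statistics
     pdf  = pdfx(x, 50, False)                   ; PDF (%) using 50 equally spaced bins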

A possible source of confusion is that variables of type "short" or "byte" can be unpacked to float via two different formulations. If x represents a variable of type "short" or "byte", the unpacked (scaled) value can be derived via:

          value =  x*scale_factor + add_offset
or
          value = (x - add_offset)*scale_factor

Examples of files that use the latter formula are at MYDATML_2, MYD04_L2.

The NCL functions short2flt/byte2flt can be used to unpack the former, while short2flt_hdf/byte2flt_hdf can be used to unpack the latter.
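
The following minimal sketch contrasts the two; the file and variable names are hypothetical:

     load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

     f  = addfile("MYD04_L2.sample.hdf", "r")    ; hypothetical file name
     xs = f->Optical_Depth_Land_And_Ocean        ; type "short" (hypothetical variable)
     x1 = short2flt(xs)                          ; unpacks via x*scale_factor + add_offset
     x2 = short2flt_hdf(xs)                      ; unpacks via (x - add_offset)*scale_factor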

FYI: An Inconsistency between Reading HDF4 and HDF-EOS2

The following does not apply to NCL versions 6.2.0 and newer: NCL v6.2.0 and later perform a 'double read' of the HDF and HDF-EOS interfaces and merge the appropriate metadata automatically.

An issue which may confuse users reading HDF-EOS2 files is that a variable imported after a file has been opened with a .hdf extension may have different metadata associated with it than if it had been imported after the file was opened with a .hdfeos or .he2 extension. The reason is that although the HDF-EOS2 library has an interface for getting variable attributes, many of the attributes that are visible when reading via the plain HDF library are not accessible via the HDF-EOS library. Conversely, the coordinates are often visible only via the HDF-EOS2 library. So, in older versions, the only solution is for the user to examine the variable by opening the file via each extension (i.e., a 'double read').
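
A minimal sketch of such a 'double read', using the eos.hdf file discussed below (the variable names are illustrative; the HDF-EOS2 interface appends the swath name to each variable):

     f_hdf = addfile("eos.hdf", "r")             ; plain HDF4 interface: attributes visible
     f_eos = addfile("eos.hdf.hdfeos", "r")      ; HDF-EOS2 interface: coordinates visible
     printVarSummary(f_hdf->EV_1KM_RefSB)                        ; hypothetical variable name
     printVarSummary(f_eos->EV_1KM_RefSB_MODIS_SWATH_Type_L1B)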

Data used in the Examples

Regardless of the dataset used, the same principles can be applied to process the data.

HDF-SDS [hdf]
   TRMM    - Tropical Rainfall Measuring Mission
   MODIS   - Moderate Resolution Imaging Spectroradiometer
   SeaWiFS - Sea-viewing Wide Field-of-view Sensor [ SeaWiFS examples ]

HDF-EOS2 [he2, he4, hdfeos]
   AIRS   - Atmospheric Infrared Sounder
   MODIS  - Moderate Resolution Imaging Spectroradiometer

HDF-EOS5 [he5]
   HIRDLS - High Resolution Dynamics Limb Sounder
   MLS    - Microwave Limb Sounder
   OMI    - Ozone Monitoring Instrument
   TES    - Tropospheric Emission Spectrometer

HDF5 [h5]
   GPM    - Global Precipitation Mission
   SMAP   - Soil Moisture Active Passive 

HDF Group Comprehensive Examples

The HDF Group has created a suite of examples called the hdfeos.org/zoo. It contains NCL, Matlab, IDL and Python example scripts and the associated images. A 'side-benefit' is that looking at these examples provides some insight into how the different tools accomplish the same task. Caveat: possibly, an expert in any one of these languages could create a more elegant script and a better image.

They also provide NCL specific comments.

Some HDF examples may need a library named HDFEOS_LIB.ncl.

hdf4sds_1.ncl: Read a TRMM file containing 3-hourly precipitation at 0.25 degree resolution from the Tropical Rainfall Measuring Mission (TRMM). The geographical extent is 40S to 40N.

Create a packed netCDF using the pack_values function. This creates a file half the size of one created using float values. Some precision is lost, but that is not important here.
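
A minimal sketch of the packing step, assuming a float precipitation variable prc (the output file name is hypothetical):

     load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

     prcS = pack_values(prc, "short", False)     ; adds scale_factor/add_offset attributes
     fo   = addfile("trmm_3hr_packed.nc", "c")   ; create the netCDF file
     fo->prc = prcS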

This HDF file is classified as a "Scientific Data Set" (HDF4-SDS). Unfortunately, the file is not 'self-contained', because its contents include neither the geographical coordinates nor the temporal information. The former must be obtained via a web site, while the time is embedded within the file name.

hdf4sds_2.ncl: Read an HDF4-SDS file that contains high resolution (1 km) data over India and Sri Lanka. The file does not explicitly contain any coordinate arrays. However, the variable on the file, "Mapped_Composited_mapped", does have the following attributes:
         Slope :         1
         Intercept :     0
         Scaling :      linear
         Limit :        (    4,   62,   27,   95 )
         Projection_ID :        8
         Latitude_Center :         0
         Longitude_Center :     78.5
         Rotation :        0
The Slope/Intercept attributes indicate that no scaling has been applied to the data. The Limit attribute indicates the geographical limits, and Latitude_Center/Longitude_Center/Rotation specify map attributes. The variable does not specify any missing value [_FillValue] attribute but, after looking at the data, it was noted that the value -1 is appropriate.
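
A minimal sketch of how these attributes might be used; the file name is hypothetical, and the ordering of the Limit values as (minLat, minLon, maxLat, maxLon) is an assumption:

     f = addfile("india_srilanka_1km.hdf", "r")  ; hypothetical file name
     x = f->Mapped_Composited_mapped
     x@_FillValue = totype(-1, typeof(x))        ; missing value determined by inspection
     lim  = x@Limit                              ; assumed (minLat, minLon, maxLat, maxLon)
     dims = dimsizes(x)
     x!0  = "lat"
     x!1  = "lon"
     x&lat = fspan(lim(0), lim(2), dims(0))      ; generate rectilinear coordinates
     x&lon = fspan(lim(1), lim(3), dims(1))
     x&lat@units = "degrees_north"
     x&lon@units = "degrees_east"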

The stat_dispersion function was used to determine the standard and robust estimates of the variable's dispersion. Outliers are present and the contour information was manually specified.

hdf4sds_3.ncl: Read multiple files (here, 131 files) for one particular day. For each file, bin and sum the satellite data using bin_sum; after all files have been read, use bin_avg to average the summed values; plot; and create a netCDF of the binned (gridded) data.

Note: Here the data are netCDF files; however, the original files were HDF-SDS (e.g., MYD06_L2.A2005364.1405.005.2006127140531.hdf). The originating scientist converted these to netCDF for some reason. NCL can handle either; only the file extension need be changed (.nc to .hdf).
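
A minimal sketch of the binning loop; the target grid, file names, and variable names are hypothetical, and the final step here divides the sums by the counts, which is what the averaging amounts to:

     glat = fspan(-89.5, 89.5, 180)              ; hypothetical 1x1 degree target grid
     glon = fspan(-179.5, 179.5, 360)
     gbin = new((/180,360/), "float")            ; accumulated sums
     gknt = new((/180,360/), "integer")          ; accumulated counts
     gbin = 0.0
     gknt = 0

     fili = systemfunc("ls MYD06_L2.*.hdf")      ; hypothetical file names
     do nf = 0, dimsizes(fili)-1
        f = addfile(fili(nf), "r")
        bin_sum(gbin, gknt, glon, glat, \
                ndtooned(f->Longitude), ndtooned(f->Latitude), ndtooned(f->Cloud_Fraction))
     end do

     gknt = where(gknt.eq.0, gknt@_FillValue, gknt)   ; avoid division by zero
     gbin = gbin/gknt                                 ; average of the summed values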

hdf4sds_4.ncl: Read an HDF-SDS dataset containing MODIS Aqua Level-3 SSTs. The file attributes contain the geographical information, and this is used to generate coordinate variables. One issue is that the data "l3m_data" are of type "unsigned short", which is not explicitly supported through v5.1.1 (but is in 5.2.0 and later). Hence, a simple 'work-around' is used.
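
A minimal sketch of such a work-around, reinterpreting the negative signed values (the file name is hypothetical; the variable name is from the text above):

     f  = addfile("modis_sst_l3.hdf", "r")       ; hypothetical file name
     xs = f->l3m_data                            ; read as (signed) short
     xf = where(xs.lt.0, xs + 65536.0, xs*1.0)   ; recover the unsigned values as a floating-point type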

In addition to plotting the original 9 km data, area_hi2lores is used to interpolate the data to a 0.5x0.5 grid. A netCDF file is created.

Other 'L3' datasets could be directly used in the sample script. For example: Example 3 on the SeaWiFS Application page.

hdf4sds_5.ncl: Read four MODIS HDF datasets and create a series of swath contours over an Orthographic map. The 2D lat/lon arrays are read from each file and used to determine where on the map to overlay the contours.

This example uses gsn_csm_contour_map to create the map plot with the first set of contours, and then creates the remaining contour plots with gsn_csm_contour. The overlay procedure is then used to overlay these remaining contour plots over the existing contour/map plot.

hdf4sds_6.ncl: TRMM 2A12: TMI Hydrometeor (cloud liquid water, prec. water, cloud ice, prec. ice) profiles in 14 layers at 5 km horizontal resolution, along with latent heat and surface rain, over a 760 km swath. Specify a variable (here, "latentHeat") and plot (a) the entire swath; (b) the region near India; and (c) a vertical profile at locations where the latent heat exceeds 1500. The file contains no units information for the variables.

hdf4sds_7.ncl: A SEVIRI Level-3 water vapor data set. The variable is packed in a rather unusual fashion. The flags should be viewed to determine the source of the data.

hdf4sds_8.ncl: A TRMM 2A25 file is read. At each pixel, the maximum 'correctZFactor' over all levels is extracted using dim_max_n. Several ways of presenting the swath data are illustrated.
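
A minimal sketch of the extraction (the file name and the dimension ordering of correctZFactor are assumptions):

     f    = addfile("2A25.sample.hdf", "r")      ; hypothetical file name
     z    = f->correctZFactor                    ; assumed dimensions: (scan, ray, level)
     zmax = dim_max_n(z, 2)                      ; maximum over the level dimension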

hdf4sds_9.ncl: A TRMM 2A25 file is read. The 'nearSurfZ' variable is imported. Several ways of presenting the swath data are illustrated.

hdf4eos_1.ncl: Read an HDF-EOS2 file containing swath data. NCL identifies the swath as MODIS_SWATH_Type_L1B. Create a simple plot of reflectance with coordinates of scanline and pixel.

The eos.hdf that appears as the file name is an alias for MOD021KM.A2000303.1920.002.2000317044659.hdf. It is not uncommon for an HDF-EOS2 file to have the ".hdf" file extension. In this case, NCL will open and read the file successfully, but it is best to manually append the ".hdfeos" extension when opening the file with the addfile function.

     ncl_filedump eos.hdf
yields
     filename:       eos
     path:   eos.hdf
     file global attributes:
        HDFEOSVersion : HDFEOS_V2.6
        StructMetadata_0 : GROUP=SwathStructure
          GROUP=SWATH_1
                  SwathName="MODIS_SWATH_Type_L1B"
                  [...SNIP...]
          END_GROUP=SWATH_1
        END_GROUP=SwathStructure
     [...SNIP...]

hdf4eos_2.ncl: Example of a radiance plot. Note that the color table is reversed relative to example 1.
hdf4eos_3.ncl: A multiple contour plot of other quantities on the MODIS file.
hdf4eos_4.ncl: MODIS data placed on a geographical projection. A rather awkward aspect of this file is that the Latitude and Longitude variables differ in size from the variable being plotted.

The 5 added to the map limits is arbitrary (not required). Here it is used to specify extra space around the plot.
hdf4eos_5.ncl: Illustrates the use of dim_gbits to extract specific bits within a bit-stream. A data (variable) issue is that the cloud mask variable is on a 1-km grid, while the latitude/longitude variables are on a 5-km grid. NCL array syntax (::5) is used to decimate (sub-sample) the 1-km array to the 5-km grid. The example also demonstrates explicitly labeling different label bar colors with specific integers.
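
A minimal sketch of the bit extraction; the file name, variable name, and bit offsets are assumptions, so consult the product documentation for the actual bit-field layout:

     f    = addfile("MOD35_L2.sample.hdf", "r")  ; hypothetical file name
     cm   = f->Cloud_Mask(0,:,:)                 ; hypothetical byte variable
     npix = dimsizes(cm(0,:))                    ; size of the rightmost dimension
     flag = dim_gbits(cm, 5, 2, 6, npix)         ; start at bit 5, take 2 bits, skip 6, npix chunks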

The use of res@trGridType = "TriangularMesh" makes the plotting faster.

NCL 6.5.0 introduces a new function, get_bitfield, that simplifies the use of dim_gbits. hdf4eos_5a.ncl illustrates the use of this new function to obtain the same result.

hdf4eos_6.ncl: Read an AIRS Level-3 file (here, product type AIRX3STD). This product uses the AIRS IR and AMSU instruments. Although on a 1x1 degree grid, the grid point values represent satellite swaths that have been binned over a period of time (24 hours). The data to the left and right of the Date Line represent values that were sampled at different times. Hence, the gridded values are not cyclic in longitude.

Support for HDF5 and HDF-EOS5 is present in v5.2.0, which was released April 14, 2010. Some examples follow.

hdf5eos_1.ncl: Read an HDF-EOS5 (available in v5.2.0) file from the Aura OMI (Ozone Monitoring Instrument) and plot all the variables on the file. Here only two of the variables are shown.
hdf5eos_2.ncl: Read an HDF-EOS5 (available in v5.2.0) file (OMI) and plot selected variables from the file. (The ncl_filedump utility was used to preview the file's contents.) This example also demonstrates how to retrieve a variable's type prior to reading it into memory (see getfilevartypes). It is best to use short2flt or byte2flt if the variable type is "short" or "byte"; these functions automatically apply the proper scaling.
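
A minimal sketch of the type check (the file name is hypothetical; contributed.ncl is assumed loaded for short2flt):

     f    = addfile("OMI-Aura_sample.he5", "r")  ; hypothetical file name
     vNam = getfilevarnames(f)
     vTyp = getfilevartypes(f, vNam)
     do nv = 0, dimsizes(vNam)-1
        if (vTyp(nv).eq."short") then
           x = short2flt(f->$vNam(nv)$)          ; unpack using the scale/offset attributes
        else
           x = f->$vNam(nv)$
        end if
        printVarSummary(x)
        delete(x)                                ; sizes and types vary between variables
     end do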

Note that the units of the "EffectiveTemperature" variable appear to be incorrect: they indicate "degrees Celsius", but the range indicates "degrees Kelvin". This could be addressed by adding the following to the script after the variable has been imported:

     if (vNam(nv).eq."EffectiveTemperature_ColumnAmountO3") then
         x@units = "degrees Kelvin"       ; fix bad units
     end if

hdf5eos_3.ncl: Read an HDF-EOS5 (available in v5.2.0) file from the MLS (Aura Microwave Limb Sounder) and plot a two-dimensional cross-section (pressure/time) of temperature. Then plot the trajectory of the satellite over this time.

Sean Davis (NOAA) has put together a gridded satellite WV/ozone product called SWOOSH that includes MLS data.

hdf5eos_3a.ncl: Very similar to the previous example: (a) a different data set (MLS-Aura_L2GP-O3); (b) trajectory values are colored by value. The three left figures show the vertical profile of O3 along the trajectory for (i) all levels; (ii) a smaller subset; and (iii) a very small subset. A user requested an example of gridding the sparsely sampled trajectory data; obviously, it was not a 'nice' picture.

hdf5eos_4.ncl: Read an HDF-EOS5 (available v5.2.0) file from the HIRDLS: (a) Use stat_dispersion to print the statistical information for each variable; (b) compute PDFs via pdfx; (c) plot cross-sections; and, (d) plot time series of three different variables.
hdf5eos_5.ncl: Read two similar OMI files (L3-OMDOAO3e and L3-OMTO3e). This illustrates that users must look at a file's contents before using it (use ncl_filedump). Here, there are two files with similar variables, but NCL assigns slightly different names.

The OMI L3 files used in this example have a bug in the way the OMI data were written to the files. NCL version 6.1.0 and later identify OMI data files and automatically correct for the latitude reversal.

amsr_1.ncl: Swaths containing soil moisture from the Advanced Microwave Scanning Radiometer (AMSR) are read from multiple h5 files. Documentation was not available, but there were file 'issues': (1) the variable name has a space ("Geophysical Data"); (2) the variable has an attribute FillValue rather than _FillValue; (3) there are two 'missing value' categories (-32767s, -32768s); (4) the unrecognized unpacking attribute ["SCALE FACTOR"] must be converted to "scale_factor" for use by NCL's short2flt function.
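
A minimal sketch of the fix-ups enumerated above (the file name is hypothetical; the $...$ syntax handles the names containing spaces):

     f = addfile("AMSR_soil_moisture.h5", "r")   ; hypothetical file name
     x = f->$"Geophysical Data"$                 ; (1) variable name contains a space
     x@_FillValue = x@FillValue                  ; (2) supply the attribute NCL recognizes
     x = where(x.eq.-32767s, x@_FillValue, x)    ; (3) merge the two missing categories
     x@scale_factor = x@$"SCALE FACTOR"$         ; (4) attribute name short2flt recognizes
     xf = short2flt(x)
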
MOPITT_MOP03M_1.ncl: Read a variable and plot a user-specified level. The grid is (180x360). Missing values are present. A common issue with HDF files is that they do not contain all the desired information or units; hence, these must be provided manually.
MOPITT_MOP03M_2.ncl: Read a variable and plot all levels.
MOPITT_MOP03M_3.ncl: Read a variable and interpolate to a much coarser 46x72 grid using linint2_Wrap and area_hi2lores_Wrap. Missing areas are present. Then use poisson_grid_fill to fill in the missing values and repeat the interpolations.
MOPITT_MOP02T_1.ncl: Reads "CO Total Column" across a time interval and a selected spatial region, and plots it on a map using a range of colored markers.

This script was contributed by Rebecca Buchholz, a researcher in the Atmospheric Chemistry Division at NCAR.

hdf5_1.ncl: Read a variable and plot it according to a palette specified on the file. The MSG (Meteosat Second Generation) file is HDF5. The desired variables contain a dash and a space, which are not allowed in NCL variable names. Hence, the variable names are enclosed in quotes to make them type string. For the -> file syntax operator to successfully access these string-named variables, they must be enclosed within dollar signs ($).
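
A minimal sketch of the syntax (the file and variable names here are hypothetical):

     f     = addfile("MSG_sample.h5", "r")       ; hypothetical file name
     vname = "ch9-BT calibrated"                 ; hypothetical name with a dash and a space
     x     = f->$vname$
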
hdf5_2.ncl: HDF5 (h5) files can be complicated. Users should become acquainted with the file's contents prior to creating a script. An ncl_filedump of the file would yield:

   %> ncl_filedump K1VHR_15NOV2013_1200_L02_OLR.h5 | less

      Variable: f
      Type: file
      filename:       K1VHR_15NOV2013_1200_L02_OLR
      path:   K1VHR_15NOV2013_1200_L02_OLR.h5
         file global attributes:
         dimensions:
            DIM_000 = 250601

         variables:
            group </OLR>
      
            compound <OLR_Dataset>    (Latitude, Longitude, OLR) (DIM_000)
      
            group </OLR/GP_PARAM_INFO>
      
               GP_PARAM_DESCRIPTION : Every_Acquisition
               GP_PARAM_NAME :        Outgoing Longwave Radiation (OLR)
               Input_Channels :       TIR             
               LatInterval :  0.25
               Latitude_Unit :        Degrees         
               LonInterval :  0.25
               Longitude_Unit :       Degrees         
               MissingValueInProduct :         999
               OLR_Unit :     Watts/sq. met.  
               ValidBottomLat :        -60
               ValidLeftLon :   10
               ValidRightLon :         140
               ValidTopLat :    60
      
            group </PRODUCT_INFORMATION>
      
               GROUND_STATION :       BES,SAC/ISRO,Ahmedabad,INDIA. 
               HDF_PRODUCT_FILE_NAME :        K1VHR_15NOV2013_1200_L02_OLR.h5
               OUTPUT_FORMAT :        hdf5-1.6.6
               PRODUCT_CREATION_TIME :        2013-11-15T18:01:49 _L02_OLR.h5
               STATION_ID :   BES3-11-15T18:01:49 _L02_OLR.h5
               UNIQUE_ID :    K1VHR_15NOV2013_1200_L02_OLR.h5
      
            group </PRODUCT_METADATA>
      
            group </PRODUCT_METADATA/PRODUCT_DETAILS>
      
               ACQUISITION_DATE :     15NOV2013
               ACQUISITION_TIME_IN_GMT :      1200V2013
               PROCESSING_LEVEL :     L020V2013
               PROCESSING_SOFTWARE :  InPGS_XXXXXXXXXXXXXX
               PRODUCT_NAME : GP_OLR         XXXXX
               PRODUCT_TYPE : GEOPHY
               SENSOR_ID :    VHRPHY
               SPACECRAFT_ID :        KALPANA-1

Read an h5 (HDF5) file with 'group' fields; note the syntax used by NCL. The Latitude, Longitude and OLR are one-dimensional arrays of size 250601. There are no general file conventions for HDF5 files; users must examine the file's contents and explicitly extract the desired information. The region of the globe is defined by attributes of the group /OLR/GP_PARAM_INFO. These are used to create rectilinear grid coordinates, which are associated with the variable to be plotted (olr) and written to a netCDF file.

FYI: It is not necessary to make a two-dimensional grid. NCL can plot one-dimensional latitude/longitude/value arrays with appropriate graphical resource settings. However, since one option in the script is to create a netCDF file, the script was designed to create the variable as a conventional two-dimensional array.
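
A minimal sketch of the coordinate construction, using the group attribute values listed above; note that 481 x 521 = 250601, matching DIM_000. The 1D array olr1d (the OLR values already read from the file) and the row ordering of the reshape are assumptions:

     nlat = toint((60 - (-60))/0.25) + 1         ; ValidBottomLat/ValidTopLat/LatInterval -> 481
     nlon = toint((140 - 10)/0.25) + 1           ; ValidLeftLon/ValidRightLon/LonInterval -> 521
     lat  = fspan(-60., 60., nlat)
     lon  = fspan(10., 140., nlon)
     lat@units = "degrees_north"                 ; CF-style units for plotting
     lon@units = "degrees_east"
     olr  = onedtond(olr1d, (/nlat, nlon/))      ; reshape the 1D OLR array
     olr!0 = "lat"
     olr!1 = "lon"
     olr&lat = lat
     olr&lon = lon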

seawif_4.ncl: Read a SeaWiFS variable and use metadata contained within the file attributes to construct a complete (i.e., self-contained) variable. The initial look at the file was via the ncl_filedump command line utility.

Data files were obtained here.

Additional SeaWiFS examples are available here.

smap_l3_1.ncl: NOTE: It is recommended that users use ncl_filedump to carefully examine the SMAP level-3 (l3) file prior to use. Note the 'complexity' of the file: assorted variables are within 'groups'. NCL offers two ways to access data within groups; this and the other examples use the 'full path' method (see the script).

Read soil moisture from an individual H5 SMAP level-3 (l3) file. The data are on a regular grid, BUT the data come from a swath. The desired variable, 'soil_moisture', is located within the group named Soil_Moisture_Retrieval_Data; the latitude and longitude variables are within this group as well. A peculiarity is that the 'latitude' and 'longitude' variables do not have an _FillValue attribute associated with them. However, printMinMax indicates a minimum value of -9999.0 for each, so the script assigns these manually.
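
A minimal sketch of the 'full path' access (the file name is hypothetical; the group and variable names are from the text above):

     f   = addfile("SMAP_L3_SM_P_20150415.h5", "r")   ; hypothetical file name
     sm  = f->/Soil_Moisture_Retrieval_Data/soil_moisture
     lat = f->/Soil_Moisture_Retrieval_Data/latitude
     lon = f->/Soil_Moisture_Retrieval_Data/longitude
     lat@_FillValue = -9999.0                    ; not set on the file; assigned manually
     lon@_FillValue = -9999.0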

smap_l3_2.ncl: Similar to smap_l3_1. The difference is that this example reads multiple files, which are processed one by one. Only the first three of the nine SMAP files are shown.
smap_l3_3.ncl: This loops over the nine files for a particular period and creates a composite plot by overlaying individual graphical swath grids. Setting res@cnMissingValFillColor = "Transparent" keeps the plots from previous grids from being overwritten.
smap_l3_4.ncl: Similar result to smap_l3_3.ncl; however, it is obtained via a different method. A 'super variable' is created and the grid points are filled via the where function. After reading the EASE latitude and longitude variables from a separate file (SMAP_EASE.406x964.nc), the resulting super variable is plotted.
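
A minimal sketch of the where-based fill used to build the composite; sm_all is the 'super variable' and sm is the swath grid just read (the names are hypothetical):

     sm_all = where(ismissing(sm_all), sm, sm_all)    ; fill only points not already set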