HDF: Hierarchical Data Format
HDF-SDS: Scientific Data Set
HDF-EOS: Earth Observing System

HDF and HDF-EOS

In 1993, NASA chose the Hierarchical Data Format Version 4 (HDF4) to be the official format for all data products derived by the Earth Observing System (EOS). It is commonly used for satellite-based data sets. There are several 'models' of HDF4, which can be a bit confusing. Each model addresses different needs. One model is the Scientific Data Set (SDS), which is similar to netCDF-3 (Classic netCDF). It supports multi-dimensional gridded data with metadata, including support for an unlimited dimension.

To better serve a broader spectrum within the user community with needs for geolocated data, a new format or convention, HDF4-EOS was developed. HDF-EOS supports three geospatial data types: grid, point, and swath.

Limitations of the HDF4 format, the need for improved data compression, and changing computer paradigms led to the introduction of a new HDF format (HDF5) in 1998. The HDF4 and HDF5 appellations might imply some level of compatibility. Unfortunately, no! These are completely independent formats. The calling interfaces and underlying storage formats are different.

Unlike the netCDF community, which has a well established history of conventions (e.g., the Climate and Forecast convention), there seems to be a lack of commonly accepted conventions in the HDF satellite community. For example, the "units" for geographical coordinates under the CF convention must be recognized by the udunits package. Often in HDF files the latitude and longitude variables will have the units "degrees", while netCDF's CF convention would require "degrees_north" and "degrees_east" or some other recognized units.

Some experiments (e.g., Aura) have developed "guidelines" for data. However, perhaps because the spectrum of experiments is large, the HDF community does not have a "culture" where broadly accepted conventions are commonly used.

NCL Comments

NCL recognizes and supports multiple data formats including HDF4-SDS, HDF4-EOS and HDF5-EOS (v5.2.0). The following HDF related file extensions are recognized: "hdf", "hdfeos", "he2", "he4", and "he5".

The first rule of 'data processing' is to look at the data. The command line utility ncl_filedump can be used to examine a file's contents. Information such as a variable's type, size, and shape can help users develop optimal code for processing.

The stat_dispersion and pdfx functions can be used to examine a variable's distribution. It is not uncommon for outliers to be present. If so, it is best to manually specify the contour limits and spacing to maximize information content on the plots.
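As a minimal sketch of the idea above, the following uses a synthetic field with one artificial outlier (not data from any of the example files). It prints the dispersion statistics and then specifies the contour limits and spacing by hand. The numeric limits and the output name "outlier_demo" are illustrative assumptions; in practice the limits would be chosen after inspecting the printed statistics.

```ncl
; The load statements are needed for NCL versions before 6.2.0;
; later versions load these scripts automatically.
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

begin
  x      = random_normal(10.0, 2.0, (/50,50/))   ; synthetic data
  x(0,0) = 500.0                                  ; artificial outlier

  opt           = True
  opt@PrintStat = True                  ; print the statistics to the screen
  statx         = stat_dispersion(x, opt)

  res                      = True
  res@cnLevelSelectionMode = "ManualLevels"   ; do not let the outlier set the range
  res@cnMinLevelValF       =  4.0             ; assumed limits for this synthetic field
  res@cnMaxLevelValF       = 16.0
  res@cnLevelSpacingF      =  1.0

  wks  = gsn_open_wks("png", "outlier_demo")
  plot = gsn_csm_contour(wks, x, res)
end
```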

A possible source of confusion is that variables that are "short" or "byte" can be unpacked to float via two different formulations. If x represents a variable of type "short" or "byte", the unpacked or scaled value can be derived via:

          value =  x*scale_factor + add_offset
or
          value = (x + add_offset)*scale_factor
The NCL functions short2flt and byte2flt can be used to unpack the former, while short2flt_hdf and byte2flt_hdf can be used to unpack the latter.
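To see how much the two formulations can differ, here is a small sketch that applies both by hand to the same packed values. The packed values, scale_factor, and add_offset are made-up numbers for illustration, and the arithmetic is written out explicitly rather than calling the conversion functions.

```ncl
begin
  ; Synthetic packed values. The scale_factor and add_offset numbers
  ; are invented for illustration only.
  x            = toshort((/ 0, 100, 200 /))
  scale_factor = 0.01
  add_offset   = 5.0

  v1 = x*scale_factor + add_offset       ; first formulation
  v2 = (x + add_offset)*scale_factor     ; second formulation

  print(v1)     ; approximately 5, 6, 7
  print(v2)     ; approximately 0.05, 1.05, 2.05
end
```

Applying the wrong formulation produces values that look plausible but are incorrect, so it is worth checking the data provider's documentation to see which one a given product uses.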

Data used in the Examples

Regardless of the dataset used, the same principles can be used to process the data.
HDF-SDS [hdf]
   TRMM   - Tropical Rainfall Measuring Mission
   MODIS  - Moderate Resolution Imaging Spectroradiometer
   SeaWiFS- Sea-viewing Wide Field-of-view Sensor [ SeaWiFS examples ]
HDF4EOS [he2, he4, hdfeos] 
   MODIS  - Moderate Resolution Imaging Spectroradiometer
HDF5EOS [he5] 
   HIRDLS - High Resolution Dynamics Limb Sounder
   MLS    - Microwave Limb Sounder
   OMI    - Ozone Monitoring Instrument
   TES    - Tropospheric Emission Spectrometer
hdf4sds_1.ncl: Read a TRMM file containing 3-hourly precipitation at 0.25 degree resolution. The geographical extent is 40S to 40N. This data is from the Tropical Rainfall Measuring Mission (TRMM).

Create a packed netCDF using the pack_values function. This creates a file half the size of those created using float values. Some precision is lost but is not important here.
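The packing step can be sketched as follows. This is not the hdf4sds_1.ncl script itself: the data are synthetic, and the variable name "prc" and file name "packed_demo.nc" are assumptions chosen for illustration.

```ncl
; Needed for NCL versions before 6.2.0; loaded automatically afterwards.
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

begin
  prc = random_uniform(0.0, 50.0, (/10,20/))    ; synthetic float data

  ; Pack the float values into type "short". The returned variable
  ; carries the scale_factor and add_offset attributes needed to unpack it.
  prcPack = pack_values(prc, "short", False)
  printVarSummary(prcPack)

  system("rm -f packed_demo.nc")                ; remove any earlier copy
  fout      = addfile("packed_demo.nc", "c")
  fout->prc = prcPack                           ; write the packed variable
end
```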

This HDF file is classified as a "Scientific Data Set" (HDF4-SDS). Unfortunately, the file is not 'self-contained' because the file contents do not include the geographical coordinates or temporal information. The former must be obtained via a web site, while the time is embedded within the file name.

hdf4sds_2.ncl: Read an HDF4-SDS file that contains high resolution (1km) data over India and Sri Lanka. The file does not explicitly contain any coordinate arrays. However, the variable on the file "Mapped_Composited_mapped" does have the following attributes:
         Slope :         1
         Intercept :     0
         Scaling :      linear
         Limit :        (    4,   62,   27,   95 )
         Projection_ID :        8
         Latitude_Center :         0
         Longitude_Center :     78.5
         Rotation :        0
The Slope/Intercept attributes would indicate that no scaling has been applied to the data. The Limit attribute indicates the geographical limits and the Latitude_Center/Longitude_Center/Rotation specify map attributes. The variable does not specify any missing value [_FillValue] attribute but, after looking at the data, it was noted that the value -1 is appropriate.

The stat_dispersion function was used to determine the standard and robust estimates of the variable's dispersion. Outliers are present and the contour information was manually specified.

hdf4sds_3.ncl: Read multiple files (here, 131 files) for one particular day. For each file, bin and sum the satellite data using bin_sum. After all files have been read, use bin_avg to average all the summed values, plot the result, and create a netCDF file of the binned (gridded) data.
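The accumulate-then-average pattern can be sketched with synthetic observations standing in for the satellite files. The 10-degree target grid, the number of "files", and the random data are all assumptions. For simplicity this sketch divides the accumulated sums by the counts directly, which may differ from how the actual script computes the average.

```ncl
begin
  ; Equally spaced target grid, described by cell centers (assumed resolution).
  glat = fspan(-85.0,  85.0, 18)
  glon = fspan(  5.0, 355.0, 36)
  nlat = dimsizes(glat)
  mlon = dimsizes(glon)

  gbin = new((/nlat,mlon/), float)      ; running sum per grid cell
  gknt = new((/nlat,mlon/), integer)    ; running count per grid cell
  gbin = 0.0
  gknt = 0

  nfil = 3                              ; stands in for the loop over input files
  npts = 5000
  do nf = 0, nfil-1
     zlat = random_uniform(-90.0,  90.0, npts)   ; synthetic observation locations
     zlon = random_uniform(  0.0, 360.0, npts)
     z    = random_normal (  0.0,   1.0, npts)   ; synthetic observed values
     bin_sum(gbin, gknt, glon, glat, zlon, zlat, z)
  end do

  ; Cells with no observations become missing, which avoids division by zero.
  gknt = where(gknt.eq.0, gknt@_FillValue, gknt)
  gavg = gbin/gknt
  printVarSummary(gavg)
end
```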

Note: Here the data are netCDF files. However, the original files were HDF-SDS (e.g., MYD06_L2.A2005364.1405.005.2006127140531.hdf). The originating scientist converted these to netCDF for some reason. NCL can handle either. Only the file extension needs to be changed (.nc to .hdf).

hdf4sds_4.ncl: Read an HDF-SDS dataset containing MODIS Aqua Level-3 SSTs. The file attributes contain the geographical information, which is used to generate coordinate variables. One issue is that the data "l3m_data" are of type "unsigned short". This type is not explicitly supported through v5.1.1 (but will be in 5.2.0). Hence, a simple 'work-around' is used.
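One possible way to recover unsigned 16-bit values that have been interpreted as signed "short" is sketched below. This is an illustration of the general technique with made-up values; it is not necessarily the work-around used in hdf4sds_4.ncl.

```ncl
begin
  ; Values as they might appear if unsigned 16-bit data were
  ; interpreted as signed "short" (made-up sample values).
  xs = toshort((/ 0, 100, -1, -32768 /))

  xi = toint(xs)                         ; promote to 32-bit integer
  xi = where(xi.lt.0, xi + 65536, xi)    ; negative values wrap back to 32768..65535
  print(xi)                              ; 0, 100, 65535, 32768
end
```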

In addition to plotting the original 9 km data, the area_hi2lores function is used to interpolate the data to a 0.5x0.5 degree grid. A netCDF file is created.
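A minimal sketch of area-averaging a fine grid onto a coarser one with area_hi2lores follows. The grids here (1 degree in, 5 degrees out) and the random data are assumptions chosen to keep the example small; they are not the grids used in the script.

```ncl
begin
  ; Synthetic 1-degree global grid (cell centers). Both coordinate
  ; arrays are monotonically increasing.
  lat = fspan(-89.5,  89.5, 180)
  lon = fspan(  0.5, 359.5, 360)
  f   = random_uniform(0.0, 30.0, (/180,360/))

  ; Coarser 5-degree target grid.
  latO = fspan(-87.5,  87.5, 36)
  lonO = fspan(  2.5, 357.5, 72)

  rad = 4.0*atan(1.0)/180.0
  wy  = cos(lat*rad)                 ; latitude weights for the input grid

  ; True  = input is cyclic in longitude
  ; False = no additional options
  fo = area_hi2lores(lon, lat, f, True, wy, lonO, latO, False)
  printVarSummary(fo)
end
```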

Other 'L3' datasets could be directly used in the sample script. For example: Example 3 on the SeaWiFS Application page.

hdf4sds_5.ncl: Read four MODIS HDF datasets and create a series of swath contours over an Orthographic map. The 2D lat/lon data is read off of each file and used to determine where on the map to overlay the contours.

This example uses gsn_csm_contour_map to create the map plot with the first set of contours, and then creates the remaining contour plots with gsn_csm_contour. The overlay procedure is then used to overlay these remaining contour plots over the existing contour/map plot.
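The overlay pattern described above can be sketched with two synthetic "swaths". The 2D latitude/longitude arrays, data values, map center, contour levels, and output name are all made up for illustration; the actual script reads these from the MODIS files.

```ncl
; Needed for NCL versions before 6.2.0; loaded automatically afterwards.
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"

begin
  n = 20
  m = 30
  lat2d_a = conform_dims((/n,m/), fspan(10.0, 30.0, n), 0)   ; synthetic swath A
  lon2d_a = conform_dims((/n,m/), fspan(60.0, 80.0, m), 1)
  lat2d_b = lat2d_a + 25.0                                   ; synthetic swath B
  lon2d_b = lon2d_a + 10.0
  xa = random_uniform(0.0, 1.0, (/n,m/))
  xb = random_uniform(0.0, 1.0, (/n,m/))

  wks = gsn_open_wks("png", "swath_overlay_demo")

  res                      = True
  res@gsnDraw              = False          ; draw only after overlaying
  res@gsnFrame             = False
  res@cnFillOn             = True
  res@cnLinesOn            = False
  res@cnLevelSelectionMode = "ManualLevels" ; same levels for every swath
  res@cnMinLevelValF       = 0.1
  res@cnMaxLevelValF       = 0.9
  res@cnLevelSpacingF      = 0.1
  res@sfXArray             = lon2d_a
  res@sfYArray             = lat2d_a

  mpres               = res                 ; map resources for the base plot only
  mpres@gsnAddCyclic  = False               ; swath data are not global
  mpres@mpProjection  = "Orthographic"
  mpres@mpCenterLatF  = 30.0
  mpres@mpCenterLonF  = 75.0
  plot_a = gsn_csm_contour_map(wks, xa, mpres)   ; map plus first contours

  res@sfXArray     = lon2d_b
  res@sfYArray     = lat2d_b
  res@lbLabelBarOn = False                  ; one labelbar is enough
  plot_b = gsn_csm_contour(wks, xb, res)    ; contours only, no map

  overlay(plot_a, plot_b)                   ; place swath B on the map plot
  draw(plot_a)
  frame(wks)
end
```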

hdf4eos_1.ncl: Read an HDF4-EOS file containing swath data. NCL identifies the swath as MODIS_SWATH_Type_L1B. Create a simple plot of reflectance with coordinates of scanline and pixel.

The eos.hdf that appears as the file name is an alias for MOD021KM.A2000303.1920.002.2000317044659.hdf. It is not uncommon for an HDF4-EOS file to have the ".hdf" file extension. In this case, NCL will open and read the file successfully, but it is best to manually append the ".hdfeos" extension when opening the file with the addfile function.

     ncl_filedump eos.hdf
yields
     filename:       eos
path:   eos.hdf
   file global attributes:
      HDFEOSVersion : HDFEOS_V2.6
      StructMetadata_0 : GROUP=SwathStructure
        GROUP=SWATH_1
                SwathName="MODIS_SWATH_Type_L1B"
                [...SNIP...]
        END_GROUP=SWATH_1
END_GROUP=SwathStructure
[...SNIP...]
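Appending the extension in the addfile call, as recommended above, might look like the following sketch. It assumes the file on disk is named eos.hdf, as in this example.

```ncl
begin
  ; The file on disk is "eos.hdf". The appended ".hdfeos" is not part of
  ; the name on disk; it only tells NCL to treat the file as HDF-EOS.
  f = addfile("eos.hdf.hdfeos", "r")

  vNam = getfilevarnames(f)     ; names of the variables NCL finds on the file
  print(vNam)
end
```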
hdf4eos_2.ncl: Example of a radiance plot. Note that the color table is reversed from example 1.
hdf4eos_3.ncl: A multiple contour plot of other quantities on the MODIS file.
hdf4eos_4.ncl: MODIS data placed on a geographical projection. A rather awkward aspect of this file is that the Latitude and Longitude variables differ in size from the variable being plotted.

The 5 added to the map limits is arbitrary (not required). Here it is used to specify extra space around the plot.
hdf4eos_5.ncl: Illustrates the use of dim_gbits to extract bits. It also demonstrates explicitly labeling different colors with specific integers.

The use of res@trGridType="TriangularMesh" makes the plotting faster.

Support for HDF5 and HDF5-EOS is being developed for the V5.2.0 release. It is at the alpha testing stage. Some samples follow.
hdf5eos_1.ncl: Read an HDF5-EOS (available v5.2.0) file from the Aura OMI (Ozone Monitoring Instrument) and plot all the variables on the file. Here only two of the variables are shown.
hdf5eos_2.ncl: Read an HDF5-EOS (available v5.2.0) file (OMI) and plot selected variables on the file. (The ncl_filedump utility was used to preview the file's contents.) This example also demonstrates how to retrieve a variable's type prior to reading it into memory. (See getfilevartypes.) It is best to use short2flt or byte2flt if the variable type is "short" or "byte". These functions will automatically apply the proper scaling.

Note that the units for the "EffectiveTemperature" variable appear to be incorrect. They indicate "degrees Celsius" but the range would indicate "degrees Kelvin". This could be addressed by adding the following to the script after the variable has been imported:

     if (vNam(nv).eq."EffectiveTemperature_ColumnAmountO3") then
         x@units = "degrees Kelvin"       ; fix bad units
     end if
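The type check described for hdf5eos_2.ncl can be sketched as a loop over all variables on a file. The file name "OMI_sample.he5" is a hypothetical placeholder, and the loop simply prints a summary of each variable rather than plotting it.

```ncl
; Needed for NCL versions before 6.2.0; loaded automatically afterwards.
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

begin
  f    = addfile("OMI_sample.he5", "r")   ; hypothetical file name
  vNam = getfilevarnames(f)               ; all variable names on the file
  vTyp = getfilevartypes(f, vNam)         ; their types, without reading the data

  do nv = 0, dimsizes(vNam)-1
     if (vTyp(nv).eq."short") then
         x = short2flt( f->$vNam(nv)$ )   ; unpack short to float
     else
         if (vTyp(nv).eq."byte") then
             x = byte2flt( f->$vNam(nv)$ ) ; unpack byte to float
         else
             x = f->$vNam(nv)$             ; read as is
         end if
     end if
     printVarSummary(x)
     delete(x)                            ; sizes and types change between variables
  end do
end
```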
hdf5eos_3.ncl: Read an HDF5-EOS (available v5.2.0) file from the MLS (Aura Microwave Limb Sounder) and plot a two-dimensional cross-section (pressure/time) of temperature. Then plot the trajectory of the satellite over this time.
hdf5eos_4.ncl: Read an HDF5-EOS (available v5.2.0) file from HIRDLS: (a) use stat_dispersion to print the statistical information for each variable; (b) compute PDFs via pdfx; (c) plot cross-sections; and (d) plot time series of three different variables.