HDF: Hierarchical Data Format
HDF-SDS: Scientific Data Set
HDF-EOS: Earth Observing System
HDF and HDF-EOS
In 1993, NASA chose the Hierarchical Data Format Version 4 (HDF4)
to be the official format for all data products derived by the
Earth Observing System (EOS). It is commonly used for
satellite-based data sets. There are several 'models' of
HDF4, which can be a bit confusing. Each model addresses
different needs. One model is the Scientific Data Set (SDS), which is
similar to netCDF-3 (Classic netCDF). It supports multi-dimensional
gridded data with metadata, including support for an unlimited dimension.
To better serve users who need geolocated data,
a new format or convention, HDF4-EOS, was developed.
HDF-EOS supports three geospatial data types: grid,
point, and swath.
Limitations of the HDF4 format, the need for improved data compression, and
changing computer paradigms led to the introduction of a
new HDF format (HDF5) in 1998. The HDF4 and HDF5 appellations
might imply some level of compatibility.
Unfortunately, no! These are completely independent formats.
The calling interfaces and underlying storage formats are different.
Unlike the netCDF community, which has a well-established
history of conventions (e.g., the Climate and Forecast
convention), the HDF satellite community seems to lack commonly
accepted conventions. For example, under the CF convention the "units"
for geographical coordinates must be recognized by the
udunits package. In HDF files the latitude and longitude
variables often have the units "degrees", while netCDF's CF convention
would require "degrees_north" and "degrees_east" or some other
recognized units.
Some experiments (e.g., Aura) have developed "guidelines" for data.
However, perhaps because the spectrum
of experiments is large, the HDF community does not have
a "culture" in which broadly accepted
conventions are commonly used.
NCL Comments
NCL recognizes and supports multiple data formats including
HDF4-SDS, HDF4-EOS and HDF5-EOS (available in v5.2.0). The following
HDF-related file extensions are recognized: "hdf", "hdfeos",
"he2", "he4", and "he5".
The first rule of 'data processing' is to look at the data.
The command line utility ncl_filedump
can be used to examine a file's contents. Information such as a variable's
type, size, and shape allows users to develop optimal code for processing.
The stat_dispersion and pdfx functions
can be used to examine a variable's distribution. It is not
uncommon for outliers to be present. If so, it is best to manually specify
the contour limits and spacing to maximize the information content
of the plots.
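As a minimal sketch of this workflow, the following script applies
stat_dispersion and pdfx to synthetic data (a stand-in for a variable read
from a file) and then sets the manual contour resources. The data, the
outlier value and the contour limits are illustrative assumptions, not
values taken from any of the example files.

```ncl
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; loaded automatically by recent NCL versions

begin
  ; Synthetic stand-in for a variable read from a file:
  ; mostly moderate values plus a few large outliers.
  x      = random_normal(10.0, 2.0, 1000)
  x(0:4) = 500.0

  opt           = True
  opt@PrintStat = True                ; print the table of statistics
  stat          = stat_dispersion(x, opt)

  pdf = pdfx(x, 0, False)             ; 0 requests the default number of bins
  printVarSummary(pdf)

  ; Contour limits chosen by hand after looking at the printed statistics,
  ; so that the outliers do not dictate the contour range.
  ; The three numbers below are hypothetical.
  res                      = True
  res@cnLevelSelectionMode = "ManualLevels"
  res@cnMinLevelValF       = 4.0
  res@cnMaxLevelValF       = 16.0
  res@cnLevelSpacingF      = 1.0
end
```

The res variable would then be passed to whichever contour plotting
function is used.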
A possible source of confusion is that variables that are "short" or "byte"
can be unpacked to float via two different formulations. If x
represents a variable of type "short" or "byte", the unpacked
or scaled value can be derived via:
value = x*scale_factor + add_offset
or
value = (x + add_offset)*scale_factor
The NCL functions short2flt/byte2flt can
be used to unpack the former, while short2flt_hdf/byte2flt_hdf can
be used to unpack the latter.
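The difference between the two formulations can be seen with a small
sketch that uses synthetic packed values (no file is read; the numbers and
attribute values are made up for illustration). Both formulas are written
out by hand exactly as given above, and short2flt is applied for
comparison with the first one.

```ncl
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; loaded automatically by recent NCL versions

begin
  ; Synthetic packed values of type "short" (hypothetical numbers)
  x              = toshort((/ 0, 100, 2000 /))
  x@scale_factor = 0.01
  x@add_offset   = 5.0

  scale  = x@scale_factor
  offset = x@add_offset

  v1 = x*scale + offset        ; first formulation
  v2 = (x + offset)*scale      ; second formulation

  print("x*scale_factor + add_offset   : " + v1)
  print("(x + add_offset)*scale_factor : " + v2)

  ; short2flt reads the attributes attached to x
  xf = short2flt(x)
  print("short2flt                     : " + xf)
end
```

Because the two results differ whenever add_offset is nonzero, it is worth
checking the data provider's documentation to see which formulation a
particular file uses.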
Data used in the Examples
Regardless of the dataset used, the same principles can be
used to process the data.
HDF-SDS [hdf]
  TRMM - Tropical Rainfall Measuring Mission
  MODIS - Moderate Resolution Imaging Spectroradiometer
  SeaWiFS - Sea-viewing Wide Field-of-view Sensor [ SeaWiFS examples ]
HDF4EOS [he2, he4, hdfeos]
  MODIS - Moderate Resolution Imaging Spectroradiometer
HDF5EOS [he5]
  HIRDLS - High Resolution Dynamics Limb Sounder
  MLS - Microwave Limb Sounder
  OMI - Ozone Monitoring Instrument
  TES - Tropospheric Emission Spectrometer
hdf4sds_1.ncl:
Read a TRMM file containing 3-hourly
precipitation at 0.25 degree resolution. The geographical
extent is 40S to 40N. These data are from the
Tropical Rainfall Measuring Mission (TRMM).
Create a packed netCDF file
using the pack_values function.
This creates a file half the size of one created
using float values. Some precision is lost, but that is not important here.
This HDF file is classified as a "Scientific
Data Set" (HDF4-SDS).
Unfortunately, the file is not 'self-contained'
because its contents do not include the geographical coordinates
or temporal information. The former must be obtained via
a web site, while the time is embedded within the file name.
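The packing step can be sketched as follows. This is not the hdf4sds_1.ncl
script itself: the array is synthetic and its size and value range are
assumptions chosen only for illustration. The sketch packs float values to
type "short", unpacks them again and prints the largest difference, which
gives a feel for the precision that is lost.

```ncl
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; loaded automatically by recent NCL versions

begin
  ; Synthetic float field standing in for the precipitation variable
  prc  = random_uniform(0.0, 50.0, (/ 4, 8 /))

  ; Pack to "short" (2 bytes per value instead of 4 for float)
  prcS = pack_values(prc, "short", False)
  printVarSummary(prcS)        ; shows the packed type and attached attributes

  ; Unpack again and measure what the packing cost in precision
  prcF = short2flt(prcS)
  print("max abs packing error = " + max(abs(prcF - prc)))
end
```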
hdf4sds_2.ncl:
Read an HDF4-SDS file that contains high-resolution (1 km)
data over India and Sri Lanka. The file does not explicitly contain
any coordinate arrays. However, the variable on the file,
"Mapped_Composited_mapped", does have the following attributes:
    Slope : 1
    Intercept : 0
    Scaling : linear
    Limit : ( 4, 62, 27, 95 )
    Projection_ID : 8
    Latitude_Center : 0
    Longitude_Center : 78.5
    Rotation : 0
The Slope/Intercept attributes indicate that no scaling has been
applied to the data. The Limit attribute indicates the geographical
limits, and the Latitude_Center/Longitude_Center/Rotation attributes
specify map attributes. The variable does not specify any
missing value [_FillValue] attribute but, after looking at the
data, it was noted that the value -1 is appropriate.
The stat_dispersion function was
used to determine the standard and robust estimates of the
variable's dispersion. Outliers are present, so the contour
information was manually specified.
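Assigning a missing value after the fact is a one-line operation. The
sketch below uses a small synthetic array (the values are made up) in which
-1 marks "no data", mirroring the situation described above.

```ncl
begin
  ; Synthetic array; -1 marks "no data" (hypothetical values)
  x = (/ (/ -1.0, 2.5, 3.0 /), (/ 4.5, -1.0, 6.0 /) /)

  print("before: min = " + min(x))

  ; Once _FillValue is set, NCL functions treat -1 as missing
  x@_FillValue = -1.0

  print("after:  min = " + min(x) + ", missing count = " + num(ismissing(x)))
end
```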
hdf4sds_3.ncl:
Read multiple files
(here, 131 files) for one particular day; for each file,
bin and sum the satellite data using bin_sum;
after all files have been read, use bin_avg
to average all the summed values; plot; create a netCDF
file of the binned (gridded) data.
Note:
Here the data are netCDF files. However, the original files were
HDF-SDS (e.g., MYD06_L2.A2005364.1405.005.2006127140531.hdf). The
originating scientist converted these to netCDF for some reason.
NCL can handle either. Only the file extension need be changed
(.nc to .hdf).
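The accumulate-then-average pattern can be sketched with synthetic
observations in place of the satellite files. The 10-degree grid, the
number of "files" and the random observations are all assumptions made for
illustration. The final averaging is written here as an explicit division
of the sums by the counts, which is a simplification and not necessarily
how the example script performs that step.

```ncl
begin
  ; Hypothetical 10-degree global grid (equally spaced cell centers)
  nlat = 18
  mlon = 36
  glat = fspan( -85.0,  85.0, nlat)
  glon = fspan(-175.0, 175.0, mlon)

  gbin = new((/ nlat, mlon /), float)      ; running sums
  gknt = new((/ nlat, mlon /), integer)    ; running counts
  gbin = 0.0
  gknt = 0

  nfil = 3                                 ; stand-in for the list of files
  nobs = 5000
  do nf = 0, nfil-1
    ; Synthetic 1D observations; real swath data would be flattened first
    zlat = random_uniform( -90.0,  90.0, nobs)
    zlon = random_uniform(-180.0, 180.0, nobs)
    z    = random_uniform(   0.0,   1.0, nobs)
    bin_sum(gbin, gknt, glon, glat, zlon, zlat, z)   ; accumulates into gbin, gknt
  end do

  ; Mark empty cells as missing so they are skipped in the division
  gknt = where(gknt.eq.0, gknt@_FillValue, gknt)
  gavg = gbin/gknt

  print("cells with data = " + num(.not.ismissing(gavg)))
end
```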
hdf4sds_4.ncl:
Read an HDF-SDS dataset containing
MODIS Aqua Level-3 SSTs. The file attributes contain the
geographical information, and this is used to generate
coordinate variables. One issue is that the data "l3m_data"
are of type "unsigned short". This type is not explicitly supported
through v5.1.1 (but will be in v5.2.0). Hence, a simple 'work-around'
is used.
In addition to plotting the original 9 km data, the
area_hi2lores function is used to interpolate
the data to a 0.5x0.5 grid. A netCDF file is created.
Other 'L3' datasets could be used directly in the sample script.
For example: Example 3 on the
SeaWiFS Application page.
hdf4sds_5.ncl:
Read four MODIS HDF datasets and create a series
of swath contours over an orthographic map. The 2D lat/lon data
are read from each file and used to determine where on the map
to overlay the contours.
This example uses gsn_csm_contour_map to create the map plot
with the first set of contours, and then creates the remaining contour
plots with gsn_csm_contour. The
overlay procedure is then used to overlay these
remaining contour plots on the existing contour/map plot.
hdf4eos_1.ncl:
Read an HDF4-EOS file containing swath data.
NCL identifies the swath as MODIS_SWATH_Type_L1B.
Create a simple plot of reflectance with coordinates of scanline
and pixel.
The eos.hdf that appears as the file name is an alias for
MOD021KM.A2000303.1920.002.2000317044659.hdf. It is not
uncommon for an HDF4-EOS file to have the ".hdf" file extension.
In this case, NCL will open and read the file successfully, but
it is best to manually append the ".hdfeos" extension when opening
the file in the addfile function.
ncl_filedump eos.hdf
yields
    filename: eos
    path: eos.hdf
       file global attributes:
          HDFEOSVersion : HDFEOS_V2.6
          StructMetadata_0 : GROUP=SwathStructure
             GROUP=SWATH_1
                SwathName="MODIS_SWATH_Type_L1B"
                [...SNIP...]
             END_GROUP=SWATH_1
          END_GROUP=SwathStructure
          [...SNIP...]
hdf4eos_2.ncl:
Example of a radiance plot. Note that the color table is reversed
from example 1.
hdf4eos_3.ncl:
A multiple contour plot of other quantities on the MODIS file.
hdf4eos_4.ncl:
MODIS data placed on a geographical projection.
A rather awkward aspect of this file is that the Latitude and Longitude
variables differ in size from the variable being plotted.
The 5 added to the map limits is arbitrary (not required). Here it is
used to specify extra space around the plot.
hdf4eos_5.ncl:
Illustrates the use of dim_gbits
to extract bits. It also demonstrates explicitly
labeling different colors with specific integers.
The use of
res@trGridType="TriangularMesh" makes the plotting faster.
Support for HDF5 and HDF5-EOS is being developed for
the V5.2.0 release. It is at the alpha testing stage. Some samples follow.
hdf5eos_1.ncl:
Read an HDF5-EOS (available in v5.2.0) file from the Aura
OMI (Ozone Monitoring Instrument) and plot all the variables
on the file. Here only two of the variables are shown.
hdf5eos_2.ncl:
Read an HDF5-EOS (available in v5.2.0)
file (OMI) and plot selected variables
on the file. (The ncl_filedump
utility was used to preview the file's contents.)
This example also demonstrates how to retrieve a
variable's type prior to reading it into memory. (See
getfilevartypes.)
It is best to use short2flt or byte2flt if
the variable type is "short" or "byte".
These functions will automatically apply the proper scaling.
Note that the units for the "EffectiveTemperature"
variable appear to be incorrect. They indicate "degrees Celsius"
but the range would indicate "degrees Kelvin". This could be addressed
by adding the following to the script after the variable has
been imported:
    if (vNam(nv).eq."EffectiveTemperature_ColumnAmountO3") then
        x@units = "degrees Kelvin"    ; fix bad units
    end if
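The type check described above might be sketched as follows. To keep the
sketch runnable without the OMI file, it first writes a small scratch
netCDF file; the file name demo_types.nc, the variable names and the values
are all hypothetical, and the system call assumes a Unix-like shell. The
loop variable names vNam and nv follow the fragment above.

```ncl
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; loaded automatically by recent NCL versions

begin
  ; Scratch file standing in for the HDF5-EOS file (hypothetical content)
  system("rm -f demo_types.nc")
  f  = addfile("demo_types.nc", "c")

  xs              = toshort((/ 1, 2, 3 /))
  xs@scale_factor = 0.5
  xs@add_offset   = 10.0
  f->packed       = xs
  f->plain        = (/ 1.0, 2.0, 3.0 /)

  ; Query names and types without reading any data values
  vNam = getfilevarnames(f)
  vTyp = getfilevartypes(f, vNam)

  do nv = 0, dimsizes(vNam)-1
    if (vTyp(nv).eq."short") then
      x = short2flt(f->$vNam(nv)$)    ; unpack using the variable's attributes
    else
      x = f->$vNam(nv)$               ; read as is
    end if
    printVarSummary(x)
    delete(x)                         ; type and size may change next iteration
  end do
end
```

A "byte" branch using byte2flt could be added in the same way.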
hdf5eos_3.ncl:
Read an HDF5-EOS (available in v5.2.0) file from the MLS
(Aura Microwave Limb Sounder) and plot a two-dimensional cross-section
(pressure/time) of temperature. Then plot
the trajectory of the satellite over this time.
hdf5eos_4.ncl:
Read an HDF5-EOS (available in v5.2.0) file from
HIRDLS: (a) use stat_dispersion to
print the statistical information for each variable; (b) compute PDFs
via pdfx;
(c) plot cross-sections; and (d) plot time series of three different variables.