
Probability Distribution Functions

The probability distribution (frequency of occurrence) of an individual variable, X, may be obtained via the pdfx function. Given two variables X and Y, the bivariate joint probability distribution returned by the pdfxy function indicates the probability of occurrence defined in terms of both X and Y.
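As a rough illustration of what a binned univariate PDF of this kind contains, the sketch below is written in Python rather than NCL and is not the pdfx implementation. The helper name pdf_1d, the equal-width bins spanning the data range, and the choice to express frequencies in percent are all assumptions made for the example.

```python
import random

def pdf_1d(values, nbins=25):
    """Histogram-based PDF: percent of values falling in each equal-width bin.

    Hypothetical helper for illustration only; assumes the values are not all equal.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        # The maximum value would index one past the end, so fold it into the last bin.
        k = min(int((v - lo) / width), nbins - 1)
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(nbins)]
    percent = [100.0 * c / len(values) for c in counts]
    return centers, percent

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(10000)]   # normally distributed sample
centers, pdf = pdf_1d(x)                             # 25 bins by default
print(len(pdf), round(sum(pdf), 6))
```

The returned centers play the role of an x-axis for plotting, and the percentages sum to 100 up to floating-point rounding.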

Generally, the larger the array(s), the smoother the derived PDF. For a fixed number of values, using fewer [more] bins than the default of 25 will result in smoother [rougher] plots.
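The trade-off between bin count and smoothness can be sketched with a little arithmetic. The Python snippet below assumes the values are spread roughly evenly over the bins and uses the usual counting approximation that the relative sampling noise of a bin is about 1/sqrt(count). The function name relative_bin_noise and the sample size of 1000 are hypothetical choices for the example.

```python
import math

def relative_bin_noise(nvalues, nbins):
    """Approximate relative sampling noise of one bin: 1/sqrt(mean count per bin).

    Illustrative approximation only; assumes values are spread evenly across bins.
    """
    mean_count = nvalues / nbins
    return 1.0 / math.sqrt(mean_count)

# Same number of values, three different bin counts.
for nbins in (10, 25, 40):
    print(nbins, round(relative_bin_noise(1000, nbins), 3))
```

With 1000 values, 10 bins hold about 100 values each (noise near 10%), 25 bins hold about 40 each (near 16%), and 40 bins hold about 25 each (20%). More values per bin means less bin-to-bin scatter, which is why fewer bins or larger arrays give smoother curves.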

pdf_1.ncl: Using the pdfx function, this example illustrates univariate PDFs of three variables with three different distributions. Default parameter settings are used (e.g., 25 bins).
pdf_2.ncl: This example uses a user-specified number of bins. Here, 40 bins are specified, which results in a more ragged view of the distribution. It also illustrates using the bin_center attributes returned with the three PDFs to place all of them on a common x-axis. (Minor changes would be required if the number of bins used had been different.) The gsnXYBarChart and gsnXYBarChartOutlineOnly resources are used to produce a bar-style plot.
pdf_3.ncl: Using the pdfxy function, this example illustrates a simple bivariate PDF of two variables having normal distributions.
pdf_4.ncl: Similar to Example 3, but with different numbers of bins. Given a fixed number of values, the fewer bins used, the smoother the resulting PDF.
pdf_5.ncl: Variables with different univariate distributions will yield different bivariate patterns. Here, the univariate distributions of Example 1 are used to create bivariate PDFs.

Some tuning of plots may be necessary to focus on regions of interest. Here, the "Gamma/Chi" distributions are highly skewed, so there are large areas where the joint probabilities are at or near zero. NCL coordinate subscripting is used to select regions of interest.
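The idea of building a joint PDF and then keeping only a coordinate range of it can be sketched in Python as follows. This is not the pdfxy implementation and does not use NCL syntax: the helpers pdf_2d and select_region, the percent units, the sample distributions, and the chosen ranges are all assumptions made for illustration. Selecting bins by the values of their centers is meant to loosely mimic subscripting by coordinate value rather than by index.

```python
import random

def pdf_2d(x, y, nbinx=25, nbiny=25):
    """Joint histogram in percent, indexed [iy][ix], plus bin centers for each axis."""
    def axis(values, nbins):
        lo, hi = min(values), max(values)
        width = (hi - lo) / nbins
        centers = [lo + (k + 0.5) * width for k in range(nbins)]
        # The maximum value is folded into the last bin.
        index = [min(int((v - lo) / width), nbins - 1) for v in values]
        return centers, index

    xc, ix = axis(x, nbinx)
    yc, iy = axis(y, nbiny)
    pdf = [[0.0] * nbinx for _ in range(nbiny)]
    step = 100.0 / len(x)            # each (x, y) pair contributes this percentage
    for i, j in zip(ix, iy):
        pdf[j][i] += step
    return xc, yc, pdf

def select_region(xc, yc, pdf, xrange, yrange):
    """Keep only the bins whose centers fall inside the closed coordinate ranges."""
    cols = [i for i, c in enumerate(xc) if xrange[0] <= c <= xrange[1]]
    rows = [j for j, c in enumerate(yc) if yrange[0] <= c <= yrange[1]]
    sub = [[pdf[j][i] for i in cols] for j in rows]
    return [xc[i] for i in cols], [yc[j] for j in rows], sub

random.seed(1)
x = [random.gammavariate(2.0, 2.0) for _ in range(5000)]   # skewed sample
y = [random.gauss(0.0, 1.0) for _ in range(5000)]          # symmetric sample
xc, yc, pdf = pdf_2d(x, y)
sx, sy, sub = select_region(xc, yc, pdf, (0.0, 10.0), (-2.0, 2.0))
print(len(sy), "x", len(sx), "bins kept out of", len(yc), "x", len(xc))
```

Because the skewed variable has a long tail, many of the full grid's bins are empty. Restricting the plot to the selected sub-array keeps the populated region from being squeezed into a corner.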

pdf_6.ncl: PDFs of variables that may not be continuous [probabilities=0.0] may be best viewed as "raster" plots. These clearly show the bin and data resolution.

Note that using gsn_csm_contour results in the raster bins at the edges being reduced to half width. The plt_pdfxy function, located in the shea_util library, expands the contour area and allows the edge raster bins to be viewed at full width.
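One plausible way to picture the half-width effect is geometric: if the plotted axis runs only from the first bin center to the last, each edge cell is cut off at its own center. The Python sketch below is purely illustrative and does not reproduce what gsn_csm_contour or plt_pdfxy do internally. The function visible_widths and the sample bin centers are hypothetical, and equally spaced centers are assumed.

```python
def visible_widths(centers, axis_min, axis_max):
    """Width of each raster cell that falls inside [axis_min, axis_max]."""
    spacing = centers[1] - centers[0]          # assumes equally spaced centers
    widths = []
    for c in centers:
        left = max(c - spacing / 2, axis_min)   # clip the cell at the axis limits
        right = min(c + spacing / 2, axis_max)
        widths.append(max(right - left, 0.0))
    return widths

centers = [0.5, 1.5, 2.5, 3.5]

# Axis limited to the span of the bin centers: edge cells are clipped.
clipped = visible_widths(centers, centers[0], centers[-1])

# Axis extended by half a bin on each side: every cell is fully visible.
full = visible_widths(centers, centers[0] - 0.5, centers[-1] + 0.5)

print(clipped)   # [0.5, 1.0, 1.0, 0.5]
print(full)      # [1.0, 1.0, 1.0, 1.0]
```

Extending the plotted area by half a bin spacing on each side restores the full width of the edge cells.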