
regCoef
Calculates the linear regression coefficient between two variables.
Prototype
function regCoef ( x : numeric, y : numeric ) return_val : float or double
Arguments
xAn array of any dimensionality. Missing values should be indicated by x@_FillValue. If x@_FillValue is not set, then the NCL default (appropriate to the type of x) will be assumed.
yAn array of any dimensionality. The last (rightmoast) dimension of y must be the same as the last dimension of x. Missing values should be indicated by y@_FillValue. If y@_FillValue is not set, then the NCL default (appropriate to the type of y) will be assumed.
Return value
If either x or y are of type double, then the return array is returned as double. Otherwise, the returned variable is returned as type float. The dimensionality is a bit more complicated; see the description and examples below.
Description
regCoef computes the linear regression coefficient via least-squares. regCoef is designed to work with multi-dimensional x and y arrays. If the regression information for a single best fit line for 1-dimensional x and y data is desired, then regline is the appropriate choice. Missing data are allowed.
The regCoef function returns the following attributes: yintercept (y-intercept), tval (t-statistic), rstd (standard error of the estimated regression coefficient) and nptxy (number of elements used). These will be returned as scalars for one-dimensional x/y; otherwise, as one-dimensional attributes of the return variable (call it rc). The type of tval and rstd will be the same as rc while nptxy will be of type integer. These attributes may be used for statistical testing (see examples).
The dimensions of rc are illustrated as follows:
x(N), y(N) rc, tval, mptxy are scalars x(N), y(K,M,N) rc, tval, mptxy are arrays of size (K,M) x(I,N), y(K,M,N) rc, tval, mptxy are arrays of size (I,K,M) x(J,I,N), y(L,K,M,N) rc, tval, mptxy are arrays of size (J,I,L,K,M)There's a special case when all dimensions of x and y are identical:
x(J,I,N), y(J,I,N) rc, tval, mptxy are arrays of size (J,I)Note on the units of the returned regression coefficient(s): if x has units of, say, degrees Kelvin (K), and y has units of, say, meters (M), then the units of the regression coefficient are M/K. The function does not standardize x (or y) prior to calculating the regression coefficient. If this is desired, then it is the user's responsibility do so. The NCL function dim_standardize can be used.
Use regCoef_n if the dimensions to do the calculation on is not the rightmost dimension and reordering is not desired. This function can be significantly faster than regCoef.
See Also
Examples
In the code snippets below, there are some examples of both regcoef and regCoef, so you can see how they both can be utilized. The choice of procedure or function versions is strictly a matter of personal choice.
Example 1
In the example below, the regression coefficient for the case with no missing data is 0.97 with standard error (rc@rstd) of 0.025. There are 16 degrees of freedom (=nptxy-2). A test of the null hypothesis (i.e., that the regression coefficient is zero) yields a t-statistic (=tval) of 38.7, which is distributed as t(16). Obviously, the null hypothesis (ie: rc=0.0) would be rejected. [data source: Statistical Theory and Methodology, K.A. Brownlee, 1965, John Wiley & Sons, pp 342-345.]
The betainc function can be used to evaluate a t-statistic distributed appoximately as a Student-t distribution. [See: Numerical Recipes: Fortran, Cambridge Univ. Press, 1986.]
begin x = (/ 1190.,1455.,1550.,1730.,1745.,1770. \ , 1900.,1920.,1960.,2295.,2335.,2490. \ , 2720.,2710.,2530.,2900.,2760.,3010. /) y = (/ 1115.,1425.,1515.,1795.,1715.,1710. \ , 1830.,1920.,1970.,2300.,2280.,2520. \ , 2630.,2740.,2390.,2800.,2630.,2970. /) rc = regCoef(x,y) print(rc)The (edited) output is:
Variable: rc Type: float Total Size: 4 bytes 1 values Number of Dimensions: 1 Dimensions and sizes: [1] Coordinates: Number Of Attributes: 5 _FillValue : 9.96921e+36 nptxy : 18 rstd : 0.02515461 yintercept : 15.35228 tval : 38.74286 (0) 0.9745615Now use the information to calculate the probability.
; for clarity only, explicitly assign to a new variable df = rc@nptxy-2 ; degrees of freedom tval = rc@tval ; t-statistic prob = betainc(df/(df+tval^2),df/2.0,0.5) print ("NO MSG DATA: rc="+rc+" tval="+tval \ +" nptxy="+nptxy+" prob="+prob)For illustration, add some missing values.
x(2) = x@_FillValue ; arbitrarily set msg values y(8) = y@_FillValue rc = regCoef(x,y) df = rc@nptxy-2 ; degrees of freedom tval = rc@tval ; t-statistic b = 0.5 ; b must be same size as tval (and df) prob = betainc(df/(df+tval^2),df/2.0,b) print ("MSG DATA: rc="+rc+" tval="+tval \ +" nptxy="+nptxy+" prob="+prob) end Note 1: Some users prefer to express probability as prob = (1 - betainc(df/(df+tval^2),df/2.0,b) ) Note 2: To construct 95% confidence limits for the regression coefficient: [a] The t for 0.975 and 16 degrees of freedom is 2.120 [table look-up]. [b] rc@rstd * 2.12 = 0.053. This yields 95% confidence limits of (0.97-0.053) < 0.97 < (0.97+0.053) or (0.92 to 1.03).Example 2
Assume x is a one dimensional array (1D) array of size ntim and type float. Assume y is a three-dimensional array (3D) array of size nlat x nlon x ntim.
Attributes returned by NCL functions must be one-dimensional arrays. To transform to a multidimensional array use onedtond and dimsizes.
rc = regCoef(x,y) ; rc(nlat,nlon) ; rc@tval and rc@nptxy will be 1D arrays of ; size nlat*nlon printVarSummary(rc) ; variable overview tval = onedtond(rc@tval , dimsizes(rc)) df = onedtond(rc@nptxy, dimsizes(rc)) - 2 b = tval ; b must be same size as tval (and df) b = 0.5 prob = betainc(df/(df+tval^2),df/2.0,b) ; prob(nlat,nlon)Obviously, user defined attributes may be assigned any time after a variable's creation. For example:
rc@long_name = "regression coefficient" prob@long_name = "probability"
Example 3
Same as Example 2 but y has named dimensions, "time", "lat" and "lon" with the following ordering y(time,lat,lon). Here, use NCL's dimension reordering to make "time" the rightmost dimension.
rc = regCoef(x, y(lat|:,lon|:,time|:) )
Note: with NCL V6.2.1 or later, you can use regCoef_n to avoid having to reorder the arrays first:
rc = regCoef_n(x, y, 0, 0)If y has coordinate variables these may readily be assigned via NCL syntax:
rc!0 = "lat" ; name dimensions rc!1 = "lon" rc&lat = y&lat ; assign coordinate values to named dimensions rc&lon = y&lon