NCL Home > Documentation > Functions > General applied math, Statistics

regline

Calculates the linear regression coefficient between two series.

Prototype

	function regline (
		x [*] : numeric,  
		y [*] : numeric   
	)

	return_val [1] :  float or double

Arguments

x
y

One-dimensional arrays of the same length. Missing values should be indicated by x@_FillValue and y@_FillValue. If x@_FillValue or y@_FillValue are not set, then the NCL default (appropriate to the type of x and y) will be assumed.

Return value

The return value is a scalar of type double if either x or y are double, and float otherwise. Some attributes are returned as well. See the description below.

Description

regline computes the information needed to construct a regression line: regression coefficient (trend, slope,...) and the average of the x and y values. regline is designed to work with one-dimensional x and y arrays. Missing data are allowed.

regline also returns the following attributes:

xave (scalar, float or double, depending on x and y)
average of x
yave (scalar, float or double, depending on x and y)
average of y
tval (scalar, float or double, depending on x and y)
t-statistic (assuming null-hypothesis)
rstd (scalar, float or double, depending on x and y)
standard error of the regression coefficient
yintercept (scalar, float or double, depending on x and y)
y-intercept at x=0
nptxy (scalar, integer)
number of points used

If the regression coefficients for multi-dimensional arrays are needed, use regCoef.

If the series is autocorrelated the returned degrees of freedom must be recalculated. See Example 2.

          Taking Serial Correlation into Account in Tests of the Mean
          F. W. Zwiers and H. von Storch, J. Climate 1995, pp336- 

See Also

regline_stats, regCoef, regCoef_n, reg_multlin_stats, equiv_sample_size, rtest

Examples

Example 1

The following example was taken from:

    Brownlee
    Statistical Theory and Methodology
    J Wiley 1965   pgs: 342-346     QA276  .B77
The regression line information for the example below is: (a) rc=0.9746, (b) tval=38.7, (c) nptxy=18 which yields 16 degrees of freedom (df=nptxy-2). To test the null hypothesis (i.e., rc=0) at the two-tailed 95% level, we note that t(16) is 2.120 (table look-up: 0.975). Clearly, the calculated t-statistic greatly exceeds 2.120 so the null hypothesis is rejected at the 5% level.

Rather than a table lookup, the following could be used to calculate the actual significance level.

       alpha = betainc(df/(df+rc@tval^2), df/2.0, 0.5)
or, alternatively,
       prob = 1 - betainc(df/(df+rc@tval^2), df/2.0, 0.5)
The example series are:
      x    = (/ 1190.,1455.,1550.,1730.,1745.,1770. \
             ,  1900.,1920.,1960.,2295.,2335.,2490. \
             ,  2720.,2710.,2530.,2900.,2760.,3010. /)
      
      y    = (/ 1115.,1425.,1515.,1795.,1715.,1710. \
             ,  1830.,1920.,1970.,2300.,2280.,2520. \
             ,  2630.,2740.,2390.,2800.,2630.,2970. /)
      
      rc   = regline (x,y)
     
                                    ; Note use of attributes
      df   = rc@nptxy-2
      prob = (1 - betainc(df/(df+rc@tval^2), df/2.0, 0.5) )
     
      yReg = rc*x + rc@yintercept   ; NCL array notation 
                                    ; yReg is same length as x, y
      print(rc)
     
      print("prob="+prob)
     
      print(yReg)
The "print(rc)" statement will yield:
Variable: rc
Type: float
Total Size: 4 bytes
            1 values
Number of Dimensions: 1
Dimensions and sizes:   [1]
Coordinates: 
Number Of Attributes: 7
  _FillValue :  -999
  yintercept :  15.35229
  yave       :  2125.278
  xave       :  2165
  nptxy      :  18
  rstd       :  0.025155
  tval       :  38.74286

(0)     0.9745615

(0)     prob=1

The "print(yReg)" statement will yield:

Variable: yReg
Type: float
Total Size: 72 bytes
            18 values
Number of Dimensions: 1
Dimensions and sizes:   [18]
Coordinates: 
Number Of Attributes: 1
  _FillValue :  -999
(0)     1175.08
(1)     1433.339
(2)     1525.923
(3)     1701.344
(4)     1715.962
(5)     1740.326
(6)     1867.019
(7)     1886.51
(8)     1925.493
(9)     2251.971
(10)    2290.953
(11)    2442.01
(12)    2666.159
(13)    2656.414
(14)    2480.993
(15)    2841.581
(16)    2705.142
(17)    2948.782

Note 1: The above assumes that all the points are independent. If this is not the case, then the number used to test for significance should be less.

Note 2: To construct 95% confidence limits for the hypothesis that the regression coefficient is one (i.e., rc=1) :

  • As noted above, the t for 0.975 and 16 degrees of freedom is 2.120 [table look-up].
  • rc@rstd * 2.12 = 0.053. This yields 95% confidence limits of (0.97-0.053) < 0.97 < (0.97+0.053) or (0.92 to 1.03). Thus, the hypothesis that rc=1 can not be rejected.

Example 2

Often geophysical time series are correlated. If the auto-correlation is significant [specified, a priori, by the user], then the degrees of freedom returned should be adjusted. The adjusted dof should be used in the calculation of the regression coefficient significance. Example 1 used:

 
      rc   = regline (x,y)
      df   = rc@nptxy-2
      prob = (1 - betainc(df/(df+rc@tval^2), df/2.0, 0.5) )
A-priori, let's set the lag-1 autocorrelation significance level at (say) 0.10.
        N    = rc@nptxy                         ; convenience/clarity
        acr  = esacr(y,2)
        if (acr(1).gt.0.0) then
            pr1     = rtest(acr(1), N, 0)
            rsiglvl = 0.10
            if (pr1.lt.rsiglvl) then
                df   = N*(1.0-acr(1))/(1.0+acr(1))
                prob = (1 - betainc(df/(df+rc@tval^2), df/2.0, 0.5) )
            end if
        end if

Example 3

Use the 'Old Faithful' data set used by several R regression examples Here, we use NCL's regline and the ANOVA version regline_stats.

Simplefied R code:

    eruption.lm = lm(eruptions ~ waiting, data=faithful)
    summary(eruption.lm) 

Summary output:

    Call: 
    lm(formula = eruptions ~ waiting, data = faithful) 
     
    Residuals: 
        Min      1Q  Median      3Q     Max 
    -1.2992 -0.3769  0.0351  0.3491  1.1933 
     
    Coefficients: 
                Estimate Std. Error t value Pr(>|t|) 
    (Intercept) -1.87402    0.16014   -11.7   <2e-16 *** 
    waiting      0.07563    0.00222    34.1   <2e-16 *** 
    --- 
    Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 
     
    Residual standard error: 0.497 on 270 degrees of freedom 
    Multiple R-squared: 0.811,      Adjusted R-squared: 0.811 
     

The NCL code is:

   diri = "./"
   fili = "faithful_R.dat"
   pthi = diri + fili

   ncol = 3
   data = readAsciiTable(pthi, ncol, "float", 1)

   x    = data(:,2)   ; eruption interval (minutes); independent
   y    = data(:,1)   ; eruption duration (minutes); dependent
   nxy  = dimsizes(y)

   x@long_name = "Eruption Interval (min)"
   y@long_name = "Eruption Duration (min)"

  ;print("============ OLD Faithful================")
  ;print(sprintf("%4.0f", x)+"  "+sprintf("%4.0f", y))
  ;print("=========================================")

;---Assorted statistics and regression line fit
  ;rxy = escorc(x,y)         ;  cross-correlation
  ;print(rxy)
  ;print("=========================================")

   rcl  = regline(x, y)      ; basic linear regression
   print(rcl)

   df   = rcl@nptxy-2
   prob = (1 - betainc(df/(df+rcl@tval^2), df/2.0, 0.5) )
   print("=========================================")

   rcl_anova =  regline_stats(x,y) ; linear regression: ANOVA
   print(rcl_anova)
   print("=========================================")
The (slightly) edited output is:

   Variable: rcl
   Type: float
   Total Size: 4 bytes
               1 values
   Number of Dimensions: 1
   Dimensions and sizes:	[1]
   Coordinates: 
   Number Of Attributes: 7
     _FillValue :	9.96921e+36
     yintercept :	-1.874016
     yave :	3.487783
     xave :	70.89706
     nptxy :	272
     rstd :	0.002218541
     tval :	34.08904
   (0)	0.07562795
   (0)	=========================================
   
   
   Variable: rcl_anova
   Type: float
   Total Size: 4 bytes
               1 values
   Number of Dimensions: 1
   Dimensions and sizes:	[1]
   Coordinates: 
   Number Of Attributes: 44
     _FillValue :	9.96921e+36
     long_name :	simple linear regression
     model :	Yest = b(0) + b(1)*X

     N :	272
     NP :	1
     M :	2
     bstd :	(  0, 0.9008112 )
     SST :	353.0395
     SSE :	66.56177
     SSR :	286.4776
     MST :	1.302729
     MSE :	0.2465251
     MSE_dof :	270
     MSR :	286.4776
     RSE :	0.4965129
     RSE_dof :	269
     F :	1162.063
     F_dof :	( 1, 270 )
     F_pval :	 0
     r2 :	0.8114605
     r :	0.900811
     r2a :	0.8107622
     fuv :	0.1885395
     Yest :	
     Yavg :	3.487783
     Ystd :	1.141371
     Xavg :	70.89706
     Xstd :	13.59497
     stderr :	( 0.1601433, 0.002218541 )
     tval :	( -11.70212, 34.08904 )
     pval :	( 7.089058e-26,  0 )
     nptxy :	272
     xave :	70.89706
     yave :	3.487783
     rstd :	0.4965129
     yintercept :	-1.874016
     b :	( -1.874016, 0.07562795 )
     tnull :	34.08905
     b95 :	( 0.07126011, 0.07999578 )
     y95 :	( -2.189304, -1.558728 )
     YMR025 :	
     YMR975 :	
     YPI025 :	
     YPI975 :	
   (0)	0.07562795