regline_stats
Performs simple linear regression including confidence estimates, an ANOVA table and 95% mean response estimates.
Available in version 6.2.0 and later.
Prototype
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl" ; This library is automatically loaded
; from NCL V6.2.0 onward.
; No need for user to explicitly load.
function regline_stats (
x [*] : numeric,
y [*] : numeric
)
return_val [1] : float or double with many attributes
Arguments
xA one dimensional array of size N containing the independent variable. Missing values, indicated by _FillValue, are allowed but are ignored.
yA one-dimensional array of length N containing the corresponding dependent variable. Missing values, indicated by _FillValue, are allowed but are ignored.
Return value
A scalar containing the calculated linear regression coefficient. In addition, many statistical quantities are returned as attributes of b. The 95% limits were introduced in version 6.4.0
The return value will be of type double if x or y is double, and float otherwise.
Description
regline_stats performs a simple linear regression. The calculated regression coefficient represents the rate of change in the dependent variable for a unit change in the independent variable. This coefficient is sometimes referred to as the slope or trend.
The user may wish to take into account the uncertainty associated with each estimated parameter and the overall uncertainty. This information is returned as the attributes stderr, tval and pval. In addition, an ANalysis Of VAriance [ANOVA] table is returned.
Any missing values are ignored.
Missing values should be indicated by the _FillValue attribute.
Chad Hermann created a function linreg which derived many of the same statistics. It is written entirely in NCL and is here.
Debasish Mazumder (NCAR/RAL) submitted code for the 95% mean response about the regression line. This was included within the regline_stats in the 6.4.0 release of NCL. A reference for this is Chapter 11 of
Applied Statistics and Probability for Engineers
D. C. Montgomery and G. C. Runge
ISBN: 0471204544
From the WWW:
- the confidence intervals tells you about the likely location of the true population parameter
- the prediction intervals must account for both the uncertainty in knowing the value of the population mean, plus data scatter. So a prediction interval is always wider than a confidence interval.
GRAPHICS NOTE: For the computations, the x and y pairs can be in any order. However, subsequent use for plotting lines often requires that the x and the associated y be in monotonically {in/de}creasing order. See Example 2 below.
See Also
regline, reg_multlin, reg_multlin_stats
Examples
See examples regress_1a.ncl and regress_1b.ncl for plot and usage of returned information. Example 1
The following example was taken from:
Brownlee
Statistical Theory and Methodology
J Wiley 1965 pgs: 342-346 QA276 .B77
See example regress_1a.ncl for plot and usage of returned information.
x = (/ 1190.,1455.,1550.,1730.,1745.,1770. \
, 1900.,1920.,1960.,2295.,2335.,2490. \
, 2720.,2710.,2530.,2900.,2760.,3010. /)
y = (/ 1115.,1425.,1515.,1795.,1715.,1710. \
, 1830.,1920.,1970.,2300.,2280.,2520. \
, 2630.,2740.,2390.,2800.,2630.,2970. /)
rc = regline_stats(x,y) ; linear regression coef
print(rc)
The print(rc) yielded the following (edited) output.
All the comments were manually added.
Variable: rc
Type: float
Total Size: 4 bytes
1 values
Number of Dimensions: 1
Dimensions and sizes: [1]
Coordinates:
Number Of Attributes: 38
_FillValue : 9.96921e+36
long_name : simple linear regression
model : Yest = b(0) + b(1)*X
N : 18 ; # of observations
NP : 1 ; # of predictor variables
M : 2 ; # of returned coefficients
bstd : ( 0, 0.9947125 ) ; standardized regression coefficient
; [ignore 1st element]
; ANOVA information: SS=>Sum of Squares
SST : 4823324 ; Total SS: sum((Y-Yavg)^2)
SSE : 50871.91 ; Residual SS: sum((Yest-Y)^2)
SSR : 4772452 ; sum((Yest-Yavg)^2)
MST : 283724.9
MSE : 3179.494
MSE_dof : 16
MSR : 4772452
RSE : 56.387 ; residual standard error; sqrt(MSE)
RSE_dof : 15
F : 1501.01 ; MSR/MSE
F_dof : ( 1, 16 )
F_pval : 1.51066e-17
r2 : 0.989453 ; square of the Pearson correlation coefficient
r : 0.9947125 ; multiple (overall) correlation: sqrt(r2)
r2a : 0.9887938 ; adjusted r2... better for small N
fuv : 0.01054698 ; (1-r2): fraction of variance of the regressand
; (dependent variable) Y which cannot be explained
; i.e., which is not correctly predicted, by the
; explanatory variables X.
Yest : [ARRAY of 18 elements] ; Yest = b(0) + b(1)*x
Yavg : 2125.278 ; avg(y)
Ystd : 532.6583 ; stddev(y)
Xavg : 2165 ; avg(x)
Xstd : 543.6721 ; stddev(x)
stderr : ( 56.05802, 0.02515461 ) ; std. error of each b
tval : ( 0.2738642, 38.74285 ) ; t-value
pval : ( 0.7874895, 5.017699e-18 ) ; p-value
; following added in version 6.4.0
b95 : ( 0.9212361, 1.027887 ) ; 2.5% and 97.5% regression coef. confidence intervals
y95 : ( -459.9983, 490.703 ) ; 2.5% and 97.5% y-intercept confidence intervals
YMR025 : [ARRAY of 18 elements] ; 2.5% mean response
YMR975 : [ARRAY of 18 elements] ; 97.5% mean response
YPI025 : [ARRAY of 18 elements] ; 2.5% predition interval
YPI975 : [ARRAY of 18 elements] ; 97.5% predition interval
; Original regline attributes
; provided for backward cmpatibility
nptxy : 18
xave : 2165
yave : 2125.278
rstd : 56.387
yintercept : 15.35228
b : ( 15.35228, 0.9745615 ) ; b(0) + b(1)*x
(0) 0.9745615 ; regression coefficient
Example 2
The following example was taken from: http://www.stat.ucla.edu/~hqxu/stat105/pdf/ch11.pdf. It is a commonly used WWW example. AS noted above, regline_stats does not require any special ordering. However for subs Here the independent values (x) and associated dependent variables (y) are placed into ascending order for subsequent plotting.
See example regress_1b.ncl for plot and usage of returned information.
; Hydrocarbon level (%)
x = (/ 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40 \
, 1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95 /)
; Purity (%)
y = (/ 90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65 \
, 93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33 /)
; perform regression on the variables
rc = regline_stats(xx,yy) ; linear regression coef
print(rc)
; For graphical purposes, it is recommended that the independent variable be
; monotonically {in/de}creasing
ii = dim_pqsort_n(x,1,0)
xx = x(ii)
yy = y(ii)
...
plot = gsn_csm_xy(wks,xx,yy, res)
yielded
Variable: rc
Type: float
Total Size: 4 bytes
1 values
Number of Dimensions: 1
Dimensions and sizes: [1]
Coordinates:
Number Of Attributes: 41
_FillValue : 9.96921e+36
long_name : simple linear regression
model : Yest = b(0) + b(1)*X
N : 20
NP : 1
M : 2
bstd : ( 0, 0.9367155 )
SST : 173.3769
SSE : 21.24981
SSR : 152.1272
MST : 9.125101
MSE : 1.180545
MSE_dof : 18
MSR : 152.1272
RSE : 1.086529
RSE_dof : 17
F : 128.8618
F_dof : ( 1, 18 )
F_pval : 1.227305e-09
r2 : 0.8774363
r : 0.9367157
r2a : 0.8706272
fuv : 0.1225637
Yest : [ARRAY of 20 elements]
Yavg : 92.1605
Ystd : 3.020778
Xavg : 1.196
Xstd : 0.1893034
stderr : ( 1.593479, 1.316763 )
tval : ( 46.61706, 11.35169 )
pval : ( 4.660713e-21, 6.580665e-10 )
nptxy : 20
xave : 1.196
yave : 92.1605
rstd : 1.086529
yintercept : 74.28331
b : ( 74.28331, 14.94748 )
b95 : ( 12.18108, 17.71389 )
y95 : ( 60.07995, 88.48667 )
YMR025 : [ARRAY of 20 elements] ; 2.5% mean response
YMR975 : [ARRAY of 20 elements] ; 97.5% mean response
YPI025 : [ARRAY of 20 elements] ; 2.5% predition interval
YPI975 : [ARRAY of 20 elements] ; 97.5% predition interval
(0) 14.94748
Example 3Use the 'Old Faithful' data set used by several R regression examples Here, we use NCL's regline and the ANOVA version regline_stats.
See regress_8.ncl for a plot of the data and the regression line.
Simplefied R code:
eruption.lm = lm(eruptions ~ waiting, data=faithful)
summary(eruption.lm)
Summary output:
Call:
lm(formula = eruptions ~ waiting, data = faithful)
Residuals:
Min 1Q Median 3Q Max
-1.2992 -0.3769 0.0351 0.3491 1.1933
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.87402 0.16014 -11.7 <2e-16 ***
waiting 0.07563 0.00222 34.1 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05
Residual standard error: 0.497 on 270 degrees of freedom
Multiple R-squared: 0.811, Adjusted R-squared: 0.811
The NCL code is:
diri = "./" fili = "faithful_R.dat" pthi = diri + fili ncol = 3 data = readAsciiTable(pthi, ncol, "float", 1) x = data(:,2) ; eruption interval (minutes); independent y = data(:,1) ; eruption duration (minutes); dependent nxy = dimsizes(y) x@long_name = "Eruption Interval (min)" y@long_name = "Eruption Duration (min)" ;print("============ OLD Faithful================") ;print(sprintf("%4.0f", x)+" "+sprintf("%4.0f", y)) ;print("=========================================") ;---Assorted statistics and regression line fit ;rxy = escorc(x,y) ; cross-correlation ;print(rxy) ;print("=========================================") rcl = regline(x, y) ; basic linear regression print(rcl) df = rcl@nptxy-2 prob = (1 - betainc(df/(df+rcl@tval^2), df/2.0, 0.5) ) print("=========================================") rcl_anova = regline_stats(x,y) ; linear regression: ANOVA print(rcl_anova) print("=========================================")The (slightly) edited output is:
Variable: rcl
Type: float
Total Size: 4 bytes
1 values
Number of Dimensions: 1
Dimensions and sizes: [1]
Coordinates:
Number Of Attributes: 7
_FillValue : 9.96921e+36
yintercept : -1.874016
yave : 3.487783
xave : 70.89706
nptxy : 272
rstd : 0.002218541
tval : 34.08904
(0) 0.07562795
(0) =========================================
Variable: rcl_anova
Type: float
Total Size: 4 bytes
1 values
Number of Dimensions: 1
Dimensions and sizes: [1]
Coordinates:
Number Of Attributes: 44
_FillValue : 9.96921e+36
long_name : simple linear regression
model : Yest = b(0) + b(1)*X
N : 272
NP : 1
M : 2
bstd : ( 0, 0.9008112 )
SST : 353.0395
SSE : 66.56177
SSR : 286.4776
MST : 1.302729
MSE : 0.2465251
MSE_dof : 270
MSR : 286.4776
RSE : 0.4965129
RSE_dof : 269
F : 1162.063
F_dof : ( 1, 270 )
F_pval : 0
r2 : 0.8114605
r : 0.900811
r2a : 0.8107622
fuv : 0.1885395
Yest :
Yavg : 3.487783
Ystd : 1.141371
Xavg : 70.89706
Xstd : 13.59497
stderr : ( 0.1601433, 0.002218541 )
tval : ( -11.70212, 34.08904 )
pval : ( 7.089058e-26, 0 )
nptxy : 272
xave : 70.89706
yave : 3.487783
rstd : 0.4965129
yintercept : -1.874016
b : ( -1.874016, 0.07562795 )
tnull : 34.08905
b95 : ( 0.07126011, 0.07999578 )
y95 : ( -2.189304, -1.558728 )
YMR025 :
YMR975 :
YPI025 :
YPI975 :
(0) 0.07562795