 NCL Home > Documentation > Functions > General applied math

# reg_multlin_stats

Performs multiple linear regression analysis including confidence estimates and creates an ANOVA table. Available in version 6.2.0 and later.

## Prototype

```load "\$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; This library is automatically loaded
; from NCL V6.2.0 onward.
; No need for user to explicitly load.

function reg_multlin_stats (
y [*] : numeric,
x     : numeric,             [*] or [*][*] only,
opt   : logical
)

return_val [*] :  float or double
```

## Arguments

y

A one-dimensional array of length N containing the dependent variable [y(N)].

x

A one dimensional array of size N or a two-dimensional array of size [(N,M)] where M is the number of independent variables. See the Description section below for more comments.

opt

A logical variable. If opt=True, then the following attributes can be set:

```
opt = True
opt@print_data  = True     ; will print the input data in table form
opt@print_anova = True     ; will print the ANOVA table
```

## Return value

The one-dimensional array, say b, returned is size (M+1). This will contain the y-intercept and the partial regression coefficients associated with each independent variable. In addition, many statistical quantities are returned as attributes of b.

The return values will be of type double if x or y is double, and float otherwise.

## Description

reg_multlin_stats performs a multiple linear regression. A one dimensional array (call it b(M+1)), containing the y-intercept and the partial regression coefficients is returned. The coefficients represent the rate of change in the dependent variable for a unit change in the independent variable, under the constraint that all other independent variables are held constant.

The input argument x for the function reg_multlin_stats differs from that input to the builtin function reg_multlin. The builtin function requires the user to create a 'design matrix' which consists of a 'dummy' column of 1s and the independent variables. This is a bit cumbersome. This function requires only the independent variables be input. The function creates the required 'design matrix' internally.

The partial regression coefficients, b, can be used to calculate standardized regression coefficients, say B. The B represent the partial regression coefficients in units of standard deviation. The following illustrates the simple calculation needed make the transformation:

```    B(j) = b(j)*standard_deviation(X(j))/standard_deviation(Y)
```
Since the B are expressed in units of standard deviation they may be directly compared with each other to determine the most effective variables.

The user may wish to take into account the uncertainty associated with each estimated parameter and the overall uncertainty. This information is returned as the attributes stderr, tval and pval. In addition, an ANalysis Of VAriance [ANOVA] table is returned in the form of attributes (see example).

While missing values are allowed, it is recommended that users not input any missing independent values. It just confuses the results. Missing values should be indicated by the _FillValue attribute.

## Examples

Example 1

The following is from http://reliawiki.org/index.php/Multiple_Linear_Regression_Analysis

```

y    = (/251.3, 251.3, 248.3, 267.5, 273.0, 276.5, 270.3, 274.9, 285.0 \
, 290.0, 297.0, 302.5, 304.5, 309.3, 321.7, 330.7, 349.0 /)

x1   = (/ 41.9,  43.4,  43.9,  44.5,  47.3,  47.5,  47.9,  50.2,  52.8 \
, 53.2,  56.7,  57.0,  63.5,  65.3,  71.1,  77.0,  77.8 /)

x2   = (/ 29.1,  29.3,  29.5,  29.7,  29.9,  30.3,  30.5,  30.7,  30.8 \
, 30.9,  31.5,  31.7,  31.9,  32.0,  32.1,  32.5,  32.9 /)
nrow = dimsizes(y)

np   = 2
xp   = new( (/nrow,np/), typeof(x1))  ; create an array to hold predictor variables
xp(:,0) = x1
xp(:,1) = x2

opt = True
opt@print_anova = True     ; optional
opt@print_data  = True
b   =  reg_multlin_stats(y,xp,opt) ; partial regression coef
print(b)

```
Partial regression coefficients (b) and additional statistics are returned.

Setting print_data=True resulted in the input data being printed as a table.

```----- reg_multlin_stats: Y, XP -----

251.30      41.90      29.10
251.30      43.40      29.30
248.30      43.90      29.50
267.50      44.50      29.70
273.00      47.30      29.90
276.50      47.50      30.30
270.30      47.90      30.50
274.90      50.20      30.70
285.00      52.80      30.80
290.00      53.20      30.90
297.00      56.70      31.50
302.50      57.00      31.70
304.50      63.50      31.90
309.30      65.30      32.00
321.70      71.10      32.10
330.70      77.00      32.50
349.00      77.80      32.90

```
Setting print_anova=True resulted in the ANOVA information being printed:
```
------- ANOVA information--------
SST=13239.7  SSR=12816.4  SSE=423.372
MST=827.483  MSR=6408.18  MSE=30.2409 RSE=5.49917
F-statistic=211.904 dof=(2,13)
-------
```
The print(b) yielded the following (edited) output. All the comments were manually added.
```
Variable: b
Type: float
Total Size: 12 bytes
3 values
Number of Dimensions: 1
Dimensions and sizes:   
Coordinates:
Number Of Attributes: 27
_FillValue :  9.96921e+36
long_name :   multiple regression coefficients
model :       Yest = b(0) + b(1)*X1 + b(2)*X2 + ...+b(M)*XM

N :   17                              ; # of observations
NP:   2                               ; # of predictor variables
M :   3                               ; # of returned coefficients
bstd : (  0, 0.5029772, 0.4922954 )   ; standardized partial regression coefficients
; [ignore 1st element]

; ANOVA information: SS=>Sum of Squares
SST : 13239.72                        ; Total SS:      sum((Y-Yavg)^2)
SSE : 423.3723                        ; Residual SS:   sum((Yest-Y)^2)
SSR : 12816.36                        ; sum((Yest-Yavg)^2)
MST : 827.4825
MSE : 30.24088
MSE_dof :     14
MSR : 6408.178

RSE : 5.499171                        ; residual standard error; sqrt(MSE)
RSE_dof :     13

F     :   211.9045                    ; MSR/MSE
F_dof :   ( 2, 14 )
F_pval:  3.419033e-11

r2  :  0.9680232                      ; square of the Pearson correlation coefficient
r   :  0.9838817                      ; multiple (overall) correlation: sqrt(r2)
r2a :  0.9634551                      ; adjusted r2... better for small N

fuv : 0.03197682                      ; (1-r2): fraction of variance of the regressand
; (dependent variable) Y which cannot be explained,
;  i.e., which is not correctly predicted, by the
; explanatory variables X.

Yest : [ARRAY of 17 elements]         ; Yest = b(0) + b(1)*X1 + b(2)*X2

stderr :      ( 100.8641, 0.3945341, 3.931675 )       ; std. error of each b
tval :        ( -1.521964, 3.139713, 3.073079 )       ; t-value
pval :        ( 0.1502808, 0.007238129, 0.008262509 ) ; p-value

(0)  -153.5115                          ; partial regression coef: b
(1)     1.238724
(2)    12.08235
```
These results match R exactly:
```
dim(df)
 17  3

names(df)
 "Y"  "X1" "X2"

mlr <- lm(Y~., data=df)
summary(mlr)

Call:
lm(formula = Y ~ ., data = df)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -153.5117   100.8799  -1.522  0.15034
X1             1.2387     0.3946   3.139  0.00724 **
X2            12.0824     3.9323   3.073  0.00827 **
---
Signif. codes:  0 ***, 0.001 **, 0.01 *, 0.05  .

Residual standard error: 5.499 on 14 degrees of freedom
Multiple R-squared:  0.968,     Adjusted R-squared:  0.9635
F-statistic: 211.9 on 2 and 14 DF,  p-value: 3.419e-11
```

Example 2: Read tabular data from a file. The data is from:
Davis, J.C. (2002): Statistics and Data Analysis in Geology, Wiley (3rd Edition) See pg: 462-482

A scatter plot of the data and the resultant fit is here.

```

diri    = "./"
fili    = "KENTUCKY.TXT"

ncol    = numAsciiCol(diri+fili)

y       = data(:,0)
x       = data(:,1:)

opt = True
opt@print_anova = True

b    =  reg_multlin_stats(y,x,opt) ; partial regression coef
print(b)

```
Setting print_anova=True resulted in the ANOVA information being printed:
```
------- ANOVA information--------
SST=2934.82  SSR=1800.7  SSE=1134.12
MST=59.8943  MSR=300.117  MSE=26.3748 RSE=5.13564
F-statistic=11.3789 dof=(6,43)
-------
```
The print(b) resulted in:
```
Variable: b
Type: float
Total Size: 28 bytes
7 values
Number of Dimensions: 1
Dimensions and sizes:   
Coordinates:
Number Of Attributes: 27
_FillValue :  9.96921e+36
long_name :   multiple regression coefficients
model :       Yest = b(0) + b(1)*X1 + b(2)*X2 + ...+b(M)*XM
N :   50
NP :  6
M :   7
bstd :        (  0, 0.0485438, 0.2841468, -0.4579774, 0.9748371, -0.1198416, -0.1625881 )
SST : 2934.82
SSE : 1134.117
SSR : 1800.703
MST : 59.89428
MSE : 26.37481
MSE_dof :     43
MSR : 300.1172
RSE : 5.135641
RSE_dof :     42
F :   11.37893
F_dof :       ( 6, 43 )
F_pval :      1.394299e-07
r2 :  0.6135652
r :   0.783304
r2a : 0.5596441
fuv : 0.3864348
Yest :   [ARRAY of 50 elements]
stderr : ( 11.27078, 0.01095906, 0.008168127, 0.1759493, 0.02058695, 0.002186466, 0.07261039 )
tval :   ( -0.1991397, 0.4645738, 2.763535, -1.323133, 3.041301, -0.9319575, -1.605605 )
pval :   ( 0.8430681, 0.6445273, 0.008317729, 0.192626, 0.003960536, 0.3564441, 0.1155149 )

(0)     -2.24446
(1)      0.005091291
(2)      0.0225729
(3)     -0.2328043
(4)      0.0626111
(5)     -0.002037694
(6)     -0.1165836

```

The above matches those returned by R:

```
+++++++> output from R's   'mlr' function <++++++++++++

Coefficients:
(Intercept)           X1         X2          X3         X4        X5        X6
-2.244460     0.005091   0.022573   -0.232804   0.062611 -0.002038  -0.116584

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.244460  11.270778  -0.199  0.84309
X1           0.005091   0.010959   0.465  0.64458
X2           0.022573   0.008168   2.764  0.00838 **
X3          -0.232804   0.175949  -1.323  0.19278
X4           0.062611   0.020587   3.041  0.00400 **
X5          -0.002038   0.002186  -0.932  0.35656
X6          -0.116584   0.072610  -1.606  0.11568
---
Signif. codes:  0 ***, 0.001 **, 0.01 *, 0.05  .

Residual standard error: 5.136 on 43 degrees of freedom
Multiple R-squared:  0.6136,    Adjusted R-squared:  0.5596
F-statistic: 11.38 on 6 and 43 DF,  p-value: 1.394e-07
```