
equiv_sample_size
Estimates the number of independent values in a series of correlated values.
Prototype
function equiv_sample_size ( x : numeric, siglvl : numeric, opt : integer ) return_val : integer
Arguments
xInput array of any dimensionality. Currently, the equiv_sample_size function will operate on the rightmost dimension only. Typically, this will the time dimension. Further, the function assumes the time steps are equally spaced.
NOTE: NCL's dimension reordering Dimension reordering may have to be used to make 'time' the rightmost dimension.
siglvlUser specified critical significance level (0.0 < siglvl < 1.0). Typically, siglvl <= 0.10 (most frequently, siglvl=0.05 or siglvl=0.01).
optCurrently not used. Set to zero.
Return value
The return value(s) will be of type integer. The number of dimensions will be one less than the number of input dimensions (i.e. the rightmost dimension of x will no longer be present).
Description
Given a time series of length N at one or more grid points or locations, this function perform multiple steps. Specifically:
+ Calculate the lag-one autocorrelation (hereafter, r1) at each grid point or location. + Estimate the equivalent degrees of freedom using: + For N.ge.50: Neqv = N*( (1-r1)/(1+r1) ) + For N.lt.50 use a more complicated formula. + Test the null hypothesis [no correlation; H0: r1=0] at the user specified critical level (siglvl) using Neqv + If r1 is significant, it will return Neqv, else return the original sample size, N
References: Taking Serial Correlation into Account in Tests of the Mean F. W. Zwiers and H. von Storch, J. Climate 1995, pp336-351 Class Notes: C. Bretherton (2014): Lecture 3: Statistical sampling uncertainty.There are caveats:
- The input time series must be a Gaussian "red-noise" process (positive r1). If the function calculates a negative r1 ("blue-noise" process), it will return the original sample size, N.
- N should be "large", say, greater-than or equal to 50.
See Also
Examples
Example 1
Here x is a series of length N (=1000).
siglvl = 0.01 neqv = equiv_sample_size (x, siglvl,0)neqv will be a scalar with an estimate of the equivalent sample size. Some examples for N=200:
r neqv 0.702441 34 0.628879 45 0.576756 53 0.487207 68 0.407399 83 0.283313 111 0.119770 156 0.069794 200
Example 2
If y has size (ntim,nlat,nlon) and named dimensions "time", "lat", "lon" [i.e., y(time,lat,lon)], where "ntim" is large, then use NCL's dimension reordering:
siglvl = 0.05 neqv = equiv_sample_size (y(lat|:,lon|:,time|:), siglvl,0) ; (nlat,mlon)neqv will be size (nlat,mlon) containing at each grid point an estimate of the number of independent observations. Again, use the results carefully. Even if the data have no significant auto correlation, some grid points may indicate significant positive r1 just due to sampling. One should use field significance testing to see if the sample sizes returned are indeed significant.
Example 3
Let's say the above had monthly data and the users wanted to check to see if successive Januaries, Februaries, etc. were significantly correlated and, if so, get the equivalent samples sizes:
siglvl = 0.01 do nmo=0,nmos-1 ; nmos=12 neqv = equiv_sample_size (y(lat|:,lon|:,time|nt:ntim-1,nmos), siglvl,0) ; neqv ==>(nlat,mlon) end doEach pass through the loop will yield an estimate of the equivalent sample size. Previous comments apply here also.
Example 4
Let x(time,lat,lon) and y(time,lat,lon) where "time", "lat", "lon" are dimension names.
- Use NCL's named dimensions to reorder in time
- Specify a critical significance level to test the lag-one auto-correlation coefficient and determine the (temporal) number of equivalent sample sizes each grid point using equiv_sample_size;
- [optional] Estimate a single global mean equivalent sample size using wgt_areaave that could be used for the ttest or ftest.
(1) xtmp = x(lat|:,lon|:,time|:) ; reorder but do it only once [temporary] ytmp = y(lat|:,lon|:,time|:) (2) sigr = 0.05 ; critical sig lvl for r xEqv = equiv_sample_size (xtmp, sigr,0) yEqv = equiv_sample_size (ytmp, sigr,0) (3) xN = wgt_areaave (xEqv, wgtx, 1., 0) ; wgtx could be gaussian weights yN = wgt_areaave (yEqv, wgty, 1., 0)