bootstrap_correl

Bootstrap estimates of sample cross correlations (i.e., Pearson's correlation coefficient) between two variables.

Available in version 6.4.0 and later.

Prototype

	function bootstrap_correl (
		x         : numeric,  
		y         : numeric,  
		nBoot [1] : integer,  
		nDim  [*] : integer,  
		opt   [1] : logical   
	)

	return_val [ variable of type 'list' containing multiple estimates] 

Arguments

x

A numeric array of up to four dimensions: x(N), x(N,:), x(N,:,:), x(N,:,:,:) where 'N' represents the original sample size.

y

A numeric array of up to four dimensions: y(N), y(N,:), y(N,:,:), y(N,:,:,:) where 'N' represents the original sample size.

nBoot

An integer specifying the number of bootstrap data samples to be generated.

nDim

The dimension(s) of x and y on which to calculate the statistic. Most commonly, this is set to (/0,0/) or, if the dimension is the same for both x and y, simply 0.

opt

A logical scalar to which optional attributes may be attached. If opt=False, default values are used. If opt=True and no optional attributes are present, default values will be used. If opt=True then:

  • opt@sample_size: specifies the size of the resampled array to be used for the bootstrapped statistics.
  • opt@sample_size=N is the default.
  • opt@sample_size=n where (n.le.N). When this option is used, n is typically set as n=toint(f*N), where 'f' is a fraction of (say) 0.10 to 0.20.

  • opt@rseed1=rseed1: allows user to set the first random seed integer value. Default is to use the system initial random seed. (See: random_setallseed)
  • opt@rseed2=rseed2: allows user to set the second random seed integer value. Default is to use the system initial random seed. (See: random_setallseed)
  • opt@rseed3="clock": tells NCL to use the 'date' clock to set the two random seeds. (See: random_setallseed)
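
The options above could be combined as in the following minimal sketch. The synthetic x and y, the seed values, and the 20% resampling fraction are assumptions chosen only for illustration.

```ncl
; Sketch only: x and y are synthetic; the seed values are arbitrary.
N     = 100
x     = random_normal(0.0, 1.0, N)             ; synthetic sample
y     = 0.5*x + random_normal(0.0, 1.0, N)     ; synthetic, correlated with x
nBoot = 1000

opt             = True
opt@sample_size = toint(0.20*N)   ; resample roughly 20% of the N pairs
opt@rseed1      = 12345           ; arbitrary first seed
opt@rseed2      = 67890           ; arbitrary second seed

BootStrap = bootstrap_correl(x, y, nBoot, 0, opt)
rBoot     = BootStrap[0]          ; all bootstrapped correlations
printVarSummary(rBoot)
```

Setting both seeds explicitly should make repeated runs reproducible, while opt@rseed3="clock" is the alternative when different resamples are wanted on each run.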

Return value

A variable of type 'list'. Members of a list can be accessed directly. However, it is clearer if the members are explicitly extracted and given meaningful names.

                                    ; typeof(BootStrap) is 'list'
   BootStrap  = bootstrap_correl(x, y, nBoot, 0, opt)
   rBoot      = BootStrap[0]        ; All the bootstrapped values
   rBootAvg   = BootStrap[1]        ; Average cross correlation of the bootstrapped samples
   rBootStd   = BootStrap[2]        ; Standard deviation of the bootstrapped cross correlations
   delete(BootStrap)                ; no longer needed

All appropriate meta data are returned. Please use printVarSummary(...) to examine the returned variable.

Description

Bootstrapping is a statistical method that uses data resampling with replacement (see: generate_sample_indices) to estimate the properties of nearly any statistic. It is particularly useful when dealing with small sample sizes. A key feature is that bootstrapping makes no a priori assumption about the distribution of the sample data.

The default version resamples using x and y pairs.
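
Paired resampling can be illustrated by hand. The following is a minimal sketch, not the internal implementation of bootstrap_correl: it assumes synthetic 1-D x and y, draws indices with replacement via generate_sample_indices, and applies the same indices to both variables so that each (x,y) pair stays together.

```ncl
; Sketch only: synthetic data; 'rB' and 'iw' are illustrative names.
N     = 100
x     = random_normal(0.0, 1.0, N)
y     = 0.5*x + random_normal(0.0, 1.0, N)
nBoot = 1000

rB    = new(nBoot, typeof(x))                 ; one correlation per resample
do nb=0,nBoot-1
   iw     = generate_sample_indices(N, 1)     ; 1 => sample with replacement
   rB(nb) = escorc(x(iw), y(iw))              ; same indices keep pairs intact
end do

print("mean bootstrapped r = " + avg(rB))
```

Using one index vector for both arrays is what preserves the dependence between x and y; resampling each variable independently would destroy the correlation being estimated.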

Some side points to remember about cross correlations (Wikipedia):

  • The cross correlation coefficient detects only linear dependencies between two variables.
  • For the case of a linear model with a single independent variable (x), the coefficient of determination (r^2) is the square of Pearson's product-moment correlation coefficient, r.

The returned 'rBootAvg' and 'rBootStd' (see Example 1) are derived via: (i) using the Fisher z-transform on each bootstrapped cross correlation; (ii) computing the means and standard deviations in 'z-space'; (iii) then performing the inverse z-transform.
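
The three steps can be sketched for the 1-D case as follows. This is an illustration under stated assumptions (synthetic data; names ending in 'Chk' are hypothetical), and the function's internal details, such as the exact standard deviation estimator, may differ.

```ncl
; Sketch only: reproduces steps (i)-(iii) by hand on synthetic data.
N     = 100
x     = random_normal(0.0, 1.0, N)
y     = 0.5*x + random_normal(0.0, 1.0, N)

BootStrap = bootstrap_correl(x, y, 1000, 0, False)
rBoot     = BootStrap[0]                       ; bootstrapped correlations

z       = 0.5*log((1.0+rBoot)/(1.0-rBoot))     ; (i)   Fisher z-transform
zAvg    = avg(z)                               ; (ii)  mean in z-space
zStd    = stddev(z)                            ; (ii)  std. deviation in z-space
rAvgChk = tanh(zAvg)                           ; (iii) inverse z-transform
rStdChk = tanh(zStd)                           ; (iii) inverse z-transform

print("rAvgChk = " + rAvgChk + "   rStdChk = " + rStdChk)
```

Averaging in z-space rather than directly on r is the usual motivation for the transform: the sampling distribution of z is closer to normal than that of r, particularly when the correlation is far from zero.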

References:

Computer Intensive Methods in Statistics 
   P. Diaconis and B. Efron 
   Scientific American (1983), 248:116-130  
   doi:10.1038/scientificamerican0583-116
   http://www.nature.com/scientificamerican/journal/v248/n5/pdf/scientificamerican0583-116.pdf
   
An Introduction to the Bootstrap 
   B. Efron and R.J. Tibshirani, Chapman and Hall (1993) 
   
Bootstrap Methods and Permutation Tests: Companion Chapter 18 to the Practice of Business Statistics
   Hesterberg, T. et al (2003)
   http://statweb.stanford.edu/~tibs/stat315a/Supplements/bootstrap.pdf

Climate Time Series Analysis: Classical Statistical and Bootstrap Methods
   M. Mudelsee (2014) Second edition. Springer, Cham Heidelberg New York Dordrecht London
   ISBN: 978-3-319-04449-1, e-ISBN: 978-3-319-04450-7
   doi: 10.1007/978-3-319-04450-7
   xxxii + 454 pp; Atmospheric and Oceanographic Sciences Library, Vol. 51

See Also

bootstrap_stat, bootstrap_diff, bootstrap_estimate, bootstrap_regcoef, generate_sample_indices, ListIndexFromName

Examples

Please see the Bootstrap and Resampling application page.

Example 1: Let x(N) and y(N), with N=100:

   nBoot       = 1000         ; user set
   nDim        = 0            ; or (/0,0/); dimension numbers corresponding to 'N'
   opt         = False        ; use all default options

   BootStrap   = bootstrap_correl(x, y, nBoot, nDim, opt)
   rBoot       = BootStrap[0] ; bootstrapped cross-correlations in ascending order
   rBootAvg    = BootStrap[1] ; Average of the z-transformed bootstrapped cross correlations
   rBootStd    = BootStrap[2] ; Std. deviation(s) of the z-transformed bootstrapped cross correlations
   delete(BootStrap)          ; no longer needed

   rBootLow    = bootstrap_estimate(rBoot, 0.025, False)   ;  2.5% lower confidence bound 
   rBootMed    = bootstrap_estimate(rBoot, 0.500, False)   ; 50.0% median of bootstrapped estimates

   printVarSummary(rBoot)   ; information only
   printVarSummary(rBootMed)
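
A two-sided 95% interval also needs the upper bound. The following minimal sketch assumes synthetic x and y and uses 'rBootHi' as an illustrative name; it requests the 97.5% level from bootstrap_estimate in the same way as the lower bound above.

```ncl
; Sketch only: synthetic data; the percentile levels are the usual 95% choice.
N     = 100
x     = random_normal(0.0, 1.0, N)
y     = 0.5*x + random_normal(0.0, 1.0, N)

BootStrap = bootstrap_correl(x, y, 1000, 0, False)
rBoot     = BootStrap[0]
delete(BootStrap)

rBootLow  = bootstrap_estimate(rBoot, 0.025, False)   ;  2.5% lower confidence bound
rBootHi   = bootstrap_estimate(rBoot, 0.975, False)   ; 97.5% upper confidence bound

print("95% interval: " + rBootLow + " to " + rBootHi)
```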