generate_sample_indices
Generate indices (subscripts) for resampling: with and without replacement.
Available in version 6.3.0 and later.
Prototype
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl" ; This library is automatically loaded ; from NCL V6.2.0 onward. ; No need for user to explicitly load. function generate_sample_indices ( N : integer or long, method : integer )
Arguments
NThe number of indices (subscripts) to be generated.
methodInteger which specifies which type of indices should be generated.
- method=0: Sample without replacement. These can be used to reshuffle the order. Use when the order of sampling may be important. Statistics like the mean and standard deviation will not be changed.
- method=1: Sample with replacement. These can be used for (say) bootstrap resampling. Statistics like the mean and standard deviation will be changed.
Return value
A one-dimensional array of size N containing the random integer indices (subscripts).
Description
Uses a random algorithm to generate subscripts.
If method=1, the indicies are set using random_uniform. Please read the documentation for random_setallseed. The 'seeds' can be manually set via random_setallseed. Example 1 of random_setallseed illustrates a method of using the current date as a seed generator.
See Also
generate_2d_array, random_uniform, random_setallseed, generate_unique_indices
Examples
The generate_sample_indices function uses the uniform random number generator. All NCL's random number functions use random 'seeds'. The seeds can be any integer between 1 and 2147483562 (default is 1234567890). The seeds may be set: (i) by default; (ii) manually specified; or, (iii) 'dynamically set. Some examples:
Manually set the random number seed(s) rand1 = 1234567890 rand2 = 236484749 random_setallseed(rand1, rand2) ; also: random_setallseed(1234567890, 236484749) or, have the system generate some numbers. One approach: rand1 = toint(systemfunc(" date +%s")) rand2 = toint(systemfunc(" date +%s"))
Example 1: Generate N random indices with (method=1) and without (method=0) replacement.
n = 10 iwo = generate_sample_indices( n, 0 ) iw = generate_sample_indices( n, 1 ) print("iwo+" "+iw) The indices would be iwo iw (0) 5 8 (1) 3 3 (2) 0 2 (3) 1 5 (4) 8 7 (5) 9 0 (6) 4 0 (7) 2 2 (8) 6 0 (9) 7 7Note that the iwo are unique while the iw have indices repeated (0, 7) while other indices (1, 4, 6, 9) are not present.
Example 2: Resample an array x[*] with and without replacement.
N = 10 x = fspan(17.89, 52.30,N) iwo = generate_sample_indices( N, 0 ) iw = generate_sample_indices( N, 1 ) xAvg = avg(x) xAvg_iwo = avg(x(iwo)) xAvg_iw = avg(x(iw) ) xStd = stddev(x) xStd_iwo = stddev(x(iwo)) xStd_iw = stddev(x(iw) ) print(x+" "+iwo+" "+ x(iwo)+" "+ iw+" "+x(iw) ) print("avg: "+xAvg+" "+xAvg_iwo+" "+xAvg_iw ) print("std: "+xStd+" "+xStd_iwo+" "+xStd_iw ) x iwo x(iwo) iw x(iw) (0) 17.89 2 25.5367 4 33.1833 (1) 21.7133 3 29.36 0 17.89 (2) 25.5367 1 21.7133 9 52.3 (3) 29.36 0 17.89 7 44.6533 (4) 33.1833 5 37.0067 3 29.36 (5) 37.0067 8 48.4767 4 33.1833 (6) 40.83 7 44.6533 8 48.4767 (7) 44.6533 4 33.1833 0 17.89 (8) 48.4767 6 40.83 1 21.7133 (9) 52.3 9 52.3 0 17.89 avg: 35.095 35.095 31.654 std: 11.5757 11.5757 13.1459Example 3: Resample an array x(N,J,K) with and without replacement.
iwo = generate_sample_indices( N, 0 ) iw = generate_sample_indices( N, 1 ) xw = new (dimsizes(x), typeof(x)) xwo = new (dimsizes(x), typeof(x)) do nl=0,nlat-1 do ml=0,mlon-1 xw(:,nl,ml) = x(iw ,nl,ml) xwo(:,nl,ml) = x(iwo,nl,ml) ; .... do something ... end do end do
==================================
Examples 4 and 5 illustrate the 'mechanics' of bootstrapping. They bootstrap the mean (average) statistic of a univariate variable. Bootstrapping (say) regression coefficients would require 'x' and 'y' variables. In this case, the 'y' would be resampled. Bootstrapping correlations would require that 'x' and 'y' pairs be resampled.
There are other approaches. For example, in correlation bootstrapping,
rather then pair resampling, individual 'x' and 'y' pairs, random block sequences could be used. Say, N=1200, then randomly calculate the correlations of many sub-sample blocks of length (say) 50.
==================================
Example 4:
Resample an array x(N) 'nBoot' times with replacement. Since 'x' is one-dimensional, avg and stddev are used. (dim_avg_n and dim_stddev_n could have been used.) Sort the boot-strapped array in ascending order. Get robust (a) two sided 2.5 and 97.5% confidence limits and (b) one-sided 95% confidence limit.
This example is provided to illustrate the 'mechanics' of the bootstrap methodology. This does not account for any trend or other complications.
Bootstrapping can be computationally intense and, hence, time comsuming. A good test strategy to test the script with a small 'nBoot' to make sure the mechanics are working correctly.
N = 100000 xMean = 0.0 ; population mean xStdDev = 10.0 ; populaton standard deviation random_setallseed(36484749,119494848) ; optional: see above comments x = random_normal(xMean, xStdDev,N) xAvg = avg(x) ; Mean xStd = stddev(x) ; Sample Standard Deviation xStdErr = xStd/sqrt(N) ; std error of mean print("N="+N+" xAvg="+xAvg+" xStd="+xStd+" xStdErr="+xStdErr) nBoot = 10000 ; user specified xBoot = new (nBoot, typeof(x)) random_setallseed(119494848, 36484749) ; optional: see above comments ;For 'fun', generate the sampling indices directly in the 'x' array. do ns=1,nBoot xBoot(ns-1) = avg( x(generate_sample_indices(N,1)) ) end do xAvgBoot= avg(xBoot) ; Averages of bootstrapped samples xStdBoot= stddev(xBoot) ; Std Dev " " " print("nBoot="+nBoot+" xAvgBoot="+xAvgBoot+" xStdBoot="+xStdBoot) qsort(xBoot) ; sort bootstrap means into ascending order n025 = toint(0.025*nBoot) ; indices for sorted array n950 = toint(0.950*nBoot) n975 = toint(0.975*nBoot) print("n025="+n025+" n950="+n950+" n975="+n975) xBoot_025 = xBoot(n025) ; 2.5% level xBoot_950 = xBoot(n950) ; 95.0% level xBoot_975 = xBoot(n975) ; 97.5% level print("xBoot_025="+xBoot_025) print("xBoot_950="+xBoot_950) print("xBoot_975="+xBoot_975)The output for this examples is:
N=100000 xAvg=-0.00324151 xStd=10.0478 <== near 0, 10 nBoot=10000 xAvgBoot=-0.00394782 xStdBoot=0.0316319 <== near xStdErr n025=250 n950=9500 n975=9750 xBoot_025=-0.0662322 xBoot_950=0.0472644 xBoot_975=0.0571641 xStdErr=0.0316228
Example 5:
Similar to Example 4 but resample an array containing 100 years of monthly mean values: x(ntim,nlat,mlon) where ntim=1200. Certainly, all 1200 time steps could be used BUT there is an annual cycle. To bootstrap the (say) January and July values, the monthly sub-samples to be boot strapped would be x(0::12,:,:) and x(6::12,:,:), respectively. Another approach would be: (i) calculate a monthly climatology (ii) calculate the anomalies from each monthly climatology; then, (iii) assuming the anomalies are independent of month, bootstrap the derived anomaly array.
This example is provided to illustrate the 'mechanics' of the bootstrap methodology on a multi-dimensional array. This does not account for trend or other complications.
Bootstrapping can be computationally intense and, hence, time comsuming. This is particulary true if the array sizes [(ntim,nlat,mlon)] are large. A prudent test strategy would be to test script with a small 'nBoot' to make sure the mechanics are working correctly.
fmdl = addfile("MODEL.nc", "r") x = fmdl->TREFHT ; [time | 1200] x [lat | 192] x [lon | 288] xAvg = avg(xAnom) ; mean anomalies over all grid pts and times xStd = stddev(xAnom) ; stddev of all anomalies and times print(xAvg) ; scalar [hopefully 0.0 :-) ] print(xStd) ; scalar dimx = dimsizes(x) ntim = dimx(0) nlat = dimx(1) mlon = dimx(2) units = x@units ; convenience only ; calculate monthly climatologies xClm = clmMonTLL(x) ; [time | 12] x [lat | 192] x [lon | 288] printVarSummary(xClm) ; calculate monthly anomalies xAnom = calcMonAnomTLL(x, xClm) ; [time | 1200] x [lat | 192] x [lon | 288] xAnom@units = units delete([/x, xClm/]) ; no longer needed nBoot = 1000 ; This could be time consuming ; Test with small nBoot (eg: nBoot=10) dimx(0) = nBoot ; 'trick' xBoot = new (dimx, typeof(xAnom)) ; array to hold nBoot results do ns=0,nBoot-1 ; generate bootstrap results iw = generate_sample_indices(ntim,1) xBoot(ns,:,:) = dim_avg_n( xAnom(iw,:,:), 0) end do delete(xAnom) ; no longer needed ; mean of all bootstrap samples at each grid point xAvgBoot = dim_avg_n(xBoot, 0) ; [192] x [288] printVarSummary(xBoot) idx = dim_pqsort_n(xBoot, 2, 0) ; sort bootstrap means into ascending order at each grid point i025 = toint(0.025*nBoot) i975 = toint(0.975*nBoot) ; NCL's 'dimension reduction' will result in a 2D array xBoot_025 = xBoot(i025,:,:) ; 2.5% level; (nlat,mlon) xBoot_975 = xBoot(i975,:,:) ; 97.5% level; (nlat,mlon) xBoot_Range = xBoot_975 - xBoot_025 xBoot_025@long_bame = "Bootstrap 2.5% Mean" xBoot_025@units = units xBoot_975@long_bame = "Bootstrap 97.5% Mean" xBoot_975@units = units xBoot_Range@long_name = "95% anomaly range" xBoot_Range@units = units copy_VarCoords(xAnom(0,:,:), xBoot_025) copy_VarCoords(xAnom(0,:,:), xBoot_975) printVarSummary(xBoot_025) ; [lat | 192] x [lon | 288] printVarSummary(xBoot_975) ; [lat | 192] x [lon | 288] printVarSummary(xBoot_Range) ; [lat | 192] x [lon | 288] printMinMax(xBoot_Range, 1))