NCL Home > Documentation > Functions > Random number generators, Array creators

generate_sample_indices

Generate indices (subscripts) for resampling: with and without replacement.

Available in version 6.3.0 and later.

Prototype

load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; This library is automatically loaded
                                                             ; from NCL V6.2.0 onward.
                                                             ; No need for user to explicitly load.

	function generate_sample_indices (
		N       : integer or long,  
		method  : integer           
	)

Arguments

N

The number of indices (subscripts) to be generated.

method

Integer which specifies which type of indices should be generated.

  • method=0: Sample without replacement. These can be used to reshuffle the order. Use when the order of sampling may be important. Statistics like the mean and standard deviation will not be changed.
  • method=1: Sample with replacement. These can be used for (say) bootstrap resampling. Statistics like the mean and standard deviation will be changed.

Return value

A one-dimensional array of size N containing the random integer indices (subscripts).

Description

Uses a random algorithm to generate subscripts.

If method=1, the indicies are set using random_uniform. Please read the documentation for random_setallseed. The 'seeds' can be manually set via random_setallseed. Example 1 of random_setallseed illustrates a method of using the current date as a seed generator.

See Also

generate_2d_array, random_uniform, random_setallseed, generate_unique_indices

Examples

The generate_sample_indices function uses the uniform random number generator. All NCL's random number functions use random 'seeds'. The seeds can be any integer between 1 and 2147483562 (default is 1234567890). The seeds may be set: (i) by default; (ii) manually specified; or, (iii) 'dynamically set. Some examples:


Manually set the random number seed(s)
         rand1 = 1234567890
         rand2 =  236484749
         random_setallseed(rand1, rand2) ; also: random_setallseed(1234567890, 236484749)

or, have  the system generate some numbers. One approach:

         rand1 = toint(systemfunc(" date +%s"))
         rand2 = toint(systemfunc(" date +%s"))

Example 1: Generate N random indices with (method=1) and without (method=0) replacement.

         n   = 10
         iwo = generate_sample_indices( n, 0 )
         iw  = generate_sample_indices( n, 1 )

         print("iwo+"   "+iw)

The indices would be
                 iwo iw 
          (0)     5   8
          (1)     3   3
          (2)     0   2
          (3)     1   5
          (4)     8   7
          (5)     9   0
          (6)     4   0
          (7)     2   2
          (8)     6   0
          (9)     7   7
Note that the iwo are unique while the iw have indices repeated (0, 7) while other indices (1, 4, 6, 9) are not present.

Example 2: Resample an array x[*] with and without replacement.

         N   = 10
         x   = fspan(17.89, 52.30,N)
         iwo = generate_sample_indices( N, 0 )
         iw  = generate_sample_indices( N, 1 )

         xAvg     = avg(x)
         xAvg_iwo = avg(x(iwo))
         xAvg_iw  = avg(x(iw) )

         xStd     = stddev(x)
         xStd_iwo = stddev(x(iwo))
         xStd_iw  = stddev(x(iw) )

         print(x+"   "+iwo+"   "+ x(iwo)+"   "+ iw+"   "+x(iw)  )

         print("avg:  "+xAvg+"      "+xAvg_iwo+"      "+xAvg_iw )
         print("std:  "+xStd+"      "+xStd_iwo+"      "+xStd_iw )


                    x         iwo   x(iwo)     iw    x(iw)
          (0)     17.89        2   25.5367      4   33.1833
          (1)     21.7133      3   29.36        0   17.89
          (2)     25.5367      1   21.7133      9   52.3
          (3)     29.36        0   17.89        7   44.6533
          (4)     33.1833      5   37.0067      3   29.36
          (5)     37.0067      8   48.4767      4   33.1833
          (6)     40.83        7   44.6533      8   48.4767
          (7)     44.6533      4   33.1833      0   17.89
          (8)     48.4767      6   40.83        1   21.7133
          (9)     52.3         9   52.3         0   17.89

          avg:    35.095           35.095           31.654     
          std:    11.5757          11.5757          13.1459


Example 3: Resample an array x(N,J,K) with and without replacement.
         
         iwo = generate_sample_indices( N, 0 )
         iw  = generate_sample_indices( N, 1 )

         xw   = new (dimsizes(x), typeof(x))
         xwo  = new (dimsizes(x), typeof(x))

         do nl=0,nlat-1              
           do ml=0,mlon-1
              xw(:,nl,ml)  = x(iw ,nl,ml)
              xwo(:,nl,ml) = x(iwo,nl,ml)

             ; .... do something ...

           end do
         end do

==================================
Examples 4 and 5 illustrate the 'mechanics' of bootstrapping. They bootstrap the mean (average) statistic of a univariate variable. Bootstrapping (say) regression coefficients would require 'x' and 'y' variables. In this case, the 'y' would be resampled. Bootstrapping correlations would require that 'x' and 'y' pairs be resampled.

There are other approaches. For example, in correlation bootstrapping, rather then pair resampling, individual 'x' and 'y' pairs, random block sequences could be used. Say, N=1200, then randomly calculate the correlations of many sub-sample blocks of length (say) 50.
==================================

Example 4:

Resample an array x(N) 'nBoot' times with replacement. Since 'x' is one-dimensional, avg and stddev are used. (dim_avg_n and dim_stddev_n could have been used.) Sort the boot-strapped array in ascending order. Get robust (a) two sided 2.5 and 97.5% confidence limits and (b) one-sided 95% confidence limit.

This example is provided to illustrate the 'mechanics' of the bootstrap methodology. This does not account for any trend or other complications.

Bootstrapping can be computationally intense and, hence, time comsuming. A good test strategy to test the script with a small 'nBoot' to make sure the mechanics are working correctly.



	N       = 100000 
        xMean   = 0.0             ; population mean
        xStdDev = 10.0            ; populaton standard deviation

        random_setallseed(36484749,119494848)   ; optional: see above comments
        x       = random_normal(xMean, xStdDev,N)

	xAvg    = avg(x)          ; Mean
	xStd    = stddev(x)       ; Sample Standard Deviation
	xStdErr = xStd/sqrt(N)                       ; std error of mean
	print("N="+N+"  xAvg="+xAvg+"  xStd="+xStd+"  xStdErr="+xStdErr)

	nBoot =   10000                           ; user specified
        xBoot   = new (nBoot, typeof(x))

        random_setallseed(119494848, 36484749)   ; optional: see above comments
 
	                                   ;For 'fun', generate the sampling indices directly in the 'x' array.
	do ns=1,nBoot
           xBoot(ns-1) = avg( x(generate_sample_indices(N,1)) )
	end do

	xAvgBoot= avg(xBoot)      ; Averages of bootstrapped samples
        xStdBoot= stddev(xBoot)   ; Std Dev  "        "        "
	print("nBoot="+nBoot+"  xAvgBoot="+xAvgBoot+"  xStdBoot="+xStdBoot)

	qsort(xBoot)                ; sort bootstrap means into ascending order

	n025 = toint(0.025*nBoot)   ; indices for sorted array
	n950 = toint(0.950*nBoot)
        n975 = toint(0.975*nBoot)
	print("n025="+n025+"  n950="+n950+"  n975="+n975)

	xBoot_025 = xBoot(n025)     ;  2.5% level
	xBoot_950 = xBoot(n950)     ; 95.0% level
	xBoot_975 = xBoot(n975)     ; 97.5% level
	print("xBoot_025="+xBoot_025)
	print("xBoot_950="+xBoot_950)
	print("xBoot_975="+xBoot_975)
	
The output for this examples is:

	N=100000  xAvg=-0.00324151  xStd=10.0478               <== near 0, 10
	nBoot=10000  xAvgBoot=-0.00394782  xStdBoot=0.0316319  <== near xStdErr

	n025=250  n950=9500  n975=9750
	xBoot_025=-0.0662322
	xBoot_950=0.0472644
	xBoot_975=0.0571641

	xStdErr=0.0316228
	

Example 5:

Similar to Example 4 but resample an array containing 100 years of monthly mean values: x(ntim,nlat,mlon) where ntim=1200. Certainly, all 1200 time steps could be used BUT there is an annual cycle. To bootstrap the (say) January and July values, the monthly sub-samples to be boot strapped would be x(0::12,:,:) and x(6::12,:,:), respectively. Another approach would be: (i) calculate a monthly climatology (ii) calculate the anomalies from each monthly climatology; then, (iii) assuming the anomalies are independent of month, bootstrap the derived anomaly array.

This example is provided to illustrate the 'mechanics' of the bootstrap methodology on a multi-dimensional array. This does not account for trend or other complications.

Bootstrapping can be computationally intense and, hence, time comsuming. This is particulary true if the array sizes [(ntim,nlat,mlon)] are large. A prudent test strategy would be to test script with a small 'nBoot' to make sure the mechanics are working correctly.

	fmdl    = addfile("MODEL.nc", "r")
	x       = fmdl->TREFHT                      ; [time | 1200] x [lat | 192] x [lon | 288]

	xAvg    = avg(xAnom)                        ; mean anomalies over all grid pts and times
	xStd    = stddev(xAnom)                     ; stddev of all anomalies and times
	print(xAvg)                                  ; scalar [hopefully 0.0  :-) ]
	print(xStd)                                  ; scalar

	dimx    = dimsizes(x)
	ntim    = dimx(0)
	nlat    = dimx(1)
        mlon    = dimx(2)

	units   = x@units                            ; convenience only
	                                             ; calculate monthly climatologies
	xClm   = clmMonTLL(x)                       ; [time | 12] x [lat | 192] x [lon | 288]
        printVarSummary(xClm)
	; calculate monthly anomalies
	xAnom  = calcMonAnomTLL(x, xClm)           ; [time | 1200] x [lat | 192] x [lon | 288]
	xAnom@units = units

	delete([/x, xClm/])                          ; no longer needed

	nBoot   = 1000                               ; This could be time consuming
	                                             ; Test with small nBoot (eg: nBoot=10)
	dimx(0) = nBoot                              ; 'trick'                                      
	xBoot   = new (dimx, typeof(xAnom))         ; array to hold nBoot results 

	do ns=0,nBoot-1                              ; generate bootstrap results
           iw = generate_sample_indices(ntim,1)
           xBoot(ns,:,:) = dim_avg_n( xAnom(iw,:,:), 0)
	end do
	delete(xAnom)                                ; no longer needed

	                                             ; mean of all bootstrap samples at each grid point
	xAvgBoot = dim_avg_n(xBoot, 0)              ; [192] x [288]                   
        printVarSummary(xBoot)

	idx  = dim_pqsort_n(xBoot, 2, 0) ; sort bootstrap means into ascending order at each grid point
	i025 = toint(0.025*nBoot)
        i975 = toint(0.975*nBoot)
                                          ; NCL's 'dimension reduction' will result in a 2D array
	xBoot_025   = xBoot(i025,:,:)           ;  2.5% level; (nlat,mlon)
	xBoot_975   = xBoot(i975,:,:)           ; 97.5% level; (nlat,mlon)
	xBoot_Range = xBoot_975 - xBoot_025

	xBoot_025@long_bame = "Bootstrap  2.5% Mean"
	xBoot_025@units     = units
	xBoot_975@long_bame = "Bootstrap 97.5% Mean"
	xBoot_975@units     = units
	xBoot_Range@long_name = "95% anomaly range"
	xBoot_Range@units     = units

	copy_VarCoords(xAnom(0,:,:), xBoot_025)  
	copy_VarCoords(xAnom(0,:,:), xBoot_975)

	printVarSummary(xBoot_025)    ; [lat | 192] x [lon | 288]
	printVarSummary(xBoot_975)    ; [lat | 192] x [lon | 288]
	printVarSummary(xBoot_Range)  ; [lat | 192] x [lon | 288]
        printMinMax(xBoot_Range, 1))