Significance Testing from Nicholas Herold on 2007-06-15 (ncl-talk 2007 archive)

From: Nicholas Herold <nher5224_at_nyahnyahspammersnyahnyah>
Date: Fri, 15 Jun 2007 23:16:53 +1000

Hi,

I am looking to implement Student's t-test (ttest) for two model runs
of 10 years each (10 yrs, 48 lat, 96 lon) to see whether the differences
in their annual mean temperature are significant. I have read the ttest
documentation; however, I am not sure how the four examples provided
there differ in accuracy.

I have written a script (most of which follows) that applies ttest to each
grid cell separately: it takes the 10 annual values for that cell from each
run and tests whether the difference between the two runs' means is
significant (basically following example 1). The output seems odd, because
most of the areas with quite substantial temperature differences were
excluded (not deemed significant) in the significance plot. This seems
counterintuitive: I would expect a larger difference to have a greater
chance of being significant, yet the 3 or 4 largest anomalies were not
considered significant, which made me question my method.
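
For illustration, the sketch below shows one way a large difference can
still fail the test: ttest weighs the difference in means against the
variances and sample sizes, not the difference alone. The numbers are
hypothetical, chosen only to make the point, and are not taken from the
model runs described here.

```ncl
; Hypothetical values for two grid cells, 10 years in each run.
n = 10

; Cell A: large mean difference (2.0), large interannual variance (9.0).
; With pooled variance, t = 2.0/sqrt(9.0*(1./10 + 1./10)), about 1.49.
probA = ttest(2.0, 9.0, n, 0.0, 9.0, n, False, False)

; Cell B: small mean difference (0.5), small interannual variance (0.1).
; With pooled variance, t = 0.5/sqrt(0.1*(1./10 + 1./10)), about 3.54.
probB = ttest(0.5, 0.1, n, 0.0, 0.1, n, False, False)

; The two-sided 5% critical value of t for 18 degrees of freedom is
; about 2.10, so only cell B would be expected to give prob < 0.05.
print("cell A prob = " + probA)
print("cell B prob = " + probB)
```

If the regions with the largest anomalies are also the regions with the
largest year-to-year variability, this is the behaviour one would expect
to see.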

do y = 0, 47 ; cycle through lat
        do x = 0, 95 ; cycle through lon
                line1 = variable_internal1(:,y,x)
                line2 = variable_internal2(:,y,x)
; variable_internal1 and variable_internal2 contain the 10 years of 48x96
; data (ie. 10x48x96)

; check for missing values (i.e. for values that only exist over land)
                if (all(ismissing(line1))) then
                        new_grid(y,x)=variable_internal1@missing_value
                else
                        ave1 = avg(line1)
                        ave2 = avg(line2)
                        var1 = variance(line1)
                        var2 = variance(line2)
                        s1 = dimsizes(line1)
                        s2 = dimsizes(line2)
                        iflag = False
                        prob = ttest(ave1,var1,s1,ave2,var2,s2,iflag,False)

                        if (prob .lt. 0.05) then
                                new_grid(y,x)=ave1-ave2
                        end if
                end if
        end do
end do
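
The same per-cell test can probably be written without the explicit loops,
since ttest accepts arrays of means and variances. This is a sketch only.
It assumes an NCL version that provides dim_avg_n and dim_variance_n, that
both variables are dimensioned (time, lat, lon), and that
variable_internal1 carries a _FillValue attribute (the loop above uses
missing_value instead). It also passes the full number of years as the
sample size, which is only appropriate if no years are missing.

```ncl
; Assumed layout: variable_internal1/2 are (time, lat, lon).
dims = dimsizes(variable_internal1)
ntim = dims(0)                               ; number of years (10 here)

; Means and variances over the time dimension at every grid cell.
ave1 = dim_avg_n(variable_internal1, 0)
ave2 = dim_avg_n(variable_internal2, 0)
var1 = dim_variance_n(variable_internal1, 0)
var2 = dim_variance_n(variable_internal2, 0)

; One call returns a (lat, lon) array of probabilities.
prob = ttest(ave1, var1, ntim, ave2, var2, ntim, False, False)

; Keep the difference only where it is significant at the 5% level.
fill     = variable_internal1@_FillValue     ; assumed to be defined
diff     = ave1 - ave2
new_grid = where(prob .lt. 0.05, diff, fill)
new_grid@_FillValue = fill
```

How cells with missing data come out of where() is worth checking against
the explicit missing-value test in the loop.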

.
.
.
plot(0) = gsn_csm_contour_map_ce(wks,new_grid,resources)

My question is basically this: is this method less accurate than the
examples in the ttest documentation, where a combination of other functions
is used (example 4, pasted below)?

  dimXY = dimsizes(x)
  ntim = dimXY(0)
  nlat = dimXY(1)
  mlon = dimXY(2)

  xtmp = x(lat|:,lon|:,time|:) ; reorder but do it only once [temporary]
  ytmp = y(lat|:,lon|:,time|:)

  xAve = dim_avg (xtmp) ; calculate means at each grid point
  yAve = dim_avg (ytmp)
  xVar = dim_variance (xtmp) ; calculate variances
  yVar = dim_variance (ytmp)

  sigr = 0.05 ; critical sig lvl for r
  xEqv = equiv_sample_size (xtmp, sigr,0)
  yEqv = equiv_sample_size (ytmp, sigr,0)

  xN = wgt_areaave (xEqv, wgty, 1., 0) ; wgty could be gaussian weights
  yN = wgt_areaave (yEqv, wgty, 1., 0)

  iflag= False ; population variance similar
  prob = ttest(xAve,xVar,xN, yAve,yVar,yN, iflag, False)

I have looked at other pieces of code, and the ones I have found all
differ slightly from each other. The above example seems more complicated;
is it only needed in certain situations?

Any help would be appreciated.

Kind Regards,

Nicholas

Received on Fri Jun 15 2007 - 07:16:53 MDT