From: Daran Rife <drife_at_nyahnyahspammersnyahnyah>

Date: Mon Feb 21 2011 - 13:45:07 MST

Date: Mon Feb 21 2011 - 13:45:07 MST

Hi Dennis,

Thanks for the additional information and results. Emilie has gone on

maternity leave so you're not likely to hear from her for a while. We

have been working this issue together. The R code to calculate the

Spearman rank correlation looks like this:

x <- read.table("data_Cabauw_MERRA_wspd-iqrln_full1.dat")

y <- read.table("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln-12-1.dat")

spear <- cor(x$V1,y$V1, method="spearman)

print(spear)

[1] 0.07144465

spear^2

[1] 0.005104339

I have the same suspicion about how the two methods deal with ties when

ranking the data before computing the correlation. Our data have many,

many non-zero values whose frequency is the identical. Thus, there are

many ties when the frequencies are ranked. Like you, I see nothing in

NCL's Fortran-based routine that indicates ties are an issue.

The R documentation states "Spearman's rho statistic is used to

estimate a

rank-based measure of association. These are more robust and have been

recommended if the data do not necessarily come from a bivariate normal

distribution."

And I emulated your experiment by removing the non-zero elements.

ind <- which(x != 0 & y != 0)

spear_nonzero <- cor(x$V1[ind], y$V1[ind], method="spearman")

Warning message:

In cor(x$V1[ind], y$V1[ind], method = "spearman") :

the standard deviation is zero

The correlation is NA, because x$V1[ind] and y$V1[ind] are perfectly

correlated, which can be readily seen by looking at x$V1[ind] and

y$V1[ind].

* > x$V1[ind]
*

[1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288

[7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288

* > y$V1[ind]
*

[1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288

[7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288

This again, is very different that what NCL's spcorr function returns.

I assume you intended to remove all non-zero values, but your function

doesn't appear to do this.

i = ind( .not.(x.eq.0 .and. y.eq.0) )

print(y(i))

(0) 0

(1) 0

(2) 0

(3) 0.03042288

(4) 0.03042288

(5) 0

(6) 0

(7) 0

(8) 0

(9) 0

(10) 0

(11) 0

(12) 0

(13) 0

(14) 0

(15) 0

(16) 0

(17) 0

.

.

.

I tried to modify your statement to select only non-zero elements from

both x and y but I get the following error, which clearly indicates

that there are a different number of non-zero elements in x and y.

i = ind(x.ne.0 .and. y.ne.0)

fatal:Dimension sizes of left hand side and right hand side of

assignment do not match

fatal:Execute: Error occurred at or near line 40

Because I am not an NCL expert, I don't know how to compute the inter-

section between the non-zero element indices in x and y. In any case,

thanks for looking into this further.

Sincerely,

Daran

-- Message: 1 Date: Sun, 20 Feb 2011 14:05:09 -0700 From: Dennis Shea <shea@ucar.edu> Subject: Re: A problem with the Spearman rank correlation function (spcorr) To: Emilie Vanvyve <evanvyve@ucar.edu> Cc: ncl-talk@ucar.edu Message-ID: <4D618205.2030200@ucar.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed x = asciiread("data_Cabauw_MERRA_wspd-iqrln_full1.dat",-1,"float") y = asciiread("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln12-1.dat",-1, r = spcorr(x,y) ; r = 0.6188006 ; r*r=0.3829142 Just for 'fun', I also tried i = ind( .not.(x.eq.0 .and. y.eq.0) ) R = spcorr(x(i),y(i)) ; R = 0.3943804 ; R*R = 0.1555359 ==== I speculate it must have something to do with the many ties [ x(n)=y(n)=0 ]. There is nothing in the fortran code to indicate ties are an issue. ==== The R code shows no rank correlation. Is that what you expect? _______________________________________________ ncl-talk mailing list List instructions, subscriber options, unsubscribe: http://mailman.ucar.edu/mailman/listinfo/ncl-talkReceived on Mon Feb 21 13:45:12 2011

*
This archive was generated by hypermail 2.1.8
: Wed Feb 23 2011 - 16:47:57 MST
*