Hi Dennis,
Thanks for the additional information and results. Emilie has gone on
maternity leave so you're not likely to hear from her for a while. We
have been working this issue together. The R code to calculate the
Spearman rank correlation looks like this:
x <- read.table("data_Cabauw_MERRA_wspd-iqrln_full1.dat")
y <- read.table("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln-12-1.dat")
spear <- cor(x$V1,y$V1, method="spearman)
print(spear)
[1] 0.07144465
spear^2
[1] 0.005104339
I have the same suspicion about how the two methods deal with ties when
ranking the data before computing the correlation. Our data have many,
many non-zero values whose frequency is the identical. Thus, there are
many ties when the frequencies are ranked. Like you, I see nothing in
NCL's Fortran-based routine that indicates ties are an issue.
The R documentation states "Spearman's rho statistic is used to
estimate a
rank-based measure of association. These are more robust and have been
recommended if the data do not necessarily come from a bivariate normal
distribution."
And I emulated your experiment by removing the non-zero elements.
ind <- which(x != 0 & y != 0)
spear_nonzero <- cor(x$V1[ind], y$V1[ind], method="spearman")
Warning message:
In cor(x$V1[ind], y$V1[ind], method = "spearman") :
the standard deviation is zero
The correlation is NA, because x$V1[ind] and y$V1[ind] are perfectly
correlated, which can be readily seen by looking at x$V1[ind] and
y$V1[ind].
> x$V1[ind]
[1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
[7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
> y$V1[ind]
[1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
[7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
This again, is very different that what NCL's spcorr function returns.
I assume you intended to remove all non-zero values, but your function
doesn't appear to do this.
i = ind( .not.(x.eq.0 .and. y.eq.0) )
print(y(i))
(0) 0
(1) 0
(2) 0
(3) 0.03042288
(4) 0.03042288
(5) 0
(6) 0
(7) 0
(8) 0
(9) 0
(10) 0
(11) 0
(12) 0
(13) 0
(14) 0
(15) 0
(16) 0
(17) 0
.
.
.
I tried to modify your statement to select only non-zero elements from
both x and y but I get the following error, which clearly indicates
that there are a different number of non-zero elements in x and y.
i = ind(x.ne.0 .and. y.ne.0)
fatal:Dimension sizes of left hand side and right hand side of
assignment do not match
fatal:Execute: Error occurred at or near line 40
Because I am not an NCL expert, I don't know how to compute the inter-
section between the non-zero element indices in x and y. In any case,
thanks for looking into this further.
Sincerely,
Daran
-- Message: 1 Date: Sun, 20 Feb 2011 14:05:09 -0700 From: Dennis Shea <shea@ucar.edu> Subject: Re: A problem with the Spearman rank correlation function (spcorr) To: Emilie Vanvyve <evanvyve@ucar.edu> Cc: ncl-talk@ucar.edu Message-ID: <4D618205.2030200@ucar.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed x = asciiread("data_Cabauw_MERRA_wspd-iqrln_full1.dat",-1,"float") y = asciiread("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln12-1.dat",-1, r = spcorr(x,y) ; r = 0.6188006 ; r*r=0.3829142 Just for 'fun', I also tried i = ind( .not.(x.eq.0 .and. y.eq.0) ) R = spcorr(x(i),y(i)) ; R = 0.3943804 ; R*R = 0.1555359 ==== I speculate it must have something to do with the many ties [ x(n)=y(n)=0 ]. There is nothing in the fortran code to indicate ties are an issue. ==== The R code shows no rank correlation. Is that what you expect? _______________________________________________ ncl-talk mailing list List instructions, subscriber options, unsubscribe: http://mailman.ucar.edu/mailman/listinfo/ncl-talkReceived on Mon Feb 21 13:45:12 2011
This archive was generated by hypermail 2.1.8 : Wed Feb 23 2011 - 16:47:57 MST