Re: A problem with the Spearman rank correlation function (spcorr)

From: Daran Rife <drife_at_nyahnyahspammersnyahnyah>
Date: Mon Feb 21 2011 - 13:45:07 MST

Hi Dennis,

Thanks for the additional information and results. Emilie has gone on
maternity leave, so you're not likely to hear from her for a while. We
have been working on this issue together. The R code to calculate the
Spearman rank correlation looks like this:

x <- read.table("data_Cabauw_MERRA_wspd-iqrln_full1.dat")
y <- read.table("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln-12-1.dat")
spear <- cor(x$V1, y$V1, method="spearman")
print(spear)
[1] 0.07144465
spear^2
[1] 0.005104339

I have the same suspicion: the two methods may differ in how they deal
with ties when ranking the data before computing the correlation. Our
data have many, many non-zero values whose frequency is identical. Thus,
there are many ties when the frequencies are ranked. Like you, I see
nothing in NCL's Fortran-based routine that indicates ties are an issue.
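
To make the tie question concrete, here is a minimal R sketch on synthetic,
zero-heavy data (not the Cabauw/MERRA files). It assumes, purely for
illustration, that a routine without tie handling applies the textbook
shortcut 1 - 6*sum(d^2)/(n*(n^2 - 1)); I have not confirmed that spcorr does
this. R's cor(..., method="spearman") is the Pearson correlation of the
average ranks, and the two approaches agree only when there are no ties.

```r
# Synthetic data with a large block of tied zeros (hypothetical values,
# standing in for the real input files).
set.seed(1)
n <- 200
x <- sample(c(rep(0, 150), runif(50)))
y <- sample(c(rep(0, 150), runif(50)))

# R's approach: Pearson correlation of average ("mid") ranks.
rx <- rank(x, ties.method = "average")
ry <- rank(y, ties.method = "average")
rho_ranks <- cor(rx, ry)
rho_r     <- cor(x, y, method = "spearman")

# Textbook shortcut, exact only when neither series contains ties.
# ASSUMPTION: shown for comparison only; not taken from the NCL source.
d <- rx - ry
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(rho_r = rho_r, rho_ranks = rho_ranks, rho_shortcut = rho_shortcut))
```

In this synthetic case the shortcut comes out noticeably larger than the
average-rank result, which is at least the same direction as the gap between
the NCL and R numbers.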

The R documentation states: "Spearman's rho statistic is used to
estimate a rank-based measure of association. These are more robust and
have been recommended if the data do not necessarily come from a
bivariate normal distribution."

I also emulated your experiment by removing the zero elements, i.e.
keeping only the pairs in which both values are non-zero.

ind <- which(x != 0 & y != 0)
spear_nonzero <- cor(x$V1[ind], y$V1[ind], method="spearman")
Warning message:
In cor(x$V1[ind], y$V1[ind], method = "spearman") :
   the standard deviation is zero

The correlation is NA because x$V1[ind] and y$V1[ind] are each constant
(every element has the same value), so the standard deviation is zero
and the correlation is undefined. This can be readily seen by looking at
x$V1[ind] and y$V1[ind].

> x$V1[ind]
  [1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
  [7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
> y$V1[ind]
  [1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
  [7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
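
As a minimal sketch of why this yields NA, the snippet below rebuilds the
twelve identical values from the printout above and repeats the call. The
helper is_constant() is a hypothetical guard introduced here for
illustration; it is not part of R or of the original script.

```r
# Twelve identical values, as in the printout above.
a <- rep(0.03042288, 12)
b <- rep(0.03042288, 12)

# Every element of a constant vector receives the same average rank,
# so the ranks have zero spread and the correlation is 0/0.
print(sd(rank(a)))

# cor() returns NA and warns that the standard deviation is zero;
# the warning is suppressed here only to keep the output short.
r <- suppressWarnings(cor(a, b, method = "spearman"))
print(r)

# Hypothetical guard: TRUE when a vector has fewer than two distinct values.
is_constant <- function(v) length(unique(v)) < 2
if (is_constant(a) || is_constant(b)) {
  message("rank correlation is undefined for a constant input")
}
```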

This, again, is very different from what NCL's spcorr function returns.
I assume you intended to remove all zero values, but your statement
doesn't appear to do this.

i = ind( .not.(x.eq.0 .and. y.eq.0) )

print(y(i))

(0) 0
(1) 0
(2) 0
(3) 0.03042288
(4) 0.03042288
(5) 0
(6) 0
(7) 0
(8) 0
(9) 0
(10) 0
(11) 0
(12) 0
(13) 0
(14) 0
(15) 0
(16) 0
(17) 0
.
.
.
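
The zeros in that printout are what one would expect from the logic of the
mask: "not (x is 0 and y is 0)" is the same as "x is non-zero or y is
non-zero", so a pair survives if either value is non-zero. A minimal R
sketch of the same logic, using small made-up vectors rather than the real
data:

```r
# Made-up vectors (hypothetical values, for illustration only).
x <- c(0, 0,   0.5, 0.2, 0)
y <- c(0, 0.3, 0,   0.2, 0)

# R analogue of the NCL mask .not.(x.eq.0 .and. y.eq.0):
# drops a pair only when BOTH values are zero.
keep_either <- which(!(x == 0 & y == 0))

# Stricter mask: keeps a pair only when BOTH values are non-zero.
keep_both <- which(x != 0 & y != 0)

print(y[keep_either])  # can still contain zeros
print(y[keep_both])    # contains no zeros
```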

I tried to modify your statement to select only non-zero elements from
both x and y but I get the following error, which clearly indicates
that there are a different number of non-zero elements in x and y.

i = ind(x.ne.0 .and. y.ne.0)
fatal:Dimension sizes of left hand side and right hand side of
assignment do not match
fatal:Execute: Error occurred at or near line 40

Because I am not an NCL expert, I don't know how to compute the
intersection between the non-zero element indices in x and y. In any
case, thanks for looking into this further.
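
For reference, this is how the intersection can be sketched in R, again on
made-up vectors rather than the real data. It assumes x and y have the same
length and are paired element by element; under that assumption, intersecting
the two index sets gives the same result as a single combined mask.

```r
# Made-up paired vectors (hypothetical values, for illustration only).
x <- c(0,   0.4, 0, 0.7, 0.1, 0)
y <- c(0.2, 0.4, 0, 0,   0.9, 0)
stopifnot(length(x) == length(y))  # pairing requires equal lengths

# Indices of the non-zero elements in each vector separately.
nz_x <- which(x != 0)
nz_y <- which(y != 0)

# Indices where BOTH vectors are non-zero.
both <- intersect(nz_x, nz_y)
print(both)

# The rank correlation restricted to those pairs.
rho <- cor(x[both], y[both], method = "spearman")
print(rho)
```

Note that the counts of non-zero elements in x and in y need not match for
this to work; only the overall lengths of x and y must agree.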

Sincerely,

Daran

--
Message: 1
Date: Sun, 20 Feb 2011 14:05:09 -0700
From: Dennis Shea <shea@ucar.edu>
Subject: Re: A problem with the Spearman rank correlation
	function (spcorr)
To: Emilie Vanvyve <evanvyve@ucar.edu>
Cc: ncl-talk@ucar.edu
Message-ID: <4D618205.2030200@ucar.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
x = asciiread("data_Cabauw_MERRA_wspd-iqrln_full1.dat",-1,"float")
y = asciiread("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln12-1.dat",-1,
r = spcorr(x,y)  ; r = 0.6188006    ; r*r=0.3829142

Just for 'fun', I also tried

i = ind( .not.(x.eq.0 .and. y.eq.0) )
R = spcorr(x(i),y(i))   ; R = 0.3943804   ; R*R = 0.1555359
====
I speculate it must have something to do with the many ties
[ x(n)=y(n)=0 ]. There is nothing in the Fortran code to indicate
ties are an issue.
====
The R code shows no rank correlation. Is that what you expect?
Received on Mon Feb 21 13:45:12 2011
