From: Dennis Shea <shea_at_nyahnyahspammersnyahnyah>

Date: Tue Feb 22 2011 - 09:55:29 MST

Hi Daran,

http://www.answers.com/topic/spearman-s-rank-correlation-coefficient

If there are no tied ranks, then ρ is given by:[1][2]

\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}.

If tied ranks exist, Pearson's correlation coefficient between ranks
should be used for the calculation[1]:

r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i (y_i-\bar{y})^2}}.

One has to assign the same rank to each of the equal values. It is an
average of their positions in the ascending order of the values.
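As a rough illustration of the two formulas quoted above, the following Python sketch assigns averaged ranks to tied values and then compares Pearson-on-ranks with the \sum d_i^2 shortcut. It is not taken from NCL, R, or SAS; the function names and the sample data are made up for this example.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)

def spearman_pearson_on_ranks(x, y):
    """Spearman's rho as Pearson's r between averaged ranks (valid with ties)."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_shortcut(x, y):
    """The 1 - 6*sum(d^2)/(n(n^2-1)) formula, exact only when there are no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

x_tied = [10, 20, 20, 30, 40]   # invented data with one tie
x_untied = [10, 20, 25, 30, 40] # same data with the tie broken
y = [1, 3, 2, 5, 4]

print(spearman_pearson_on_ranks(x_tied, y), spearman_shortcut(x_tied, y))
print(spearman_pearson_on_ranks(x_untied, y), spearman_shortcut(x_untied, y))
```

With the tie present the two values differ slightly; once the tie is broken they coincide, which is the behaviour the quoted text describes.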

===============

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect014.htm

This [SAS] also notes "PROC CORR computes the Spearman correlation by

ranking the data and using the ranks in the Pearson product-moment

correlation formula. In case of ties, the averaged ranks are used."

I do not see any averaging of ranks in the Fortran code used
by NCL. Hence, I *speculate* the R result might take into
account "averaged ranks". Do you have the source code
used by R?
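To show how much tie handling alone can matter, here is a minimal Python sketch with invented, zero-heavy data. It assumes that a routine which ignores ties effectively hands out ranks in sort order; whether NCL's Fortran code actually behaves this way is not confirmed here, and all names below are hypothetical.

```python
from math import sqrt

def ranks(values, average_ties):
    """1-based ranks. Ties share their mean position if average_ties is True;
    otherwise they simply keep their (stable) sort order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            if average_ties:
                out[order[k]] = (start + end) / 2.0 + 1.0
            else:
                out[order[k]] = k + 1.0
        start = end + 1
    return out

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)

# Invented data: mostly zeros, with the non-zero values in different places.
x = [0, 0, 0, 0, 1, 2, 0, 0]
y = [0, 0, 1, 2, 0, 0, 0, 0]

rho_avg = pearson(ranks(x, True), ranks(y, True))    # ties get averaged ranks
rho_ord = pearson(ranks(x, False), ranks(y, False))  # ties ranked by sort order

print("averaged ranks:", rho_avg)
print("sort-order ranks:", rho_ord)
```

On this toy input the averaged-rank value is negative while the sort-order value is positive, so a gap of the size reported in this thread is at least plausible from tie handling alone.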

D

On 2/21/11 1:45 PM, Daran Rife wrote:

> Hi Dennis,
>
> Thanks for the additional information and results. Emilie has gone on
> maternity leave so you're not likely to hear from her for a while. We
> have been working this issue together. The R code to calculate the
> Spearman rank correlation looks like this:
>
> x <- read.table("data_Cabauw_MERRA_wspd-iqrln_full1.dat")
> y <- read.table("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln-12-1.dat")
> spear <- cor(x$V1, y$V1, method="spearman")
> print(spear)
> [1] 0.07144465
> spear^2
> [1] 0.005104339

> I have the same suspicion about how the two methods deal with ties when
> ranking the data before computing the correlation. Our data have many,
> many non-zero values whose frequency is identical. Thus, there are
> many ties when the frequencies are ranked. Like you, I see nothing in
> NCL's Fortran-based routine that indicates ties are an issue.
>
> The R documentation states "Spearman's rho statistic is used to
> estimate a rank-based measure of association. These are more robust
> and have been recommended if the data do not necessarily come from a
> bivariate normal distribution."

> And I emulated your experiment by removing the zero elements, keeping
> only the pairs where both x and y are non-zero.
>
> ind <- which(x != 0 & y != 0)
> spear_nonzero <- cor(x$V1[ind], y$V1[ind], method="spearman")
> Warning message:
> In cor(x$V1[ind], y$V1[ind], method = "spearman") :
>   the standard deviation is zero
>
> The correlation is NA because x$V1[ind] and y$V1[ind] are identical and
> each is constant, so the standard deviation is zero and the coefficient
> is undefined. This can be readily seen by looking at x$V1[ind] and
> y$V1[ind].
>
> > x$V1[ind]
> [1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
> [7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
> > y$V1[ind]
> [1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288
> [7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288

> This, again, is very different from what NCL's spcorr function returns.
> I assume you intended to remove all zero values, but your function
> doesn't appear to do this.
>
> i = ind( .not.(x.eq.0 .and. y.eq.0) )
>
> print(y(i))
>
> (0) 0
> (1) 0
> (2) 0
> (3) 0.03042288
> (4) 0.03042288
> (5) 0
> (6) 0
> (7) 0
> (8) 0
> (9) 0
> (10) 0
> (11) 0
> (12) 0
> (13) 0
> (14) 0
> (15) 0
> (16) 0
> (17) 0
> .
> .
> .

> I tried to modify your statement to select only non-zero elements from
> both x and y, but I get the following error, which appears to indicate
> that there are a different number of non-zero elements in x and y.
>
> i = ind(x.ne.0 .and. y.ne.0)
> fatal:Dimension sizes of left hand side and right hand side of
> assignment do not match
> fatal:Execute: Error occurred at or near line 40
>
> Because I am not an NCL expert, I don't know how to compute the
> intersection between the non-zero element indices in x and y. In any
> case, thanks for looking into this further.
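The two index selections discussed above differ by De Morgan's law: `.not.(x.eq.0 .and. y.eq.0)` keeps every pair in which at least one value is non-zero, whereas `x.ne.0 .and. y.ne.0` keeps only pairs in which both are non-zero. The Python sketch below mirrors both masks on invented arrays; it is an illustration, not NCL code. Separately, and only as an assumption about the cause, the "dimension sizes ... do not match" error looks like what NCL reports when an existing variable such as `i` is reassigned an array of a different length, in which case calling `delete(i)` before the second assignment may be all that is needed.

```python
# Invented sample data; zeros occur in different places in x and y.
x = [0.0, 0.0, 0.3, 0.0, 0.2, 0.1]
y = [0.0, 0.1, 0.3, 0.0, 0.0, 0.2]

# Mirrors ind( .not.(x.eq.0 .and. y.eq.0) ):
# a pair is dropped only when BOTH values are zero.
either_nonzero = [i for i, (a, b) in enumerate(zip(x, y))
                  if not (a == 0 and b == 0)]

# Mirrors ind( x.ne.0 .and. y.ne.0 ):
# a pair is kept only when BOTH values are non-zero.
both_nonzero = [i for i, (a, b) in enumerate(zip(x, y))
                if a != 0 and b != 0]

# The same index list is applied to both arrays, so the pairs stay aligned.
x_sel = [x[i] for i in both_nonzero]
y_sel = [y[i] for i in both_nonzero]

print(either_nonzero)
print(both_nonzero)
print(x_sel, y_sel)
```

Because a single index list is built from both arrays at once, no separate "intersection" step is required.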

> Sincerely,
>
> Daran
> --
>
> Message: 1
> Date: Sun, 20 Feb 2011 14:05:09 -0700
> From: Dennis Shea <shea@ucar.edu>
> Subject: Re: [ncl-talk] A problem with the Spearman rank correlation function (spcorr)
> To: Emilie Vanvyve <evanvyve@ucar.edu>
> Cc: ncl-talk@ucar.edu
> Message-ID: <4D618205.2030200@ucar.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed

> x = asciiread("data_Cabauw_MERRA_wspd-iqrln_full1.dat",-1,"float")
> y = asciiread("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln12-1.dat",-1,
> r = spcorr(x,y) ; r = 0.6188006 ; r*r=0.3829142
>
> Just for 'fun', I also tried
>
> i = ind( .not.(x.eq.0 .and. y.eq.0) )
> R = spcorr(x(i),y(i)) ; R = 0.3943804 ; R*R = 0.1555359
> ====
>
> I speculate it must have something to do with the
> many ties [ x(n)=y(n)=0 ]. There is nothing in
> the Fortran code to indicate ties are an issue.
>
> ====
> The R code shows no rank correlation. Is that what you expect?

> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Tue Feb 22 09:55:36 2011

This archive was generated by hypermail 2.1.8 : Wed Feb 23 2011 - 16:47:57 MST