From: Daran Rife <drife_at_nyahnyahspammersnyahnyah>

Date: Tue Feb 22 2011 - 10:49:03 MST

Date: Tue Feb 22 2011 - 10:49:03 MST

Hi Dennis,

I'll spare you the rather lengthy R source code for the cor() function

(there is a heap of error checking), but I can tell you that when the

cor(x,y,method="spearman") is called, it calls another intrinsic R

function named "rank" to rank the values. By default, rank computes the

average "position" when there are ties in the data, and cor() uses the

default behavior of rank, so that must be the difference between the two

methods.

Sorry I forgot to include details about the rho metric, but I see you

found it. This is the most commonly used technique by statisticians.

\rho = 1- {\frac {6 \sum d_i^2}{n(n^2 - 1)}}

Thanks again for helping us sort out the differences. NCL is definitely

an integral part of our tool suite, as is R, Python, NCO, and CDO, so

its helpful to understand where the various tools might differ.

Appreciate your attention to detail and your sticking with this unusual

request for help.

Daran

-- On Feb 22, 2011, at 9:55 AM, Dennis Shea wrote: > > Hi Darren > > > http://www.answers.com/topic/spearman-s-rank-correlation-coefficient > > If there are no tied ranks, then ρ is given by:[1][2] > > \rho = 1- {\frac {6 \sum d_i^2}{n(n^2 - 1)}}. > > If tied ranks exist, Pearson's correlation coefficient between ranks should be used for the calculation[1]: > > r = \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i(y_i-\bar{y})^2}}. > > One has to assign the same rank to each of the equal values. It is an average of their positions in the ascending order of the values. > > > Read more: http://www.answers.com/topic/spearman-s-rank-correlation-coefficient#ixzz1EhqkWAmX > > =============== > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect014.htm > > This [SAS] also notes "PROC CORR computes the Spearman correlation by ranking the data and using the ranks in the Pearson product-moment correlation formula. In case of ties, the averaged ranks are used." > > I do not see any averaging of ranks in the fortran code used > by NCL. Hence, I *speculate* the R result might take into > account "averaged ranks". Do u have the source code > used by "R" ? > > D > > > > On 2/21/11 1:45 PM, Daran Rife wrote: >> Hi Dennis, >> >> Thanks for the additional information and results. Emilie has gone on >> maternity leave so you're not likely to hear from her for a while. We >> have been working this issue together. The R code to calculate the >> Spearman rank correlation looks like this: >> >> x<- read.table("data_Cabauw_MERRA_wspd-iqrln_full1.dat") >> y<- read.table("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln-12-1.dat") >> spear<- cor(x$V1,y$V1, method="spearman) >> print(spear) >> [1] 0.07144465 >> spear^2 >> [1] 0.005104339 >> >> I have the same suspicion about how the two methods deal with ties when >> ranking the data before computing the correlation. Our data have many, >> many non-zero values whose frequency is the identical. Thus, there are >> many ties when the frequencies are ranked. Like you, I see nothing in >> NCL's Fortran-based routine that indicates ties are an issue. >> >> The R documentation states "Spearman's rho statistic is used to >> estimate a >> rank-based measure of association. These are more robust and have been >> recommended if the data do not necessarily come from a bivariate normal >> distribution." >> >> And I emulated your experiment by removing the non-zero elements. >> >> ind<- which(x != 0& y != 0) >> spear_nonzero<- cor(x$V1[ind], y$V1[ind], method="spearman") >> Warning message: >> In cor(x$V1[ind], y$V1[ind], method = "spearman") : >> the standard deviation is zero >> >> The correlation is NA, because x$V1[ind] and y$V1[ind] are perfectly >> correlated, which can be readily seen by looking at x$V1[ind] and >> y$V1[ind]. >> >> > x$V1[ind] >> [1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 >> [7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 >> > y$V1[ind] >> [1] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 >> [7] 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 0.03042288 >> >> This again, is very different that what NCL's spcorr function returns. >> I assume you intended to remove all non-zero values, but your function >> doesn't appear to do this. >> >> i = ind( .not.(x.eq.0 .and. y.eq.0) ) >> >> print(y(i)) >> >> (0) 0 >> (1) 0 >> (2) 0 >> (3) 0.03042288 >> (4) 0.03042288 >> (5) 0 >> (6) 0 >> (7) 0 >> (8) 0 >> (9) 0 >> (10) 0 >> (11) 0 >> (12) 0 >> (13) 0 >> (14) 0 >> (15) 0 >> (16) 0 >> (17) 0 >> . >> . >> . >> >> I tried to modify your statement to select only non-zero elements from >> both x and y but I get the following error, which clearly indicates >> that there are a different number of non-zero elements in x and y. >> >> i = ind(x.ne.0 .and. y.ne.0) >> fatal:Dimension sizes of left hand side and right hand side of >> assignment do not match >> fatal:Execute: Error occurred at or near line 40 >> >> Because I am not an NCL expert, I don't know how to compute the inter- >> section between the non-zero element indices in x and y. In any case, >> thanks for looking into this further. >> >> >> Sincerely, >> >> >> Daran >> -- >> >> Message: 1 >> Date: Sun, 20 Feb 2011 14:05:09 -0700 >> From: Dennis Shea<shea@ucar.edu> >> Subject: Re: [ncl-talk] A problem with the Spearman rank correlation >> function (spcorr) >> To: Emilie Vanvyve<evanvyve@ucar.edu> >> Cc: ncl-talk@ucar.edu >> Message-ID:<4D618205.2030200@ucar.edu> >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed >> >> x = asciiread("data_Cabauw_MERRA_wspd-iqrln_full1.dat",-1,"float") >> y = asciiread("data_Cabauw_MERRA_wspd-iqrln_hist_wspd-iqrln12-1.dat",-1, >> r = spcorr(x,y) ; r = 0.6188006 ; r*r=0.3829142 >> >> Just for 'fun', I also tried >> >> i = ind( .not.(x.eq.0 .and. y.eq.0) ) >> R = spcorr(x(i),y(i)) ; R = 0.3943804 ; R*R = 0.1555359 >> ==== >> >> I speculate it must have something to do with the >> many ties [ x(n)=y(n)=0 ]. There is nothing in >> the fortran code to indicate ties are an issue. >> >> ==== >> The R code shows no rank correlation. Is that what you expect? >> _______________________________________________ >> ncl-talk mailing list >> List instructions, subscriber options, unsubscribe: >> http://mailman.ucar.edu/mailman/listinfo/ncl-talk _______________________________________________ ncl-talk mailing list List instructions, subscriber options, unsubscribe: http://mailman.ucar.edu/mailman/listinfo/ncl-talkReceived on Tue Feb 22 10:49:09 2011

*
This archive was generated by hypermail 2.1.8
: Wed Feb 23 2011 - 16:47:57 MST
*