Re: the difference between new eof function and old eof function

From: Dennis Shea (shea AT cgd.ucar.edu)
Date: Thu Feb 24 2005 - 16:03:39 MST


Hello,
>
>I found a problem when I made EOF analysis with NCL eof function.
>
>The following is my ncl programm with old version:
>***********************************************************
> load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
>
>begin
> f = addfile("sst_50-03_original.nc","r")
>
> ss =f->sano
>
>;begin eof via co-variance matrix
>
> neval = 4
> eof = eofcov(ss,neval)
>
> pcvar = eof@pcvar
> print(pcvar)
>
> end
>************************************************************
>output results:
>Variable: pcvar
>Type: float
>Total Size: 16 bytes
> 4 values
>Number of Dimensions: 1
>Dimensions and sizes: [4]
>Coordinates:
>(0) 26.49188
>(1) 15.10827
>(2) 8.423062
>(3) 6.310274
>
>______________________________________________________________________
>My programm with new version is as follows:
>***********************************************************************
> load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
>
>begin
> f = addfile("sst_50-03_original.nc","r")
>
> ss =f->sano
>
>;begin eof via co-variance matrix
>
> neval = 4
> eof = eofunc(ss,neval,False)
> pcvar = eof@pcvar
> print(pcvar)
>
> end
>**********************************************************************
>result is:
>Variable: pcvar
>Type: float
>Total Size: 16 bytes
> 4 values
>Number of Dimensions: 1
>Dimensions and sizes: [4]
>Coordinates:
>(0) 23.71038
>(1) 16.73346
>(2) 7.300104
>(3) 7.019832
>___________________________________________________________________________
>
>Their results are different, which influence my analysis because if old version
is right the first 2 modes are significant by North test (1982), however if new
version is right, the first 2 modes are not significant.
>
>Could you help me? I prefered new version because the computing speed is very
fast for large matrix. And many of my results had been based on the new version.
But the result of old version seem more reasonable in physics( same with fortran
programm result).
>
>If you need the nc dataset, I can transfer to you.
>
>Looking forward to your answer!
>
>Jing
>
>2005-02-24
>
>======= 2005-02-16 14:19:00 您在来信中写道:=======
>
>>Hi Jing,
>>
>>There is a way to do this, and it was purposely unadvertised. I'm
>>CC-ing Dennis Shea on this message, because he would have more
>>information about what these eigenvalues mean.
>>
>>The third argument to "eofunc" is an option variable. Set this
>>variable to True, and then set an attribute of this variable called
>>"return_eval" to True:
>>
>> option = True
>> option@return_eval = True
>> ...
>> ev_cor = eofunc(x,neval,option)
>>
>>
>>This will return the eigenvalues as an attribute of the "ev_cor"
>>variable called "eval".
>>
>>--Mary
>>
>>
>>On Wed, 16 Feb 2005, Jing wrote:
>>
>>> Hi,all
>>>
>>> I wonder whether there is an attribution representing "eigenvalues" in the
new EOF function "eofunc".
>>>
>>> In old EOF function, such as "eofcov", I knew its output attribute "eval" is
just "eigenvalues" (1D array). But now I failed to find the attribute in new EOF
function. Could you tell me how to get "eigenvalues" from the new version for
EOF analysis?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hi Jing,

The answer is rather long and I am quite busy
so I can't elaborate too much.

[1]
You can get the eigenvalues returned.
The following is not documented for assorted reasons.

opt = True
opt@return_eval = True
EVEC = eofunc(ss,neval,opt)
print ("EVEC@eval " + EVEC@eval )

[2]
Given the small differences in % variance explained by the
new and old functions, I would be very surprised if the first
two EOFs are not significant under the new eofunc routine.

[3]
Why the new eof function? This is where things
can require a long explanation. The short answers are:

    [a] technical correctness
    [b] possible *major* reductions in execution time.
    
Let S be the number of Spatial points [ie, grid points] and T
be the number of Temporal pts. Most often users want the
EOFs of the spatial pattern. Historically, people calculate
a spatial covariance or correlation matrix that is of size SxS
[NCL takes advantage of the symmetric nature of the covariance
matrix and computes an (S+1)*S/2 matrix.] This is then
passed onto an LAPACK routine that calculates S eigenvalues.
Prior to returning, the neval spatial EOFs are calculated
and returned to the calling script.

The issue is that given S spatial pts and T temporal values the
theoretical maximum number of unique eigenvalues is
the min(S,T). So if S=8000 and T=10 then the max number
of unique eigenvalues is 10. You can calculate more
but it can take much longer to do so and they are basically
just noise.

When an SxS covariance matrix is passed onto the
LAPACK routine, it has no knowledge of the value of T. Hence,
it calculates S eigenvalues to explain the total variance.
 
Luckily, in the fields 'we' deal with
the variance is usually concentrated in the lower modes
and drops off rapidly to some more-or-less constant
noise level that is small. [This is called a "red" process.]

In all previous versions of NCL, the SxS spatial covariance
matrix was calculated and the spatial EOFs were determined
directly [eofcov and eofcor].

When the value of S was 'small' the speed issue was generally not
a concern. Even when S was not so small, some knowledgable
people would take advantage of the fact that adjacent grid
points are generally not independent and would use NCL's
array syntax to subset the source data. If ss(lat,lon,time)

      eof = eofcov_Wrap(::2, ::4 ,:) ; every other lat, every 4th lon
                                     ; all time steps
                                     ; (the subsetting should be done
                                     ; based on physical reasoning)
      
Statistically speaking, the results are similar although
the numbers might be slightly different.

---
Lately, people have been calculting EOFs of some *very large* matrices.
The computational times were sometimes *hours* when S was very large.

To address this issue, a new approach was used.

The Eckart-Young Theorem states that the max number of unique eigenvalues is min[S,T]. There are too many details to discuss so I'll just jump to the bottom line. Rather than directly compute the SxS covariance matrix and solve for the spatial EOFs, the NCL function, eofunc, compares the sizes of S and T. If S<T then it computes the spatial matrix and computes the spatial EOFs directly. However, if T<S then it computes a TxT matrix and then uses a transformation to reconstruct the spatial EOFs. Note: the same LAPACK routine is used.

The reason why we did not include the eigenvalues in the return is that if people (like you) who looked at the eigenvalues of the eofcov/eofcor routines they might differ *substantially* [in a numerical sense] from the eigenvalues via the transformed matrix. We thought this might be confusing.

--- Guess it was longer than I anticipated. I'll try to answer any other questions you might have.

good luck D

[P.S. I suggest that users *always* use the _Wrap versions of NCL built-in functions when available.]

_______________________________________________ ncl-talk mailing list ncl-talk@ucar.edu http://mailman.ucar.edu/mailman/listinfo/ncl-talk



This archive was generated by hypermail 2b29 : Wed Mar 02 2005 - 09:55:22 MST