Re: Creating a new variable of indeterminate length

From: Jonathan Vigh <jvigh_at_nyahnyahspammersnyahnyah>
Date: Wed Sep 29 2010 - 14:49:24 MDT

  Hi Chris,
     I wasn't sure from your e-mail if your data files contain data
from just one weather station per file, or if each file contains data
from all the weather stations, but just at one time period.

If it's the first case, one approach you could use would be to read the
data variables for each meteorological station into attribute arrays
that are specific to that station and variable, then later you can copy
this into regular arrays, either filling in gaps with interpolation or
setting missing values. Storing the data in attribute arrays attached to
a container has the very nice advantage of being able to grow the arrays
as you need, without knowing in advance how long they will have to be.
You can also handle the problem of missing values, say for instance, if
an entire ob happened to be missing in the data file and you need to put
missing values into your array in its place. If the obs are out of
order, it's easy to sort them using the time coordinate you construct as
you read the data in.
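
As a rough illustration of that last point, here is a minimal sketch of
sorting two attribute arrays by a time coordinate. The container name,
attribute names, and values are all made up for the example, and the time
values are assumed to be plain numeric offsets (dim_pqsort needs integer,
float, or double input):

```ncl
; hypothetical container with obs that were appended out of time order
DATA = True
DATA@stationA_obtime   = (/ 3.0, 1.0, 2.0 /)          ; e.g. hours since some reference
DATA@stationA_pressure = (/ 1003.2, 1001.5, 1002.1 /)

; copy the attribute arrays into ordinary variables before sorting
obtime   = DATA@stationA_obtime
pressure = DATA@stationA_pressure

; kflag = 1: return the ascending sort permutation, leave obtime unchanged
order = dim_pqsort(obtime, 1)

; apply the same permutation to every array that shares the time axis
obtime_sorted   = obtime(order)
pressure_sorted = pressure(order)
print(obtime_sorted + "  " + pressure_sorted)
```

The same permutation vector can be reused for every variable belonging to
that station, so the sort only has to be computed once.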

The downside is that this is quite complex and may not at all be easy to
actually implement. I used this approach to read a particularly nasty
hurricane data format that I think was never intended to be read by
computers. My problem was very complex because I did not know the record
length in advance, the data format was very difficult to parse and
quality control, records were out of order and existed on multiple and
nonregular times, and there were 200+ variables to read. There is also
an efficiency hit to parsing things line by line and using these
attribute arrays. I think 100,000 lines of data, or a million data
points, took something like an hour or two to read, parse, and process.
So if you're working with millions of lines, this approach may not be
good for you - the attribute arrays can slow things down quite a bit.
They also can't store individual metadata for each attribute array (like
_FillValue) like a variable can. So this method might not be feasible
for you in that case. On the other hand, this might be one of the only
solutions in certain difficult data processing situations. Certainly,
when NCL gets an "eval" statement, some of these issues could be handled
by actual variables.
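
One way around the metadata limitation is to copy an attribute array into
an ordinary variable once reading is finished and attach the metadata
there. A minimal sketch, where the attribute name, the -9999.0 missing
flag, and the units are assumptions made for illustration:

```ncl
; hypothetical container holding one attribute array with a missing ob
DATA = True
DATA@stationA_pressure = (/ 1001.5, -9999.0, 1002.1 /)

; copy into a regular variable, which can carry its own metadata
pressure = DATA@stationA_pressure
pressure@_FillValue = -9999.0   ; assumed missing-value flag
pressure@units      = "hPa"     ; assumed units

; functions that honor _FillValue now skip the missing ob
print(avg(pressure))
```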

Assuming your data isn't in some easy delimited format so that actual
parsing is required to read the obs at each time, the program flow would
be something like:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
begin
....
; create a boolean container to which the attribute arrays will be attached
   DATA = True

   do ifile = 0, nfiles-1

; open file with data for one station and read in as list of strings
; (one per line) - if your data are one set of observations for one time
; period per line, then you'll have to parse through this, or if the file
; is more structured, you may need to read it in a different way

      datalines = asciiread(directory + filenames(ifile),-1,"string")
      nlines = dimsizes(datalines)
      nvalid_obs = 0 ; start a counter of valid obs

      do iline = 0, nlines-1

; read the data for a given time into temporary variables: pressure,
; temperature, windspeed, etc. (if the data were separated by spaces or
; another delimiter, you could use some of NCL's string parsing routines,
; conversion to floats or integers may be desired, or you could store
; these as strings)

; read YYYY, MM, DD, HH, MN (minute), pressure, temperature, windspeed, etc.
; you'll probably need to do checks to fill in the appropriate missing
; values
; if valid data is present, increment a counter
         nvalid_obs = nvalid_obs + 1

; create a time coordinate based on the synoptic time of these observations
; (this concatenation assumes the date parts were read as zero-padded strings)
         obtime = YYYY+MM+DD+HH+MN ; or whatever - if minutes and
; seconds are involved, you could create a time coordinate based on
; seconds since 00:00 on 1/1/1970 or something

; do quality checks on data (if necessary)

; now store each of the variables for that particular time in the data
; container
         grow_attribute(DATA,"stationA_obtime",obtime)
         grow_attribute(DATA,"stationA_pressure",pressure)
         grow_attribute(DATA,"stationA_temperature",temperature)
         grow_attribute(DATA,"stationA_windspeed",windspeed)

; delete temporary variables to ensure their values don't get used again
        delete(obtime)
        delete(pressure)
        delete(temperature)
        delete(windspeed)
....
      end do ; (end loop over ilines)

   end do ; (end loop over ifiles)

; Then you have all the data present and you can put them into more
; regular variables as you desire and do other stuff
....
end
>>>>>>>>>>>>>>>>>>>>
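
For the simpler case where each line is space-delimited, the parsing and
time-coordinate steps in the outline above might look something like the
sketch below. The sample line, its field order, and the variable names are
invented for illustration. cd_inv_calendar is available in NCL 6.0.0 and
later; on older versions ut_inv_calendar takes the same arguments.

```ncl
; hypothetical line: year month day hour minute pressure temperature windspeed
line   = "2010 09 29 14 49 1001.5 25.3 12.0"
fields = str_split(line, " ")

; convert the date parts to integers and the obs to floats
YYYY = stringtointeger(fields(0))
MM   = stringtointeger(fields(1))
DD   = stringtointeger(fields(2))
HH   = stringtointeger(fields(3))
MN   = stringtointeger(fields(4))   ; minute, named MN so it doesn't clash with month
pressure    = stringtofloat(fields(5))
temperature = stringtofloat(fields(6))
windspeed   = stringtofloat(fields(7))

; build a numeric, sortable time coordinate (seconds field assumed to be 0)
units  = "seconds since 1970-01-01 00:00:00"
obtime = cd_inv_calendar(YYYY, MM, DD, HH, MN, 0, units, 0)
print(obtime)
```

A numeric time like this can be sorted directly and compared across
stations, which a concatenated date string cannot do as conveniently.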

Note that if you have many, many variables, you can use generalized
notation to access all of the attribute arrays later.
The following example takes these and stores them in a new container:

     NEW_DATA = True
     variablenames_attribute_list = getvaratts(DATA)
     natts = dimsizes(variablenames_attribute_list)
     do iatt = 0, natts-1
        
        grow_attribute(NEW_DATA,"NEW_DATA_"+variablenames_attribute_list(iatt),DATA@$variablenames_attribute_list(iatt)$)
     end do
     delete(variablenames_attribute_list)

Anyway, I hope this is helpful, or at least gives you an idea on a
possible way forward. Perhaps you can just use the attribute arrays in a
simpler way than I've outlined for the initial data read and bypass all
the complexity I had to deal with.

Good luck,
     Jonathan

;----------------------------------------------------------------------
; This procedure will cause an existing attribute to grow
; in size, or else add it if it doesn't exist.
;
; It is a useful function for code to read CSV ASCII files.
;
; Author: Mary Haley
;----------------------------------------------------------------------
undef("grow_attribute")
procedure grow_attribute(tmp,att_name,att_val)

local nold,nnew,new_att_val

begin
   if(isatt(tmp,att_name)) then
     nold = dimsizes(tmp@$att_name$) ; # of old values
     nnew = dimsizes(att_val) ; # of new values to add

; The old and new attribute values better be the same type,
; or at least coercible to the same type.
     new_att_val = new(nold+nnew,typeof(tmp@$att_name$))
     new_att_val(0:nold-1) = tmp@$att_name$
     new_att_val(nold:) = att_val

; Replace the old attribute with the grown array.
     delete(tmp@$att_name$)
     tmp@$att_name$ = new_att_val
   else

; Create the attribute.
     tmp@$att_name$ = att_val
   end if
end
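
A quick way to check the procedure is a small driver script like the one
below. It assumes the procedure above has been saved to a file called
grow_attribute.ncl in the working directory; that file name, the attribute
name, and the values are made up for the example.

```ncl
load "./grow_attribute.ncl"   ; hypothetical file holding the procedure above

begin
   DATA = True

; first call: the attribute doesn't exist yet, so it is created
   grow_attribute(DATA,"stationA_pressure",1001.5)

; second call: two more values are appended to the existing attribute
   grow_attribute(DATA,"stationA_pressure",(/1002.1,1003.2/))

; the attribute array should now hold three values
   print(dimsizes(DATA@stationA_pressure))
end
```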

On 09/29/2010 02:31 AM, Christopher Steele wrote:
> Hi everyone,
>
> I'm trying to read in a list of meteorological station files and
> extract the relevant data from them as one continuous time series.
> However, due to the nature of meteorological stations, not all report
> data at each time interval so I'm left with data files of varying
> lengths. Ordinarily, I would create a file list, preassign a new
> variable, and create a loop over the files and extract the data into
> the new variable so that I'm left with one continuous data set to work
> with. Unfortunately, I cannot do this if the length of the files are
> indeterminate. Short of going back to the original data files (of
> which there are many) and ensuring that all stations that do not
> report produce a nan, is there a way around this, or even a more
> efficient way of reading in data from multiple files?
>
> cheers
>
> Chris
>
>
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Wed Sep 29 14:49:36 2010

This archive was generated by hypermail 2.1.8 : Mon Oct 04 2010 - 08:55:54 MDT