Re: Re: default Fill/missing values

From: Mary Haley <haley_at_nyahnyahspammersnyahnyah>
Date: Tue, 22 Apr 2008 09:52:17 -0600 (MDT)

Jamie,

You bring up a good point about people having binary files that
may have a default missing value of "-999." hard-coded in.

Another possible issue is with folks who explicitly check their float
data against "-999." to test for missing values, rather than using
the more portable "_FillValue" attribute.
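
To make the difference concrete, here is a minimal sketch in Python (not NCL). The `Variable` class and both helper functions are hypothetical stand-ins invented for illustration; they are not part of NCL or any library. It only shows why a hard-coded comparison stops working once the default changes, while a check against the attribute keeps working.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical stand-in for a variable that carries attributes."""
    values: list
    attrs: dict = field(default_factory=dict)


def count_missing_hardcoded(var):
    """Fragile: assumes the fill value is always -999."""
    return sum(1 for v in var.values if v == -999.0)


def count_missing_portable(var):
    """Portable: reads the fill value from the variable's own attribute."""
    fill = var.attrs.get("_FillValue")
    if fill is None:
        return 0  # no fill value declared, so nothing is treated as missing
    return sum(1 for v in var.values if v == fill)


# Same data, once with the current float default and once with the
# proposed 1.0e20 default discussed in this thread.
old_style = Variable([1.5, -999.0, 3.0], {"_FillValue": -999.0})
new_style = Variable([1.5, 1.0e20, 3.0], {"_FillValue": 1.0e20})
```

With `old_style` both functions agree. With `new_style` the hard-coded check silently finds no missing values, which is the kind of breakage scripts would need to be checked for.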

The NCL team strives to keep things as backwards compatible as
possible. Changes like this will only be done after careful
consideration and lots of discussion.

One possible route is to do this in several phases.

We can start by creating a procedure that allows you to define which
set of missing values is the default for that NCL script. (The
default would initially still be "-999." for float values.) One could
then call this procedure to switch to whatever new default missing
values we are planning on for floats/doubles. This will allow you to
test the new default values with your current scripts.

After a release or two, if things are going well, then we can switch
to the new default missing value system, but still keep the procedure
around if somebody needs to switch back to the old default missing
value system.
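
As a rough illustration of the phased idea, the sketch below models it in Python rather than NCL. Every name in it (`set_default_fill_values`, `default_fill_value`, the "legacy" and "proposed" labels) is hypothetical; no such procedure exists yet and the real interface may look quite different. Only the float legacy value (-999.) and the proposed float/double value (1.0e20) come from this thread.

```python
# Hypothetical sketch of a per-script switch between fill-value sets.
_FILL_SETS = {
    # Only the legacy float default is stated in this thread.
    "legacy": {"float": -999.0},
    # Proposed defaults for float and double from the original post.
    "proposed": {"float": 1.0e20, "double": 1.0e20},
}

# Phase one: the legacy set stays active unless a script opts in.
_active_set = "legacy"


def set_default_fill_values(name):
    """Select which set of default fill values this script uses."""
    global _active_set
    if name not in _FILL_SETS:
        raise ValueError("unknown fill value set: " + repr(name))
    _active_set = name


def default_fill_value(type_name):
    """Return the default fill value for a type under the active set."""
    return _FILL_SETS[_active_set][type_name]
```

A script would call `set_default_fill_values("proposed")` at the top to try the new values. In the later phase, only the initial value of `_active_set` would change, and a script could call `set_default_fill_values("legacy")` to get the old behavior back.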

I need to discuss this with the rest of the team first. :-)

We still would like to hear input from you folks on this topic. Feel
free to send your comments to me directly, if you don't want to post
to the whole group, and I will make sure they get forwarded to the
developers.

--Mary

On Mon, 21 Apr 2008, Jamie Scott wrote:

> Dave,
>
> I think this is a great idea. I'm glad you are working on this issue.
>
> I do suppose this could break someone's code, if they have binary files that
> use the default _FillValue and they don't specify the _FillValue in their
> code. This could be the case with netCDF files that don't have the
> appropriate attributes as well.
>
> I like the idea of using the maximum possible value for int, short, long
> as is done in the netCDF standard. To maintain platform independence,
> it's a good idea to use the 32-bit maximum values.
>
> One of the main reasons for doing this is for representing floating point
> data as short integers with a scale and offset. When scaling the data, you
> want to use as much of the range of the short integers as possible in order
> to preserve the precision of the original floating point representation.
>
> For floating point values, I think 1.e20 is quite large enough, although I'd
> say 1.e30 is more common.
> I see -1.e10 a lot too. I've never had trouble with any of these values
> getting in the way of real data values.
>
>
>
> Jamie Scott
> NOAA/ESRL/PSD
> james.d.scott_at_noaa.gov
>
> On Apr 18, 2008, at 5:41 PM, ncl-talk-request_at_ucar.edu wrote:
>
>> From: David Brown <dbrown_at_ucar.edu>
>> Subject: NCL default missing values
>> To: ncl-talk forum <ncl-talk_at_ucar.edu>
>> Message-ID: <3A33B9D4-2110-47C4-8551-6F9BFA0F339F_at_ucar.edu>
>> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>>
>>
>> Hi NCL users,
>>
>> We are tentatively planning to modify NCL's default fill values in
>> order to put them further outside the range of "normal" data sets and
>> computations. This will apply to variables defined in NCL as well as
>> to variables defined in GRIB files as presented by NCL. The motivating
>> factor for this change is that recently we have encountered GRIB data
>> where the data range includes the value used for the float type fill
>> value (-999), leading to a situation where a few valid values are
>> treated as missing. Of course this has always been a possibility, but
>> it has not been encountered in practice, or at least it has not
>> jumped out as a problem in our tests or been reported as a bug by any
>> users until now. HDF and NetCDF file variables will not be affected
>> because the NCL representation of the variable only contains a
>> _FillValue attribute if it is defined in the file.
>>
>> We only want to do this once so we would like to get input on the
>> best values to use, as well as
>> feedback on possible problems to existing code.
>>
>> Our plan is to change the default fill values for float and double
>> values to the value 1.0e20.
>>
>> The byte, character, and string missing values (255, inttochar(0),
>> and "missing") will not change.
>>
>> Note that the long type changes size between 32- and 64-bit machines.
>> On 64-bit machines the long type can hold much bigger values, but I
>> think we want to ensure that the default long fill value will always
>> be the same. Therefore the default long fill value and the default
>> integer fill value will probably be equal.
>>
>> For int, long and short we could follow the example of NetCDF and use
>>
>> long and int:
>> -2147483647 (maximum acceptable INT_MIN according to IEEE Std
>> 1003.1, 2004 edition)
>> short:
>> -32767 (maximum acceptable SHRT_MIN)
>>
>> but it could certainly be argued that something with all 9's, such
>> as +/-999999999 (for long and integer) and +/-9999 (for short),
>> would be easier to type, remember, and recognize when visually
>> scanning the contents of a variable.
>>
>> Other suggestions and arguments pro or con any alternatives are welcome.
>>
>> FYI, we are planning to add more integer types soon, in particular
>> a 64-bit integer type and unsigned versions of all the types.
>> Our decisions concerning fill values for the existing types will be
>> extrapolated to come up with fill values for these new types.
>>
>> -dave
>
>
Received on Tue Apr 22 2008 - 09:52:17 MDT
