Re: Byte array

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Tue Jul 30 2013 - 15:02:55 MDT

Hi Phil,
Thanks for the explanation and the reference from the hdfgroup site, and I apologize for suggesting possible confusion on your part. This does pretty much confirm what I was suspecting: that the concept of fill value has a different meaning in the HDFEOS CloudSat world than we are used to in our NCL- and NetCDF-centric world. We do not usually make a big distinction between fill value and missing value. As with HDFEOS, the fill value is used as an initialization value for undefined data variables, and, just as with HDFEOS, there are ways to turn off the initialization step for performance reasons. But beyond serving as an initialization value, the _FillValue attribute value usually also serves as the missing value indicator.
ncdump does not write the '_' character to the file (it does no file writing). As it dumps the contents of a variable, it checks each value for equality with the _FillValue, and, if equal, it outputs the '_' character in place of the value.
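
A minimal sketch of that display-only substitution, written in plain Python purely for illustration. It is not ncdump's actual code, and the function name dump_values is hypothetical:

```python
def dump_values(values, fill_value):
    """Return printable tokens for a variable; the input list is not modified."""
    # A value equal to the fill value is shown as '_'; everything else is
    # shown as its normal text representation.
    return ["_" if v == fill_value else str(v) for v in values]


# Example: a byte-type flag variable whose fill value happens to be 0.
data_quality = [0, 0, 64, 0, 33]
tokens = dump_values(data_quality, fill_value=0)
print(", ".join(tokens))
```

The stored values are untouched; only the printed tokens differ, which is why a variable whose "good" value equals its fill value appears as a run of '_' characters in the dump.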
In a somewhat analogous fashion, NCL has special handling for values equal to the _FillValue in arrays: during operations such as addition, a _FillValue in either operand short circuits the operation and propagates the _FillValue to that element of the result.
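
The propagation rule can be sketched as follows, again in plain Python rather than NCL. The helper add_with_fill is hypothetical, and the sketch assumes both operands share one fill value, which is a simplification:

```python
def add_with_fill(a, b, fill_value):
    """Element-wise a + b where a fill value in either operand wins."""
    return [
        # Skip the arithmetic entirely if either element is the fill value
        # and carry the fill value through to the result instead.
        fill_value if (x == fill_value or y == fill_value) else x + y
        for x, y in zip(a, b)
    ]


a = [1.0, -999.0, 3.0]
b = [10.0, 20.0, -999.0]
result = add_with_fill(a, b, fill_value=-999.0)
print(result)
```

Only the first element is actually summed; the other two positions carry the fill value because one operand was missing there.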

In the case of the Data_quality flag on the other hand, it seems the value returned from SDgetfillvalue is intended simply as an initialization value rather than as an invalid value. In fact it turns out to be the value that says the data is valid, which kind of turns our normal approach on its head. This is good to know. I think our approach is ingrained enough in our community that it is not going to change. So we will need to document the differing approach used by your group and presumably other HDFEOS users.
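
To make the flag semantics concrete, here is a small Python sketch that decodes a Data_quality value using the bit meanings quoted from the 2C-ICE document further down this thread. It assumes bit 0 is the least significant bit, and the name decode_quality is hypothetical:

```python
# Bit meanings as listed in the 2C-ICE documentation quoted in this thread.
# Bit 7 is documented as "Not used" and is omitted here.
FLAG_MEANINGS = {
    0: "RayStatus_validity not normal",
    1: "GPS data not valid",
    2: "Temperatures not valid",
    3: "Radar telemetry data quality is not normal",
    4: "Peak power is not normal",
    5: "CPR calibration maneuver",
    6: "Missing frame",
}


def decode_quality(flag):
    """Return the list of issues whose bit is set (assumes bit 0 is the LSB)."""
    return [text for bit, text in FLAG_MEANINGS.items() if (flag >> bit) & 1]


for value in (0, 33, 64):
    issues = decode_quality(value)
    print(value, issues if issues else "good quality")
```

A value of 0 decodes to an empty list, which is the "data is valid" case, even though 0 is also what the fill value query returns for this variable.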

Fortunately, once understood, it is easy to compensate. NCL users who want to use the Data_quality flag or other variables with similar semantics can simply delete the _FillValue attribute, allowing the '0' value to be accessed as a normal value. At this point, I do not see a way for NCL to be able to solve this issue automatically, although if anyone has suggestions, we would be happy to consider them. It seems that it is just a case of two different models that do not quite mesh.
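
Once 0 is treated as an ordinary value, the filtering recipe Phil gives in the quoted message below reduces to two tests. This is a sketch in plain Python under simplifying assumptions: one EXT_coef value per profile, a hypothetical helper named usable_profiles, and the -7777 threshold taken from Phil's example:

```python
# Threshold from Phil's example: EXT_coef values at or below this are missing.
EXT_COEF_MISSING_THRESHOLD = -7777.0


def usable_profiles(data_quality, ext_coef):
    """Return indices of profiles that pass both the flag and missing-value tests."""
    return [
        i
        for i, (flag, value) in enumerate(zip(data_quality, ext_coef))
        # Step 1: keep only profiles whose quality flag is exactly 0.
        # Step 2: drop values at or below the documented missing threshold.
        if flag == 0 and value > EXT_COEF_MISSING_THRESHOLD
    ]


data_quality = [0, 64, 0, 0, 0]
ext_coef = [0.12, 0.30, -7777.0, -8888.0, 0.05]
keep = usable_profiles(data_quality, ext_coef)
print(keep)
```

The fill value plays no part in either test, which matches the advice not to use it when filtering CloudSat data.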
 -dave
 

On Jul 30, 2013, at 1:01 PM, "Partain,Philip" <Philip.Partain@colostate.edu> wrote:

> Hi Marston,
>
> The fill value is used to fill an array of data when reading a field that's been defined in an HDF file but not actually written to the file. This can be useful or necessary when data fields are written to files in multiple stages or in a non-contiguous manner. The fill value doesn't have to be a missing value when all of the fields are written to every file as they are with CloudSat products. In a completed file all data have been filled in so there shouldn't be a need to read or use the fill value. At least that was the convention eons ago when CloudSat decided to use HDF-EOS 2.5 (based on HDF 4.1). See http://www.hdfgroup.org/training/HDFtraining/UsersGuide/SDS_SD.fm9.html#13948. Perhaps newer specifications treat fill values differently?
>
> It seems strange to me that ncdump would overwrite field values that have been allocated and written to the file with "_" when they equal the fill value. If you are treating it as a missing value then usually you want that value to remain in the array, especially when testing for missing values or so you don't have to read a string when you expect a number. Though I'm sure there are use cases and details I'm not familiar with.
>
> The Data_quality field in 2C-ICE doesn't have a missing value. It is a bit flag that indicates why data elements (profiles) in other fields are missing. So, when filtering out missing CloudSat data don't use the fill value. First, skip profiles where Data_quality is not equal to zero. Or examine the Data_quality bits if you want to get specific. This indicates data that might be bad as a result of spacecraft maneuvers or abnormal radar operation. Then skip 2C-ICE fields that equal the documented missing values for those fields. For example, EXT_coef should be skipped if it is less than or equal to -7777. This indicates a failure by the 2C-ICE algorithm to retrieve the value. The missing values are documented at http://www.cloudsat.cira.colostate.edu/dataSpecs.php?prodid=112.
>
> Hope this helps,
> Phil
>
> From: Marston Johnston [mailto:shejo284@gmail.com]
> Sent: Monday, July 29, 2013 7:08 PM
> To: Partain,Philip
> Cc: David Brown; ncl-talk@ucar.edu
> Subject: Re: Byte array
>
> Phil,
>
> Looks like something is not right with the HDF file or a misunderstanding of _FillValue. You can read the response from the NCL folks below. Perhaps you can have a look at one of the HDF files from 2C-ICE and respond to this email address, including the NCL group. I have little experience with byte arrays and I'm still new to NCL but even CDO and ncdump have problems with the byte array when the hdf file is converted and the Data_quality array is interpreted.
>
> /M
>
>
> ---------- Forwarded message ----------
> From: David Brown <dbrown@ucar.edu>
> Date: Tue, Jul 30, 2013 at 1:14 AM
> Subject: Re: Byte array
> To: Marston Johnston <shejo284@gmail.com>
> Cc: ncl-talk@ucar.edu
>
>
> Hi Marston,
> After looking at the file, which I got from Dennis, and stepping through the file to see how the Data_quality _FillValue gets set, I do not see that NCL is doing anything wrong.
> NCL calls the HDFEOS routine SWgetfillvalue for each swath-type data variable. In the case of the Data_quality variable the routine returns success and the byte-type value 0 for the fill value.
> I have verified that if there is no fill value for a variable (examples are Clutter_reduction_flag_2B_GEOPROF and SurfaceHeightBin_2B_GEOPROF in this file) then the routine returns -1 meaning failure.
>
> NCL is not overwriting any values with the _FillValue or modifying the data in any way; it seems to me that the responder to your question is confused or at least he has a different idea of what "fill value" means. He seems to be suggesting that having the flag set to the "fill value" 0 means that the data is good, while any other value means there is a problem. That is pretty much the opposite of what I would think is the common view of the meaning of _FillValue as an invalid value. However, I can see how that interpretation might come about in this case where any non-zero value is indicating a problem with other data, not the Data_quality variable itself. Perhaps they are thinking of fill value as something like the default value, before any issues are raised for particular elements.
>
> Concerning the flag values: I think " 0: RayStatus_validity not normal. " means that when bit 0 of the flag value is set to 1, the quality issue is "RayStatus_validity not normal".
> In other words, at that point it is talking about bit 0 of the flag, not the value of the flag as a whole.
>
> You can use the function dim_gbits (see the documentation) to extract individual bits from this variable.
> -dave
>
>
> On Jul 29, 2013, at 4:14 PM, Marston Johnston <shejo284@gmail.com> wrote:
>
> > Hi,
> >
> > The 2C-ICE folks replied that it seems as if NCL doesn't handle properly the byte array Data_quality when it reads it from the hdf file. Here is his response:
> >
> >> About the Data Quality question: does NCL overwrite the data quality flags with zero (the fill value) when converting hdf to netcdf? Even where the data quality flag is not zero? That would be a problem. When you have the original, correct values the easiest thing to do in any code that analyzes CloudSat data is to skip profiles that have a data quality value not equal to zero. If you want to check the bits then at least skip the profiles where bits 0, 5, and 6 are equal to one.
> > I'm not sure how to handle bit shift in NCL to check the individual bits. My original intention was to throw out all points with a flag value greater than 0.
> >
> > /M
> >
> > On Mon, Jul 29, 2013 at 10:08 PM, Marston Johnston <shejo284@gmail.com> wrote:
> > Hi Dave B.,
> >
> > Yes Dave, that is how I interpreted it, but the flags then say zero is a bad value: 0: RayStatus_validity not normal.
> >
> > I sure can upload a copy of the hdf file, but I sent one to Dennis about a week ago. Can you check on the FTP to see if there is a hdf file with a similar file name to: 2008001002019_08922_CS_2C-ICE_GRANULE_P_R04_E02.hdf? If you guys have cleaned the ftp, I can upload one right away.
> >
> > /M
> >
> >
> > On Mon, Jul 29, 2013 at 9:59 PM, David Brown <dbrown@ucar.edu> wrote:
> > Hi Marston,
> > Judging from the pdf doc that you pointed to, it looks like there should be no missing value for the Data_quality variable. We could check to see if, for some reason,
> > NCL is mistakenly adding a _FillValue attribute. Can you send us a sample file so we can take a look?
> > Thanks.
> > -dave
> >
> >
> >
> > On Jul 29, 2013, at 1:45 PM, Marston Johnston <shejo284@gmail.com> wrote:
> >
> >> Hi Dave,
> >>
> >> Sorry for forgetting to CC the others. I'll do that henceforth.
> >> Will get back on this issue.
> >>
> >> /M
> >>
> >>
> >> On Mon, Jul 29, 2013 at 9:41 PM, Dave Allured - NOAA Affiliate <dave.allured@noaa.gov> wrote:
> >> Hi Marston.
> >>
> >> This is a good reply, thanks for doing all this investigation. Please
> >> remember to CC the user list on all technical replies, unless there is
> >> some good reason not to. I will treat this one as private, but would
> >> prefer not to keep going this way.
> >>
> >> I am looking forward to hearing what the 2C-ICE people have to say.
> >>
> >> --Dave
> >>
> >> On Mon, Jul 29, 2013 at 1:24 PM, Marston Johnston <shejo284@gmail.com> wrote:
> >> > Hi Dave A.,
> >> >
> >> > This is a good point. I'm using NCL 6.1 which cannot convert the HDF4 files
> >> > to netcdf, therefore I read the hdf files twice, once with the "hdf" file
> >> > suffix and then again with "hdf.hdfeos" suffix. This enables NCL to get all
> >> > the variables from the HDF4 file. I'm working on a privately-owned linux
> >> > cluster so I only have access to the NCL version above. I got a hdf4
> >> > converter from the HDF group and converted the hdf4 file to netcdf. It is
> >> > this file that I did a netcdf dump on and got '-' for the _FillValue. When I
> >> > used CDO to read this netcdf, I get the following:
> >> >
> >> > 252 : 0000-00-00 00:00:00 0 37082 37082 :
> >> > nan : Data_quality
> >> >
> >> > this reads that all of Data_quality is set to nan.
> >> >
> >> > I have no software that can read the hdf4 files on the server.
> >> >
> >> > The Data_Quality doc reads a bit cryptic:
> >> >
> >> > Name in file: Data_quality
> >> > Source: 2B-GEOPROF 012
> >> > Field type (in file): UINT(1)
> >> > Field type (in algorithm): INT(2)
> >> > Dimensions: nray
> >> > Units: --
> >> > Range: 0 to 127
> >> > Missing value:
> >> > Missing value operator:
> >> > Factor: 1
> >> > Offset: 0
> >> > MB: 0.035
> >> >
> >> > Flags indicating data quality. If 0, then data is of good quality.
> >> > Otherwise, treat as a bit field with 8 flags:
> >> >
> >> > 0: RayStatus_validity not normal.
> >> > 1: GPS data not valid.
> >> > 2: Temperatures not valid.
> >> > 3: Radar telemetry data quality is not normal.
> >> > 4: Peak power is not normal.
> >> > 5: CPR calibration maneuver.
> >> > 6: Missing frame.
> >> > 7: Not used.
> >> >
> >> > You can find this info in the DOC:
> >> > http://www.cloudsat.cira.colostate.edu/ICD/2C-ICE/2C-ICE_PDICD.P_R04.20111014.pdf
> >> >
> >> > I don't have any tools to view hdfeos files. I'm in contact with the
> >> > 2C-ICE people on how to handle the Data_Quality. Waiting to hear back from
> >> > them.
> >> >
> >> >
> >> > /M
> >> >
> >> >
> >> > On Mon, Jul 29, 2013 at 8:42 PM, Dave Allured - NOAA Affiliate
> >> > <dave.allured@noaa.gov> wrote:
> >> >>
> >> >> Marston and Dave B,
> >> >>
> >> >> Marston originally said, "... in the CloudSat docs, the Data_quality
> >> >> flag is set to 0 if the data is good." If the _FillValue attribute is
> >> >> present and zero after NCL reads this hdf4-eos file, this is totally
> >> >> inconsistent with this documentation, and there is an error somewhere.
> >> >>
> >> >> Possibilities include a special case problem in the NCL driver, the
> >> >> hdf4-eos file was modified or produced under different rules, damaged
> >> >> file, mistake in the documentation, subtle problem in the user NCL
> >> >> script, etc.
> >> >>
> >> >> I think that Dave B's suggestion to delete the _FillValue attribute in
> >> >> NCL memory will mitigate the problem, but I would still wonder how and
> >> >> why _FillValue was set to zero. Marston, do you have any software,
> >> >> other than NCL, to display the file's metadata and check for the
> >> >> expected missing value a different way? I am not experienced with
> >> >> HDF4, so I can only raise suspicions, FWIW.
> >> >>
> >> >> --Dave A.
> >> >>
> >> >> On Mon, Jul 29, 2013 at 12:00 PM, David Brown <dbrown@ucar.edu> wrote:
> >> >> > Hi Marston,
> >> >> >
> >> >> > You cannot test a value that is set to the _FillValue for equality using
> >> >> > e.g.: Data_quality .eq. 0. Instead you should use the 'ismissing' function,
> >> >> > for example:
> >> >> >
> >> >> > test_var = where (ismissing(Data_quality), test_var@_FillValue, test_var)
> >> >> >
> >> >> > (assumes test_var is another variable in the file that has the same
> >> >> > dimensionality as Data_quality)
> >> >> >
> >> >> > The other option in this case might be to delete the _FillValue
> >> >> > attribute:
> >> >> >
> >> >> > delete (Data_quality@_FillValue)
> >> >> > test_var = where (Data_quality .eq. 0, test_var@_FillValue, test_var)
> >> >> >
> >> >> > This is in no way specific to the byte type.
> >> >> >
> >> >> > It is a feature of ncdump that values set to the _FillValue are printed
> >> >> > as '_'. Again this is not specific to any type. If you don't like this you
> >> >> > could use ncl_filedump instead.
> >> >> > -dave
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Jul 29, 2013, at 11:09 AM, Marston Johnston <shejo284@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I'm working with CloudSat hdf4-eos files using NCL version 6.1.
> >> >> >> In the CloudSat file, the Data_Quality variable in the hdf file is of
> >> >> >> type "byte".
> >> >> >> Is there a function to check which byte is set? For example, how do I
> >> >> >> check other variable arrays using mask or where, with the condition that
> >> >> >> Data_quality is 0? I tried the function byte2flt, but this fails. The
> >> >> >> problem is that in the CloudSat docs, the Data_quality flag is set to 0 if
> >> >> >> the data is good. Otherwise the data is bad. After NCL reads the variable,
> >> >> >> the _FillValue is set to 0.
> >> >> >>
> >> >> >> Variable: Data_quality
> >> >> >> Type: byte
> >> >> >> Total Size: 20820 bytes
> >> >> >> 20820 values
> >> >> >> Number of Dimensions: 1
> >> >> >> Dimensions and sizes: [nray_2C_ICE | 20820]
> >> >> >> Coordinates:
> >> >> >> Number Of Attributes: 3
> >> >> >> _FillValue : 0
> >> >> >> unsigned : True
> >> >> >> hdfeos_name : Data_quality
> >> >> >>
> >> >> >> Another strange thing is that when the hdf file is converted to netcdf,
> >> >> >> the ncdump shows much of, if not all of the data as containing: '-', not
> >> >> >> values.
> >> >> >>
> >> >> >> I would appreciate a little info on how to treat byte arrays.
> >> >> >>
> >> >> >> /M
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Only the fruitful thing is true!
> >>
> >> _______________________________________________
> >> ncl-talk mailing list
> >> List instructions, subscriber options, unsubscribe:
> >> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Tue Jul 30 15:03:09 2013

This archive was generated by hypermail 2.1.8 : Thu Aug 01 2013 - 15:55:04 MDT