Re: Byte array

From: Marston Johnston <shejo284_at_nyahnyahspammersnyahnyah>
Date: Tue Jul 30 2013 - 15:31:59 MDT

Hi everyone,

I truly appreciate all your investigations and explanations, which have made
this problem clear. I've implemented Dave B.'s solution of deleting the
_FillValue from the Data_quality flag before using it, and this seems to
work. I'm processing 2 years' worth of 2C-ICE data, so it will take some time
before I know whether the results look clean. CloudSat is very noisy in
the Tropics, but at least I'm now more assured of the data quality.

I hope this thread will be of use to current and future users of NCL and
the 2C-ICE/CloudSat dataset.

/M

On Tue, Jul 30, 2013 at 11:02 PM, David Brown <dbrown@ucar.edu> wrote:

> Hi Phil,
> Thanks for the explanation and the reference from the hdfgroup site, and I
> apologize for suggesting possible confusion on your part. This does pretty
> much confirm what I was suspecting: that the concept of fill value has a
> different meaning in the HDFEOS CloudSat world than we are used to in our
> NCL and NetCDF-centric world. We do not usually make a big distinction
> between fill value and missing value. As with HDFEOS, the fill value is
> used as an initialization value for undefined data variables, and, just as
> with HDFEOS, there are ways to turn off the initialization step for
> performance reasons. But beyond serving as an initialization value, the
> _FillValue attribute value usually also serves as the missing value
> indicator.
> ncdump does not write the '_' character to the file (it does no file
> writing). As it dumps the contents of a variable, it checks each value for
> equality with the _FillValue, and, if equal, it outputs the '_' character
> in place of the value.
> In a somewhat analogous fashion, NCL has special handling for values equal
> to the _FillValue in arrays: during operations such as addition, a
> _FillValue in either operand short circuits the operation and propagates
> the _FillValue to that element of the result.
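The two behaviours described above can be sketched in a few lines. This is plain Python rather than NCL or ncdump source, purely to illustrate the semantics; the helper names dump_values and add_with_fill and the sample values are made up for the example.

```python
# Illustrative only: mimics the fill-value handling described above.
# dump_values and add_with_fill are hypothetical helpers, not real ncdump/NCL calls.

def dump_values(values, fill_value=None):
    """Render values as text, printing '_' wherever a value equals the fill value."""
    return ", ".join(
        "_" if fill_value is not None and v == fill_value else str(v)
        for v in values
    )

def add_with_fill(a, b, fill_value):
    """Element-wise a + b; a fill value in either operand yields the fill value."""
    return [
        fill_value if (x == fill_value or y == fill_value) else x + y
        for x, y in zip(a, b)
    ]

data_quality = [0, 0, 64, 0, 1]  # made-up flag values; 0 means "good"

# With a fill value of 0, every good profile is shown as '_'.
print(dump_values(data_quality, fill_value=0))

# Without a fill value, the raw numbers are shown.
print(dump_values(data_quality))

# A fill value in either operand propagates to the result.
print(add_with_fill([1, -99, 3], [10, 20, -99], fill_value=-99))
```

Nothing in the stored data changes in either case; only the interpretation of values equal to the fill value differs.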
>
> In the case of the Data_quality flag on the other hand, it seems the value
> returned from SDgetfillvalue is intended simply as an initialization value
> rather than as an invalid value. In fact it turns out to be the value that
> says the data is valid, which kind of turns our normal approach on its
> head. This is good to know. I think our approach is ingrained enough in our
> community that it is not going to change. So we will need to document the
> differing approach used by your group and presumably other HDFEOS users.
>
> Fortunately, once understood, it is easy to compensate. NCL users who want
> to use the Data_quality flag or other variables with similar semantics can
> simply delete the _FillValue attribute, allowing the '0' value to be
> accessed as a normal value. At this point, I do not see a way for NCL to be
> able to solve this issue automatically, although if anyone has suggestions,
> we would be happy to consider them. It seems that it is just a case of two
> different models that do not quite mesh.
> -dave
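A minimal sketch of why deleting the attribute helps, written in Python rather than NCL and assuming that a comparison against a fill-valued element yields "missing" instead of True or False. The function name equals and the sample flags are hypothetical.

```python
# Hypothetical model of comparing a variable that may carry a _FillValue.
# None stands in for a "missing" comparison result.

def equals(values, target, fill_value=None):
    """Element-wise values == target, with fill-valued elements giving None."""
    return [
        None if fill_value is not None and v == fill_value else v == target
        for v in values
    ]

flags = [0, 0, 64, 0, 1]  # made-up Data_quality values

# With _FillValue = 0, every 0 is treated as missing, so no element tests True.
with_fill = equals(flags, 0, fill_value=0)

# With the attribute removed, 0 is an ordinary value and can be selected.
without_fill = equals(flags, 0)
good_profiles = [i for i, ok in enumerate(without_fill) if ok]

print(with_fill)
print(without_fill)
print(good_profiles)
```

With the fill value in place there is no way to pick out the good profiles by testing for 0, which is the symptom reported at the start of this thread.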
>
>
> On Jul 30, 2013, at 1:01 PM, "Partain,Philip" <
> Philip.Partain@colostate.edu> wrote:
>
> Hi Marston,
>
> The fill value is used to fill an array of data when reading a field
> that's been defined in an HDF file but not actually written to the file.
> This can be useful or necessary when data fields are written to files in
> multiple stages or in a non-contiguous manner. The fill value doesn't have
> to be a missing value when all of the fields are written to every file as
> they are with CloudSat products. In a completed file all data have been
> filled in so there shouldn't be a need to read or use the fill value. At
> least that was the convention eons ago when CloudSat decided to use HDF-EOS
> 2.5 (based on HDF 4.1). See
> http://www.hdfgroup.org/training/HDFtraining/UsersGuide/SDS_SD.fm9.html#13948.
> Perhaps newer specifications treat fill values differently?
>
> It seems strange to me that ncdump would overwrite field values that have
> been allocated and written to the file with "_" when they equal the fill
> value. If you are treating it as a missing value then usually you want that
> value to remain in the array, especially when testing for missing values or
> so you don't have to read a string when you expect a number. Though I'm sure
> there are use cases and details I'm not familiar with.
>
> The Data_quality field in 2C-ICE doesn't have a missing value. It is a
> bit flag that indicates why data elements (profiles) in other fields are
> missing. So, when filtering out missing CloudSat data don't use the fill
> value. First, skip profiles where Data_quality is not equal to zero. Or
> examine the Data_quality bits if you want to get specific. This indicates
> data that might be bad as a result of spacecraft maneuvers or abnormal
> radar operation. Then skip 2C-ICE fields that equal the documented missing
> values for those fields. For example, EXT_coef should be skipped if it is
> less than or equal to -7777. This indicates a failure by the 2C-ICE
> algorithm to retrieve the value. The missing values are documented at
> http://www.cloudsat.cira.colostate.edu/dataSpecs.php?prodid=112.
>
> Hope this helps,
> Phil
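The two-step filtering described above could look roughly like this. It is a Python sketch rather than NCL, the function name valid_ext_coef and the sample numbers are invented, and only the -7777 threshold for EXT_coef comes from the message.

```python
# Sketch of the two-step filter: profile-level quality check first,
# then the per-field missing-value check. Sample data is made up.

EXT_COEF_MISSING_LIMIT = -7777  # values <= this indicate a failed retrieval

def valid_ext_coef(data_quality, ext_coef):
    """Return (ray, bin, value) tuples for EXT_coef values that pass both checks.

    data_quality: one flag per profile (ray).
    ext_coef: one list of values per profile.
    """
    kept = []
    for ray, (flag, profile) in enumerate(zip(data_quality, ext_coef)):
        if flag != 0:
            # Step 1: any non-zero flag means the whole profile is skipped.
            continue
        for bin_index, value in enumerate(profile):
            if value <= EXT_COEF_MISSING_LIMIT:
                # Step 2: documented missing value for this field.
                continue
            kept.append((ray, bin_index, value))
    return kept

data_quality = [0, 32, 0]
ext_coef = [[0.12, -7777.0], [0.30, 0.40], [-9999.0, 0.05]]
print(valid_ext_coef(data_quality, ext_coef))
```

Note that the fill value of Data_quality plays no part in either step.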
>
> *From:* Marston Johnston [mailto:shejo284@gmail.com]
> *Sent:* Monday, July 29, 2013 7:08 PM
> *To:* Partain,Philip
> *Cc:* David Brown; ncl-talk@ucar.edu
> *Subject:* Re: [ncl-talk] Byte array
>
> Phil,
>
> Looks like something is not right with the HDF file or a misunderstanding
> of _FillValue. You can read the response from the NCL folks below. Perhaps
> you can have a look at one of the HDF files from 2C-ICE and respond to this
> email address, including the NCL group. I have little experience with byte
> arrays and I'm still new to NCL but even CDO and ncdump have problems with
> the byte array when the hdf file is converted and the Data_quality array is
> interpreted.
>
> /M
>
> ---------- Forwarded message ----------
> From: *David Brown* <dbrown@ucar.edu>
> Date: Tue, Jul 30, 2013 at 1:14 AM
> Subject: Re: Byte array
> To: Marston Johnston <shejo284@gmail.com>
> Cc: ncl-talk@ucar.edu
>
>
> Hi Marston,
> After looking at the file, which I got from Dennis, and stepping through
> the file to see how the Data_quality _FillValue gets set, I do not see that
> NCL is doing anything wrong.
> NCL calls the HDFEOS routine SWgetfillvalue for each swath-type data
> variable. In the case of the Data_quality variable the routine returns
> success and the byte-type value 0 for the fill value.
> I have verified that if there is no fill value for a variable (examples
> are Clutter_reduction_flag_2B_GEOPROF and SurfaceHeightBin_2B_GEOPROF in
> this file) then the routine returns -1 meaning failure.
>
> NCL is not overwriting any values with the _FillValue or modifying the
> data in any way; it seems to me that the responder to your question is
> confused or at least he has a different idea of what "fill value" means. He
> seems to be suggesting that having the flag set to the "fill value" 0 means
> that the data is good, while any other value means there is a problem. That
> is pretty much the opposite of what I would think is the common view of the
> meaning of _FillValue as an invalid value. However, I can see how that
> interpretation might come about in this case where any non-zero value is
> indicating a problem with other data, not the Data_quality variable itself.
> Perhaps they are thinking of fill value as something like the default
> value, before any issues are raised for particular elements.
>
> Concerning the flag values: I think " 0: RayStatus_validity not normal. "
> means that when bit 0 of the flag value is set to 1, the quality issue is
> "RayStatus_validity not normal".
> In other words, at that point it is talking about bit 0 of the flag, not
> the value of the flag as a whole.
>
> You can use the function dim_gbits (see the documentation) to extract
> individual bits from this variable.
> -dave
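To make the bit-versus-value distinction concrete, here is a small Python sketch (not NCL, and not using dim_gbits) that decodes a flag value into the issues listed in the 2C-ICE documentation quoted further down this thread. It assumes bit 0 is the least significant bit, which is consistent with the documented range of 0 to 127 with flag 7 unused; decode_data_quality is a hypothetical name.

```python
# Flag meanings as listed in the 2C-ICE documentation quoted in this thread.
# Assumption: bit 0 is the least significant bit of the byte.
FLAG_MEANINGS = {
    0: "RayStatus_validity not normal",
    1: "GPS data not valid",
    2: "Temperatures not valid",
    3: "Radar telemetry data quality is not normal",
    4: "Peak power is not normal",
    5: "CPR calibration maneuver",
    6: "Missing frame",
    7: "Not used",
}

def decode_data_quality(value):
    """Return the list of issues whose bit is set in a Data_quality value."""
    return [FLAG_MEANINGS[bit] for bit in range(8) if (value >> bit) & 1]

print(decode_data_quality(0))   # no bits set: good data
print(decode_data_quality(1))   # bit 0 set
print(decode_data_quality(96))  # bits 5 and 6 set (32 + 64)
```

A value of 0 therefore means that no issue is flagged, while "0: RayStatus_validity not normal" describes what bit 0 means when it is set.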
>
>
>
> On Jul 29, 2013, at 4:14 PM, Marston Johnston <shejo284@gmail.com> wrote:
>
> > Hi,
> >
> > The 2C-ICE folks replied that it seems as if NCL doesn't handle properly
> > the byte array Data_quality when it reads it from the hdf file. Here is his
> > response:
> >
> >> About the Data Quality question: does NCL overwrite the data quality
> >> flags with zero (the fill value) when converting hdf to netcdf? Even where
> >> the data quality flag is not zero? That would be a problem. When you have
> >> the original, correct values the easiest thing to do in any code that
> >> analyzes CloudSat data is to skip profiles that have a data quality value
> >> not equal to zero. If you want to check the bits then at least skip the
> >> profiles where bits 0, 5, and 6 are equal to one.
> > I'm not sure how to handle bit shift in NCL to check the individual
> > bits. My original intention was to throw out all points with a flag value
> > greater than 0.
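The two screening options mentioned here, rejecting any non-zero flag or rejecting only profiles with bits 0, 5, or 6 set, can be compared with a short Python sketch. This is not NCL; it assumes bit 0 is the least significant bit, and keep_profile is a hypothetical name.

```python
# Assumption: bit 0 is the least significant bit of the Data_quality byte.
CRITICAL_BITS = (0, 5, 6)
CRITICAL_MASK = sum(1 << bit for bit in CRITICAL_BITS)  # 1 + 32 + 64 = 97

def keep_profile(flag, strict=True):
    """Decide whether a profile passes the quality screen.

    strict=True  keeps only profiles whose flag is exactly 0.
    strict=False rejects only profiles with bit 0, 5, or 6 set.
    """
    if strict:
        return flag == 0
    return (flag & CRITICAL_MASK) == 0

for flag in (0, 2, 32, 97):
    print(flag, keep_profile(flag), keep_profile(flag, strict=False))
```

The strict test is the simpler of the two and matches the original intention of discarding every point with a flag value greater than 0.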
> >
> > /M
> >
> > On Mon, Jul 29, 2013 at 10:08 PM, Marston Johnston <shejo284@gmail.com> wrote:
> > Hi Dave B.,
> >
> > Yes Dave, that is how I interpreted it, but the flags then say zero is a
> > bad value: 0: RayStatus_validity not normal.
> >
> > I sure can upload a copy of the hdf file, but I sent one to Dennis about
> > a week ago. Can you check on the FTP to see if there is a hdf file with a
> > similar file name to: 2008001002019_08922_CS_2C-ICE_GRANULE_P_R04_E02.hdf?
> > If you guys have cleaned the ftp, I can upload one right away.
> >
> > /M
> >
> >
> > On Mon, Jul 29, 2013 at 9:59 PM, David Brown <dbrown@ucar.edu> wrote:
> > Hi Marston,
> > Judging from the pdf doc that you pointed to, it looks like there should
> > be no missing value for the Data_quality variable. We could check to see
> > if, for some reason, NCL is mistakenly adding a _FillValue attribute. Can
> > you send us a sample file so we can take a look?
> > Thanks.
> > -dave
> >
> >
> >
> > On Jul 29, 2013, at 1:45 PM, Marston Johnston <shejo284@gmail.com> wrote:
> >
> >> Hi Dave,
> >>
> >> Sorry for forgetting to CC the others. I'll do that henceforth.
> >> Will get back on this issue.
> >>
> >> /M
> >>
> >>
> >> On Mon, Jul 29, 2013 at 9:41 PM, Dave Allured - NOAA Affiliate <dave.allured@noaa.gov> wrote:
> >> Hi Marston.
> >>
> >> This is a good reply, thanks for doing all this investigation. Please
> >> remember to CC the user list on all technical replies, unless there is
> >> some good reason not to. I will treat this one as private, but would
> >> prefer not to keep going this way.
> >>
> >> I am looking forward to hear what the 2C-ICE people have to say.
> >>
> >> --Dave
> >>
> >> On Mon, Jul 29, 2013 at 1:24 PM, Marston Johnston <shejo284@gmail.com> wrote:
> >> > Hi Dave A.,
> >> >
> >> > This is a good point. I'm using NCL 6.1 which cannot convert the HDF4
> >> > files to netcdf, therefore I read the hdf files twice, once with the
> >> > "hdf" file suffix and then again with "hdf.hdfeos" suffix. This enables
> >> > NCL to get all the variables from the HDF4 file. I'm working on a
> >> > privately-owned linux cluster so I only have access to the NCL version
> >> > above. I got a hdf4 converter from the HDF group and converted the hdf4
> >> > file to netcdf. It is this file that I did a netcdf dump on and got '-'
> >> > for the _FillValue. When I used CDO to read this netcdf, I get the
> >> > following:
> >> >
> >> > 252 : 0000-00-00 00:00:00 0 37082 37082 :
> >> > nan : Data_quality
> >> >
> >> > this reads that all of Data_quality is set to nan.
> >> >
> >> > I have no software that can read the hdf4 files on the server.
> >> >
> >> > The Data_Quality doc reads a bit cryptic:
> >> >
> >> > Name in file: Data_quality
> >> > Source: 2B-GEOPROF 012
> >> > Field type (in file): UINT(1)
> >> > Field type (in algorithm): INT(2)
> >> > Dimensions: nray
> >> > Units: --
> >> > Range: 0 to 127
> >> > Missing value:
> >> > Missing value operator:
> >> > Factor: 1
> >> > Offset: 0
> >> > MB: 0.035
> >> >
> >> > Flags indicating data quality. If 0, then data is of good quality.
> >> > Otherwise, treat as a bit field with 8 flags:
> >> >
> >> > 0: RayStatus_validity not normal.
> >> > 1: GPS data not valid.
> >> > 2: Temperatures not valid.
> >> > 3: Radar telemetry data quality is not normal.
> >> > 4: Peak power is not normal.
> >> > 5: CPR calibration maneuver.
> >> > 6: Missing frame.
> >> > 7: Not used.
> >> >
> >> > You can find this info in the DOC:
> >> >
> >> > http://www.cloudsat.cira.colostate.edu/ICD/2C-ICE/2C-ICE_PDICD.P_R04.20111014.pdf
> >> >
> >> > I don't have any tools to view hdfeos files. I'm in contact with the
> >> > 2C-ICE people on how to handle the Data_Quality. Waiting to hear back
> >> > from them.
> >> >
> >> >
> >> > /M
> >> >
> >> >
> >> > On Mon, Jul 29, 2013 at 8:42 PM, Dave Allured - NOAA Affiliate
> >> > <dave.allured@noaa.gov> wrote:
> >> >>
> >> >> Marston and Dave B,
> >> >>
> >> >> Marston originally said, "... in the CloudSat docs, the Data_quality
> >> >> flag is set to 0 if the data is good." If the _FillValue attribute is
> >> >> present and zero after NCL reads this hdf4-eos file, this is totally
> >> >> inconsistent with this documentation, and there is an error somewhere.
> >> >>
> >> >> Possibilities include a special case problem in the NCL driver, the
> >> >> hdf4-eos file was modified or produced under different rules, damaged
> >> >> file, mistake in the documentation, subtle problem in the user NCL
> >> >> script, etc.
> >> >>
> >> >> I think that Dave B's suggestion to delete the _FillValue attribute in
> >> >> NCL memory will mitigate the problem, but I would still wonder how and
> >> >> why _FillValue was set to zero. Marston, do you have any software,
> >> >> other than NCL, to display the file's metadata and check for the
> >> >> expected missing value a different way? I am not experienced with
> >> >> HDF4, so I can only raise suspicions, FWIW.
> >> >>
> >> >> --Dave A.
> >> >>
> >> >> On Mon, Jul 29, 2013 at 12:00 PM, David Brown <dbrown@ucar.edu> wrote:
> >> >> > Hi Marston,
> >> >> >
> >> >> > You cannot test a value that is set to the _FillValue for equality
> >> >> > using e.g.: Data_quality .eq. 0. Instead you should use the
> >> >> > 'ismissing' function, for example:
> >> >> >
> >> >> > test_var = where (ismissing(Data_quality), test_var@_FillValue, test_var)
> >> >> >
> >> >> > (assumes test_var is another variable in the file that has the same
> >> >> > dimensionality as Data_quality)
> >> >> >
> >> >> > The other option in this case might be to delete the _FillValue
> >> >> > attribute:
> >> >> >
> >> >> > delete (Data_quality@_FillValue)
> >> >> > test_var = where (Data_quality .eq. 0, test_var@_FillValue, test_var)
> >> >> >
> >> >> > This is in no way specific to the byte type.
> >> >> >
> >> >> > It is a feature of ncdump that values set to the _FillValue are
> >> >> > printed as '_'. Again this is not specific to any type. If you don't
> >> >> > like this you could use ncl_filedump instead.
> >> >> > -dave
> >> >> > -dave
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Jul 29, 2013, at 11:09 AM, Marston Johnston <shejo284@gmail.com> wrote:
> >> >> >
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I'm working with CloudSat hdf4-eos files using NCL version 6.1.
> >> >> >> In the CloudSat file, the Data_Quality variable in the hdf file is
> >> >> >> of type "byte".
> >> >> >> Is there a function to check which byte is set? For example, how do
> >> >> >> I check other variable arrays using mask or where, with the
> >> >> >> condition that Data_quality is 0? I tried the function byte2flt, but
> >> >> >> this fails. The problem is that in the CloudSat docs, the
> >> >> >> Data_quality flag is set to 0 if the data is good. Otherwise the data
> >> >> >> is bad. After NCL reads the variable, the _FillValue is set to 0.
> >> >> >>
> >> >> >> Variable: Data_quality
> >> >> >> Type: byte
> >> >> >> Total Size: 20820 bytes
> >> >> >> 20820 values
> >> >> >> Number of Dimensions: 1
> >> >> >> Dimensions and sizes: [nray_2C_ICE | 20820]
> >> >> >> Coordinates:
> >> >> >> Number Of Attributes: 3
> >> >> >> _FillValue : 0
> >> >> >> unsigned : True
> >> >> >> hdfeos_name : Data_quality
> >> >> >>
> >> >> >> Another strange thing is that when the hdf file is converted to
> >> >> >> netcdf, the ncdump shows much of, if not all of, the data as
> >> >> >> containing '-', not values.
> >> >> >>
> >> >> >> I'd appreciate a little info on how to treat byte arrays.
> >> >> >>
> >> >> >> /M
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Only the fruitful thing is true!
> >> _______________________________________________
> >> ncl-talk mailing list
> >> List instructions, subscriber options, unsubscribe:
> >> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

-- 
Only the fruitful thing is true!

_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk
Received on Tue Jul 30 15:32:18 2013

This archive was generated by hypermail 2.1.8 : Thu Aug 01 2013 - 15:55:04 MDT