Re: How to check and fill the missing lines with the missing values?

From: Wen.J.Qu <wen.j.qu_at_nyahnyahspammersnyahnyah>
Date: Fri Aug 24 2012 - 10:12:12 MDT

Hi, Mary,

Sorry that I did not make the question clear. What I mean is that the time series is not continuous: some days have no records. I want to find those days and add missing values for the variables on those days.

Attached is an example file containing data for one month, January. The file should have 31 lines (days) of records, but because some days have no records it has only 12 lines (days). Could you please give me some suggestions on how to find these missing days and insert lines filled with missing values for them?

Thanks a lot.

Shawn




Wen.J.Qu
2012-08-24
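
One possible way to do what this message asks, sketched in NCL and not taken from any reply in the thread: compute each record's day of the year, create a full-length array pre-filled with the missing value, and copy the existing records into their calendar slots. The sketch assumes the file covers a single year and that "yearmoda" and "temp" are arrays like those read by the script quoted further down; the names "idx" and "temp_full" and the three sample records are made up for illustration.

```ncl
; Sketch only: expand a gappy daily series to one value per calendar day.
; The sample values stand in for the arrays read from the real file.
  yearmoda = (/19730101, 19730102, 19730107/)
  temp     = (/27.1, 27.9, 30.4/)

; Split YYYYMMDD into its parts with integer arithmetic.
  year  = yearmoda/10000
  month = (yearmoda/100) % 100
  day   = yearmoda % 100

; Number of days in the single year assumed to be covered by the file.
  ndays = 365
  if (isleapyear(year(0))) then
    ndays = 366
  end if

; day_of_year is 1-based; subtract 1 for NCL's 0-based subscripts.
  idx = day_of_year(year, month, day) - 1

; new() pre-fills the array with 9999.9 and sets it as _FillValue,
; so every day without a record stays missing.
  temp_full      = new(ndays, float, 9999.9)
  temp_full(idx) = temp

  print(temp_full(0:9))
```

The same "idx" can be reused for every other variable in the file, so the calendar lookup only has to be done once.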



From: Mary Haley
Sent: 2012-08-24 10:14:40
To: Wen.J.Qu@gmail.com
Cc:
Subject: Re: [ncl-talk] How to check and fill the missing lines with the missing values?

Hi,


This is offline for a bit.


The example you showed below doesn't have any omitted fields, so I'm not sure what you mean by looking for missing values.


Or, do you mean that every field has a value, but sometimes the value is 99999 or 9999.9, which indicates a missing value?


Can you send me the whole file, if it's not too big?


--Mary


On Aug 24, 2012, at 8:57 AM, Wen.J.Qu wrote:


Hi, Mary

Thanks a lot for your help. Yes, I am reading a lot of ASCII files. Below is an example of the file. There is no "..." in the file; the missing days are simply omitted, with no blank lines left in their place.

I want to find these missing days and fill the lines with missing values for the variables. Could you please give me some suggestions about this? Thanks a lot.

STN--- WBAN   YEARMODA    TEMP       DEWP      SLP        STP       VISIB      WDSP     MXSPD   GUST    MAX     MIN   PRCP   SNDP   FRSHTT
106160 99999  19730101    27.1 24    13.4 24  9999.9  0   965.9  4    6.2 24    2.7 24    6.0  999.9    35.6*   19.4*  0.00I 999.9  000000
106160 99999  19730102    27.9 24    16.4 24  9999.9  0  9999.9  0    6.0 24    2.2 24    8.9  999.9    35.6*   23.0*  0.00I 999.9  000000
106160 99999  19730107    30.4 24    28.4 24  9999.9  0   974.4  4    2.4 24    4.5 24    8.0  999.9    37.4*   26.6* 99.99  999.9  110000
106160 99999  19730108    28.5 24    27.4 24  9999.9  0   973.6  7    1.9 24    4.8 24    8.0  999.9    30.2*   28.4* 99.99  999.9  110000
106160 99999  19730109    30.4 24    29.4 24  9999.9  0   971.1  7    0.4 24    1.9 24    6.0  999.9    32.0*   28.4* 99.99  999.9  111000
106160 99999  19730117    29.7 24    27.9 24  9999.9  0   946.9  7    1.8 24    6.7 24   13.0  999.9    30.2*   28.4*  0.00I 999.9  100000
106160 99999  19730118    28.4 24    26.4 24  9999.9  0  9999.9  0    2.5 24    3.7 24    7.0   13.0  9999.9  9999.9  99.99  999.9  111000
106160 99999  19730119    27.7 24    25.3 24  9999.9  0  9999.9  0    1.4 24    1.9 24    3.9  999.9    28.4*   26.6* 99.99  999.9  111000
106160 99999  19730120    30.9 24    29.4 24  9999.9  0   947.4  7    2.1 24    6.0 24   12.0  999.9    35.6*   28.4* 99.99  999.9  111000
106160 99999  19730201    31.0 24    27.8 24  9999.9  0  9999.9  0    5.6 24    1.8 24    4.1  999.9    35.6*   28.4*  0.00I 999.9  100000
106160 99999  19730202    30.9 24    28.9 24  9999.9  0   967.3  6    2.6 24    3.6 24    8.0  999.9    32.0*   28.4*  0.00I 999.9  100000
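
As a rough illustration of the "find these missing days" part, the days absent from one month can be listed by marking the days that do have a record and asking which slots were never marked. This is only a sketch in NCL: the "day" values are the January 1973 dates from the sample lines above, and the names "have" and "missing_days" are made up for the example.

```ncl
; Sketch only: list the days of January 1973 that have no record.
  day  = (/1, 2, 7, 8, 9, 17, 18, 19, 20/)   ; days present in the sample
  nday = days_in_month(1973, 1)              ; calendar length of the month

; One slot per calendar day, pre-filled with the missing value -999.
  have        = new(nday, integer, -999)
  have(day-1) = 1                            ; mark the days that have a record

; Slots that are still missing correspond to days without a line in the file.
  missing_days = ind(ismissing(have)) + 1    ; back to 1-based day numbers
  print(missing_days)
```

Note that ind returns a single missing value when no element is True, so a month with no gaps would need a separate check before "missing_days" is used as a subscript.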


Following is the script I used to read the file.

;Read data into a big 1D string array
  fname = "data/gsod/1973/726055-99999-1973.op"
  data = asciiread(fname,-1,"string")

; Count the number of fields, just to show it can be done.
  nfields = str_fields_count(data(0)," ")
  print("number of fields = " + nfields)

;
; Skip first row of "data" because it's just a header line.
;
; Use a space (" ") as a delimiter in str_get_field. The first
; field is field=1 (unlike str_get_cols, in which the first column
; is column=0).
;
  stn = stringtoint(str_get_field(data(1::), 1," "))
  wban = stringtoint(str_get_field(data(1::), 2," "))

  yearmoda = stringtoint(str_get_field(data(1::), 3," "))
  year = stringtoint(str_get_cols(data(1::),14,17))
  month = stringtoint(str_get_cols(data(1::),18,19))
  day = stringtoint(str_get_cols(data(1::),20,21))

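  temp = stringtofloat(str_get_field(data(1::), 4," "))
; Flag 9999.9 as missing so it is not converted like a real value.
  temp@_FillValue = 9999.9
; Convert temperature from Fahrenheit to Celsius.
  temp = (temp-32)*5/9
  dewp = stringtofloat(str_get_field(data(1::), 6," "))
; Flag 9999.9 as missing so it is not converted like a real value.
  dewp@_FillValue = 9999.9
; Convert dew point temperature from Fahrenheit to Celsius.
  dewp = (dewp-32)*5/9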

; Calculate relative humidity (%).
  rh = 100*(((112-0.1*temp+dewp)/(112+0.9*temp))^8)

  slp = stringtofloat(str_get_field(data(1::), 8," "))
  stp = stringtofloat(str_get_field(data(1::), 10," "))

  visib = stringtofloat(str_get_field(data(1::), 12," "))

  wdsp = stringtofloat(str_get_field(data(1::), 14," "))
  maxspd = stringtofloat(str_get_field(data(1::), 16," "))
  gust = stringtofloat(str_get_field(data(1::), 17," "))

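  maxtemp = stringtofloat(str_get_field(data(1::), 18," "))
; Flag 9999.9 as missing so it is not converted like a real value.
  maxtemp@_FillValue = 9999.9
; Convert maximum temperature from Fahrenheit to Celsius.
  maxtemp = (maxtemp-32)*5/9
  mintemp = stringtofloat(str_get_field(data(1::), 19," "))
; Flag 9999.9 as missing so it is not converted like a real value.
  mintemp@_FillValue = 9999.9
; Convert minimum temperature from Fahrenheit to Celsius.
  mintemp = (mintemp-32)*5/9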

  prcp = stringtofloat(str_get_field(data(1::), 20," "))
  sndp = stringtofloat(str_get_field(data(1::), 21," "))

  frshtt = stringtoint(str_get_field(data(1::), 22," "))
  fog = stringtoint(str_get_cols(data(1::),132,132))
  rain = stringtoint(str_get_cols(data(1::),133,133))
  snow = stringtoint(str_get_cols(data(1::),134,134))
  hail = stringtoint(str_get_cols(data(1::),135,135))
  thunder = stringtoint(str_get_cols(data(1::),136,136))
  tornado = stringtoint(str_get_cols(data(1::),137,137))

  print(rh)
  print(maxtemp)
  print(frshtt)
  print(yearmoda)
  print(tornado)






Wen.J.Qu
2012-08-24



From: Mary Haley
Sent: 2012-08-23 14:04:55
To: Wen.J.Qu@gmail.com
Cc:
Subject: Re: [ncl-talk] How to check and fill the missing lines with the missing values?

Shawn,

I need more information. Are these lines of data in an ascii file?

If so, does the file actually look like that, with the "…" characters, or are the lines just blank, or something else?

If these lines of data are in an ascii file, how are you reading in the file?

If the lines are just blank, then you can read the file in as strings and check for blank strings using "str_is_blank".

http://www.ncl.ucar.edu/Document/Functions/Built-in/str_is_blank.shtml
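
A minimal sketch of that suggestion, for the case where the missing days really are blank lines: the "lines" array below is a made-up stand-in for what asciiread(fname,-1,"string") would return, and "iblank", "igood", and "good_lines" are illustrative names only.

```ncl
; Sketch only: separate blank lines from data lines.
  lines = (/"19980101 1.0", "   ", "19980103 2.0", "   "/)

; str_is_blank returns True for each string that contains only blanks.
  blank = str_is_blank(lines)

  iblank = ind(blank)          ; positions of the blank lines
  igood  = ind(.not. blank)    ; positions of the lines holding data

; ind returns a single missing value when nothing matches, so real code
; should test igood with ismissing before using it as a subscript.
  good_lines = lines(igood)

  print(iblank)
  print(good_lines)
```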

--Mary

On Aug 23, 2012, at 11:09 AM, Wen.J.Qu wrote:

> Hello,
>
> I am dealing with a daily time series of multiple variables. My problem is that some days (lines) of the data are missing, as below:
>
> 19980101 ...
> 19980102 ...
> 19980103 ...
> 19980109 ...
> 19980201 ...
> 19980218 ...
> ... ...
>
> How can I check and fill these missing lines (days) with the missing values?
>
> Thanks a lot.
>
>
> Shawn
>
> Wen.J.Qu
> 2012-08-23

Received on Fri Aug 24 10:12:22 2012
