Re: Concatenation of multiple files

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Mon Mar 11 2013 - 19:13:28 MDT

OK, from this we can deduce that a year's worth of time steps would occupy

365 (days) * 8 (timesteps / day) * 360 (lat dim) * 720 (lon dim) * 4 (bytes per float)

comes to 3,027,456,000 bytes, roughly 2.8 GB, which is already more than 2 GB. On a 32-bit system you should get a message from NCL that you are exceeding the maximum variable size. On a 64-bit system, if you try to use more memory than the system has available, you get a seg fault, because NCL has no way of knowing your system's limits.

So five years' worth of a single variable is on the order of 15 GB.
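As a quick sanity check (nothing you need for your script, just a throw-away snippet using the counts above; doubles are used so the product doesn't overflow a 32-bit integer), the same arithmetic can be done in NCL itself:

; rough memory footprint of one year of 3-hourly GPP, using the numbers above
ntim   = 365d * 8                        ; 3-hourly time steps per year
nbytes = ntim * 360 * 720 * 4            ; * nlat * nlon * bytes per float
print("one year of GPP ~ " + sprintf("%6.2f", nbytes / 1024.^3) + " GB")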

I would suggest first trying to get a handle on how many files you can aggregate comfortably; try six months' worth to start. If that works and your goal is to create an aggregated time series in a new NetCDF file, you should probably plan for just one variable in that file. You can open the output file, pre-define its ultimate dimensions, and then fill its contents in a loop, reading one or more input files at a time and writing to a subsection of the variable in the output file.

You need to pre-define the dimensions, coordinate variables, and the variable itself in the output file. See http://www.ncl.ucar.edu/Applications/method_2.shtml for an example of how to do this.
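In case it helps as a starting point, here is a rough sketch of what that pre-define-and-fill approach could look like for GPP alone. It is only an outline, not a tested script: the output file name, the "LargeFile" format option, and the removal of any old output file are my assumptions, and you would extend fili to cover all 60 files once a smaller test works.

; sketch of method 2 (pre-define, then fill) for a single variable (GPP)
setfileoption("nc","Format","LargeFile")         ; 64-bit offsets; the result is > 2 GB

fin1 = addfile(pathi+fili(0)+".nc","r")          ; first input file supplies the metadata
system("rm -f GPP_3hourly_2000.nc")              ; hypothetical output name
fout = addfile("GPP_3hourly_2000.nc","c")

dim_names = (/"time","lat","lon"/)
dim_sizes = (/   -1 ,  360 ,  720 /)             ; time is the unlimited (record) dimension
dim_unlim = (/ True , False, False /)
filedimdef(fout, dim_names, dim_sizes, dim_unlim)

filevardef(fout, "lat",  typeof(fin1->lat), "lat")
filevardef(fout, "lon",  typeof(fin1->lon), "lon")
filevardef(fout, "time", "double", "time")
filevardef(fout, "GPP",  "float", (/"time","lat","lon"/))

filevarattdef(fout, "lat",  fin1->lat)
filevarattdef(fout, "lon",  fin1->lon)
filevarattdef(fout, "time", fin1->time)
filevarattdef(fout, "GPP",  fin1->GPP(0,:,:))    ; copy attributes from one time step only

fout->lat = (/fin1->lat/)
fout->lon = (/fin1->lon/)

nt = 0                                           ; running index into the output time dimension
do i = 0, dimsizes(fili) - 1
  fi  = addfile(pathi+fili(i)+".nc","r")
  ntg = dimsizes(fi->time)
  fout->time(nt:nt+ntg-1)    = (/fi->time/)
  fout->GPP(nt:nt+ntg-1,:,:) = (/fi->GPP/)       ; one month (~250 time steps) in memory at a time
  nt  = nt + ntg
end do

If that works, the daily aggregation you mention could be done inside the same loop, for example by reshaping each month's GPP to (ndays, 8, lat, lon) and applying dim_avg_n over the second dimension before writing; that also shrinks the output by a factor of 8.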
  -dave

On Mar 11, 2013, at 5:59 PM, Nate Mikle wrote:

> Hi David,
>
> Thank you for the responses. There are no messages prior to the seg fault. All of the files contain the same variables and dimensions; I am specifically looking at GPP. Eventually I want to have one continuous file of daily observations of GPP from 2000-2004 for each 0.5 degree grid cell. Right now I have 60 separate files, each with a 3-hourly time step. So, not only do I eventually want to concatenate the files, I would also like to aggregate the time step to daily.
> Here are the results of ncl_filedump.
> Variable: f
> Type: file
> filename: BG1_CLM4VIC_v1_3hourly_2000-01
> path: BG1_CLM4VIC_v1_3hourly_2000-01.nc
> file global attributes:
> title : CLM4VIC v1.0 3-hourly output for MsTMIP simulation BG1 v1
> source : CLM4VIC v1.0
> model : CLM4VIC
> model_version : v1.0
> references : Li et al. (2011), Evaluating runoff simulations from the Community Land Model 4.0 using observations from flux towers and a mountainous watershed, JGR-atmos, 116, DOI:10.1029/2011JD016276
> contact : Maoyi Huang
> email : maoyi.huang@pnnl.gov
> experiment : BG1
> project : MsTMIP
> sim_version : v1
> comment : Global Baseline simulation (BG1), updated on Mon Oct 15 15:40:52 PDT 2012, performed by Huimin Lei from Tsinghua University during his visit at PNNL
> Conventions : CF-1.4
> dimensions:
> lon = 720
> lat = 360
> nbnds = 2
> ncl3 = 248
> time = 248
> variables:
> float lon ( lon )
> long_name : Longitude
> description : longitude at center of each grid cell
> units : degrees_east
> bounds : lon_bnds
> float lat ( lat )
> long_name : Latitude
> description : latitude at center of each grid cell
> units : degrees_north
> bounds : lat_bnds
> float lon_bnds ( lon, nbnds )
> long_name : Longitude west-east bounds
> description : (west boundary of grid cell, east boundary of grid cell)
> units : degrees_east
> float lat_bnds ( lat, nbnds )
> long_name : Latitude south-north bounds
> description : (south boundary of grid cell, north boundary of grid cell)
> units : degrees_north
> double time ( ncl3 )
> _FillValue : 9.969209968386869e+36
> long_name : Time middle averaging period
> description : julian days days since 1700-01-01 00:00:00 UTC for middle time averaging period Proleptic_Gregorianc calendar
> units : days since 1700-01-01 00:00:00 UTC
> float GPP ( time, lat, lon )
> missing_value : -1e+34
> description : Rate of photosynthesis (always positive)
> units : kg C m-2 s-1
> long_name : Gross Primary Productivity
> _FillValue : -1e+34
> float NPP ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Net Primary Productivity
> units : kg C m-2 s-1
> description : Net Primary Productivity (NPP=GPP-AutoResp, positive into plants)
> missing_value : -1e+34
> float TotalResp ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Total Respiration
> units : kg C m-2 s-1
> description : Total respiration (TotalResp=AutoResp+heteroResp, always positive)
> missing_value : -1e+34
> float AutoResp ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Autotrophic Respiration
> units : kg C m-2 s-1
> description : Autotrophic respiration rate (always positive)
> missing_value : -1e+34
> float HeteroResp ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Heterotrophic Respiration
> units : kg C m-2 s-1
> description : Heterotrophic respiration rate (always positive)
> missing_value : -1e+34
> float Fire_flux ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Fire emissions
> units : kg C m-2 s-1
> description : Flux of carbon due to fires (always positive)
> missing_value : -1e+34
> float NEE ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Net Ecosystem Exchange
> units : kg C m-2 s-1
> description : Net Ecosystem Exchange (NEE=HeteroResp+AutoResp-GPP, positive into atmosphere)
> missing_value : -1e+34
> float Qh ( time, lat, lon )
> _FillValue : -1e+34
> long_name : Sensible heat
> units : W m-2
> description : Sensible heat flux into the boundary layer (positive into atmosphere)
> missing_value : -1e+34
> float Qle ( time, lat, lon )
> missing_value : -1e+34
> description : Latent heat flux into the boundary layer (positive into atmosphere)
> units : W m-2
> long_name : Latent heat
> _FillValue : -1e+34
>
> On Mon, Mar 11, 2013 at 4:21 PM, David Brown <dbrown@ucar.edu> wrote:
> Hi Nate,
> Are there any error messages prior to the seg fault?
> What is more important than the file size is the variable size. It would be good if you could send us the output of ncl_filedump on one of these files. If the variable size (that is, the size of the variable "G" in your code) exceeds the amount of memory available on your system, you are likely to have problems. If you have a 32-bit system, there is a hard limit of 2 GB for a single variable. Otherwise, the variable size is constrained only by the amount of memory you have.
>
> Depending on the dimensionality of the variable "G" there may be different ways to break it into subsections for processing in smaller chunks. The best way to do that depends both on the dimensionality and on what sort of processing you want to do on the data. So it is important to figure out, at least approximately, how much memory the variable would occupy if there were no memory restrictions. Then, based on the dimensionality, you can decide how to read it in chunks. For example, assuming the data has multiple levels and that a single level's worth of data fits into the available memory, you might process each level individually. In that case your code would look something like this (assume the dimensions of the variable are time, lev, lat, lon; lev, lat, and lon must have the same sizes in each file):
>
> f = addfiles(pathi+fili+".nc","r")
> ListSetType(f,"cat")
>
> dim_names = getvardims(f[0])      ; dimension names defined in the first file
> lev_ind   = ind(dim_names .eq. "lev")
> dim_sizes = getfiledimsizes(f[0])
>
> agg_result = new((/ <size and dimensionality depend on what kind of processing you are doing> /), float_or_whatever_type)
>
> do i = 0, dim_sizes(lev_ind) - 1
>   G = f[:]->GPP(:,i,:,:)
>   ; process G and save to an aggregate result variable -- again the dimensionality of the result depends on the type of processing
>   agg_result(i) = process(G)
> end do
>
> ; code for combining and further processing of the results if necessary
>
>
> ---
> Hope this helps.
> -dave
>
>
> On Mar 8, 2013, at 9:35 AM, Nate Mikle wrote:
>
> > Hello NCL users,
> >
> > I am new to both NCL and writing scripts in general, so I apologize in advance for simple mistakes and questions.
> >
> > I am trying to concatenate 12 different files that are temporally adjacent (monthly) into one file. I want to do this so I can handle them all together and then aggregate the time dimension (3-hourly) into a daily time step.
> >
> > If there are any ideas about how to do this or a better direction for me to go they would be greatly appreciated. Thanks in advance.
> >
> > -Nate Mikle
> >
> > Here is the script I have so far; it gives me a segmentation fault. Does this mean the files are too large (each is 2200 MiB)? If so, how do I make them smaller, e.g. by deleting unnecessary variables?
> >
> >
> > load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
> > load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
> > load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
> > load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/shea_util.ncl"
> >
> > begin
> >
> > pathi = "/home/mikl6340/MsTMIP_Model/CLM-VIC/2000/"
> > fili = (/"BG1_CLM4VIC_v1_3hourly_2000-01","BG1_CLM4VIC_v1_3hourly_2000-02","BG1_CLM4VIC_v1_3hourly_2000-03","BG1_CLM4VIC_v1_3hourly_2000-04","BG1_CLM4VIC_v1_3hourly_2000-05","BG1_CLM4VIC_v1_3hourly_2000-06","BG1_CLM4VIC_v1_3hourly_2000-07","BG1_CLM4VIC_v1_3hourly_2000-08","BG1_CLM4VIC_v1_3hourly_2000-09","BG1_CLM4VIC_v1_3hourly_2000-10","BG1_CLM4VIC_v1_3hourly_2000-11","BG1_CLM4VIC_v1_3hourly_2000-12"/)
> >
> > f = addfiles(pathi+fili+".nc","r")
> >
> > ListSetType(f,"cat")
> > G = f[:]->GPP
> >
> > printVarSummary(G)
> >
> > end
>
>

_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk