Example pages containing:
tips |
resources |
functions/procedures
NCL: Plotting and working with shapefiles
Shapefiles
Shapefiles are a
From Wikipedia:
The Esri* Shapefile or simply
a shapefile is a popular geospatial vector data format for
geographic information systems software. It is developed and regulated
by Esri as a (mostly) open specification for data interoperability
among ESRI and other software products.[1] A "shapefile" commonly
refers to a collection of files with ".shp", ".shx", ".prj", ".dbf",
and other extensions on a common prefix name (e.g., "lakes.*"). The
actual shapefile relates specifically to files with the ".shp"
extension, however this file alone is incomplete for distribution, as
these other supporting files are required.
*Esri and the Esri Logo are licensed trademarks of
Environmental Systems Research Institute, Inc.
Shapefiles describe a homogeneous set of geometrical features comprised of either points, polylines, or polygons. These, for example, could represent water wells, rivers, and lakes, respectively. Each feature may also have non-spatial attributes that describe it, such as the name or temperature.
The Global Administrative Database
(GADM) offers consistent administrative boundaries at multiple levels.
The level 0 database (nations) is good to use for global or mesoscale
results, level 1 is the first level of sub-national administration
(typically states/provinces and territories) while level 2 offers the
second level of administration and is potentially useful for
high-resolution plots. Sometimes there are more levels (for example,
France has 6) with even higher resolution. The global shapefiles are
large, but it's possible to download individual countries separately.
NOAA provides some useful AWIPS
shapefile data.
Shapefile plotting and masking routines
These functions can be used to draw polygons, lines, and markers from shapefile data:
The shapefile_utils.ncl script provides additional useful routines for working with shapefiles:
- print_shapefile_info - prints information about the shapefile
- plot_shapefile - creates a basic map plot with the shapefile outlines added
- shapefile_mask_data - masks your own data against outlines in a shapefile
Important note: The shapefile_mask_data function is a powerful function used by many scripts on this page. In order for your data to be masked correctly using this function, your longitude values must be in the same range as the longitude values on the shapefile. Most shapefile longitude values are in the range -180 to 180. You can use the print_shapefile_info procedure above to print this information.
For a sample of the first two routines, try this simple 4-line NCL script with your own shapefile data, or download sample shapefiles here.
load "./shapefile_utils.ncl" sname = "states.shp" print_shapefile_info(sname) plot_shapefile(sname)
Shapefile details
What follows are details of the contents of a shapefile. Several functions (see above) have been added to hide some of this detail.
Within NCL, a shapefile appears as a collection of 4 or 5 specifically named variables that encode the geometry of the features, along with some number of non-spatial variables. The number, names and types of the non-spatial variables depend upon the specific shapefile. The geometry of a feature is composed of one or more segments. Segments in turn are composed of an ordered-list of X, Y (and optionally Z) tuples. NCL uses the following variables to encode these relationships:
geometry(num_features, 2)
segments(num_segments, 2)
x(num_points)
y(num_points)
z(num_points) ; 3D datasets only
Each feature in the shapefile is represented by an entry in the geometry
variable, along with corresponding entries in the non-spatial variables; i.e., the data for the i-th feature is found at the i-th entry of these variables. Each entry in geometry
has two values: geometry(i, 0)
specifies the index into variable segments
of the 1st segment of the i-th feature, and geometry(i, 1)
denotes the number of segments that make up the feature.
Similarly, each entry in segments
has two values: segments(j, 0)
is the index of the 1st XY(Z) tuple of the j-th segment, and segments(j, 1)
is the number of tuples belonging to the segment.
With this encoding, any subsequent segments belonging to the i-th feature follow its first one in segments
, and all the XY(Z) tuples belonging to the j-th segment follow the first in the x,y,(z)
variables.
NCL defines several global attributes for a shapefile:
geom_segIndex = 0
geom_numSegs = 1
segs_xyzIndex = 0
segs_numPnts = 1
geometry_type = "point" | "polyline" | "polygon"
layer_name = ; value derived from the shapefile
The first 4 attributes are intended as symbolic indices into the geometry
and segments
variable; see the examples below for how they should be used.
To run this example, you must have the files "states.shp", "states.dbf", and "states.shx". These can be obtained from the National Map.
A Python version of this projection is available here.
To run this example, you must have the "states" shapefiles from the previous example, along with "tornadx020.shp", "tornadx020.dbf", and "tornadx020.shx". These can be obtained from the USGS's holdings on data.gov.
You cannot use the "HighRes" database with a global map, so the shapefile outlines are your best bet if you need more resolution.
To run this example, you must download the "HYDRO1k Streams data set
for South America as a tar file in shapefile format" (2.4 MB) from:
http://eros.usgs.gov/#/Find_Data/Products_and_Data_Available/gtopo30/hydro/samerica
Note: this dataset no longer appears to be available as of February 2014.
See the description of this function for important information.
This particular example reads a shapefile to get an outline of the Mississippi River Basin. You then have the option of masking out all areas inside or outside this outline.
The "mrb.xxx" data files for this example can be found on the example datasets page.
The shapefiles_4_old.ncl is an older script that doesn't use a separate function to mask the data. Look at this script if you are interested in seeing the "guts" of what it takes to mask data against a shapefile outline.
Also provides an example of using table to create a custom map legend.
The shapefiles for this example were obtained from DIVA-GIS. Search for administrative boundaries of Pakistan and download. The resultant zipfile contains four sets of shapefile files.
The point is to show that shapefiles tend to have similar formats, and hence you can take a script and easily modify it to draw the outlines you're interested in.
In this example, the outlines are drawn with polylines, and the places of interest with text strings and polymarkers.
With versions 6.1.2 or earlier of NCL, if you try to plot a lot of individual line segments or text strings using gsn_add_polyxxxx functions, then you may want to consider using gsn_polyline instead of gsn_add_polyline, gsn_polymarker instead of gsn_add_polymarker, and gsn_text instead of gsn_add_text.
With version 6.2.0 of NCL, drawing polylines and polygons is much faster. Markers and text may still be slow.
The routines gsn_add_shapefile_polygons, gsn_add_shapefile_polylines, and gsn_add_shapefile_polymarkers are used to attach the shapefile information.
In this case, the shapefile contains coastal outlines, which a land mask is created from. See the function "create_mask_from_shapefile" in the "mask_12.ncl" script. This function only works for data that contains coordinate arrays. You will need to modify it to work with curvilinear or unstructured data.
You should be able to use any shapefile that contains polygon data (point and polyline data won't work) to create the desired mask.
The shapefile used in this example was part of a compressed file,
"GSHHS_shp_2.2.0.zip", downloaded from:
http://www.ngdc.noaa.gov/mgg/shorelines/data/gshhs/oldversions/version2.2.0
You need to uncompress it with the "unzip" command. You can use any of the other shapefiles that are included with this file, but they are potentially a higher resolution, and hence creating the mask will take longer.
The shapefile data came with a suggested lithologic color map: http://mrdata.usgs.gov/catalog/lithclass-color.php, which shows which colors to use for which rock type. This example draws the lithologic color map on a separate frame, using several labelbars.
In order to use the suggested color map, the gsn_add_shapefile_polygons procedure was copied from the $NCARG_ROOT/lib/ncarg/nclscripts/csm directory, and then modified as needed.
The New Mexico image additionally adds fault lines.
This script calls shapefile_mask_data, which is in the shapefile_utils.ncl script. See the description of this function for important information.
Two USA shapefiles were tried with this example. The "gz_2010_us_020_00_5m.shp" shapefile is from http://www.census.gov/geo/www/cob/ and the "nationalp010g.shp" shapefile is from http://nationalmap.gov/small_scale/mld/1natlp.html. The census one is smaller and hence the script runs much faster on this one (200+ wall seconds versus 17 wall clock seconds). Which file you use depends on how fine your original grid is, and how good of a mask you need.
See references to "feature_names", "vname", and "vlist" in the "gsn_add_shapefile_polylines_subset" function at the top of the NCL script.
The shapefile contains Interstate Highways, and only the primary interstate highways (I-5, I-82, and I-90) in Western United States are drawn. (Special thanks to Dave Allured for his improvement of this script to correctly plot all highway segments with complex entries, like "I- 5, US 30".)
The shapefile was downloaded from http://www.nws.noaa.gov/geodata/catalog/transportation/html/interst.htm.
The gc_inout function is used to collect lat/lon pairs that fall in each county. Since this function can be slow if you are checking a lot of points, a number of "if" statements are included to prevent unnecessary calls to this function.
The county outlines for Georgia come from the USA_adm2.shp shapefile, which was downloaded from http://www.gadm.org/country.
In order to attach the shapefile outlines to the plot, you need to set "pltres@PanelPlot = True" so that wrf_map_overlays doesn't remove the contour plot. You can then use gsn_add_shapefile_polylines to add the shapefile outlines.
The country outline for USA was downloaded from http://www.gadm.org/country.
This script calls shapefile_mask_data, which is in the shapefile_utils.ncl script. See the description of this function for important information.
The outlines for the U.S. states comes from the USA_adm2.shp shapefile, which was downloaded from http://www.gadm.org/country.
The first frame is the unzoomed variable, and the second frame shows how to zoom in on the area of interest using the special "ZoomIn" WRF resource.
You can customize what outlines are drawn by using command line options. Examples:
- Coastal region
ncl 'subregion="6.5,14.75,50.,55.5"' Germany_coastal_counties_DEU_adm.ncl
- Draw "Schleswig-Holstein" (default) and plot only the sub-region
ncl 'subregion="7.8,11.9,53.0,55.3"' Germany_coastal_counties_DEU_adm.ncl
- Draw "Schleswig-Holstein" (default) but don't draw the border of all states
ncl 'states_border=False' Germany_coastal_counties_DEU_adm.ncl
- Select the state "Hessen" but don't draw the borderline of Germany
ncl 'state_name="Hessen"' 'country_border=False' Germany_coastal_counties_DEU_adm.ncl
If you run the France_1.ncl script with the France administrative shapefiles downloaded from gadm.org/country, you will see significant speed-up mostly for the "FRA_adm5.shp" file, which has 36,612 features to draw:
-------------------------- NCL V6.1.2 timing results: -------------------------- . . . SNIP . . . ---> Drawing FRA_adm/FRA_adm5.shp... 207.032 CPU seconds -------------------------- NCL V6.2.0 timing results: -------------------------- . . . SNIP . . . ---> Drawing FRA_adm/FRA_adm5.shp... 2.87449 CPU seconds
Note: this scrip actually generates six images. The timing results and the image displayed above are from running this script with the "do" loop changed to "do i=5,5".
The France_2.ncl script produces identical plots, only it uses gsn_add_polyline instead of gsn_add_shapefile_polylines. This script will only work with NCL V6.2.0, because it references the gsSegments resource directly.
The France_3.ncl script draws the "FRA_adm1.shp" shapefile as filled polygons, using gsColors to specify a fill color for each segment.
The left image is from running NCL V6.3.0 on this script. The right image is from running NCL V6.4.0.
In NCL versions 6.3.0 and earlier, the NCL outlines will not show the county of Broomfield or the updated counties around Denver. In NCL version 6.4.0, the NCL map databases were updated to reflect the current Colorado counties.
The gsn_coordinates procedure is used to draw the lat/lon grid as a set of markers.
See example 21 below for an example that compares masking data using shapefile_mask_data, and masking data by drawing the undesired areas in white.
The shapefile_mask_data function is in the shapefile_utils.ncl script. See the description of this function for important information.
This is useful if you want to keep some of your original data points that fall just outside a shapefile outline.
This script requires shapefile_mask_data_mod.ncl.
This example uses the shapefile_mask_data function found shapefile_utils.ncl. See the description of this function for important information.
In order to mask data against outlines in a shapefile, you must provide the name of a variable on the shapefile that contains the list of names or values you want to mask against.
For example, here, the "USA_adm1.shp" shapefile is being used, which has a "NAME_1" string array containing feature names that are simply the U.S state names: "Alabama", "New Mexico", etc. You can reference one or more of these strings to indicate you want to mask or keep data in these areas.
The shapefile_mask_data function, in the shapefile_utils.ncl script, allows you to set the special "keep" option to False, which will cause data to be removed rather than retained.
The conform_dims function is used to conform the mask array to a 3D array, and then mask is used to apply the mask.
You can only do this kind of masking if your lat/lon grid remains the stationary across all time steps.
See the description of shapefile_mask_data for important information.
The first plot shows the original data with no masking. The purple lines represent the shapefile areas that we want to apply the masking to.
The second plot shows the result of using the shapefile_mask_data function (in shapefile_utils.ncl) to mask the data based on an array of given areas to mask. (You can use strings or numeric types for the areas you want to mask, but the array values should match whatever type the variable is on the shapefile.) You will see the blockiness near the shapefile outlines, due to the fact that NCL will not contour data that is bordered by missing values. The black dots represent the WRF data that was kept by this function, and the red dots represent the data that was set to missing.
The third plot shows how to mask the data by filling the undesired areas in white. This is a bit of a pain, because you have to make sure you fill in all the undesired areas, and set various "draw order" resources to make sure things are drawn in the correct order. This method is only useful when doing graphics, and not if you need to mask the data for calculation purposes later.
Note that the third plot shows some white areas inside the purple shapefile outlines. This is due to the fact that the script set mpOceanFillColor to "white", which causes some shapefile areas to be covered up by the white ocean areas.
If you want to keep the data inside the Caspian Sea, but not inside the isles, then you need to mask the data twice: first by setting the "keep" attribute to True to keep the values inside the single outline, and second by setting "keep" to False to throw away values in the isle areas:
opt = True opt@keep = True ; Keep values inside this shape data_mask_casp = shapefile_mask_data(data,caspian_shape_name,opt) opt@keep = False data_mask_isles = shapefile_mask_data(data_mask_casp,isles_shape_name,opt)Image #1: The original data before masking
Image #2: The data after masking against the Caspian Sea outline. Note that the little isles still have data in them.
Image #3: The data after masking the previously-masked data against the isle outlines. Note that the isles no longer have data in them.
Images #4-5: Zoomed in versions of images #2-3.
Creating a mask array from a shapefile can take a long time if you have a big shapefile or lots of points to check. Once you have the mask array, use it with the mask function to mask other variables of the same size. This goes MUCH faster than using shapefile_mask_data. You can also write this mask array to a NetCDF file for use by other scripts.
Optionally, this script writes the masked variables to a new NetCDF file, and/or plots the original data versus the masked data for comparison.
This script uses a Mississippi River Basin shapefile (mrb.xxx) for demonstrative purposes. You should use your own shapefile data, but you can download the mrb files if needed.
It does this by generating a series of quadrilaterals around every lat/lon point of valid data, and calling gc_qarea to get a sum of the area of all of them. This is only a rough approximation of the actual area of the data, which you can see by the pink squares in the third image.
For comparison purposes, the area (in km^2) of shapefile outline was also calculated, using area_poly_sphere.
The United States Census cartographic boundary shapefile, cb_2018_us_nation_5m.zip, was downloaded from: https://www.census.gov/geographies/mapping-files/2018/geo/carto-boundary-file.html