is there a way to filter out non-matching strings from an array of strings using an 'ind' type operation?

From: Jonathan Vigh <jvigh_at_nyahnyahspammersnyahnyah>
Date: Tue Mar 05 2013 - 17:32:52 MST

Greetings NCLers,

I'm trying to /filter out/ strings from an array of strings based on one
or more patterns, but I'm running into difficulties in accomplishing
this in NCL.

To start, I am creating a list of filenames that match a number of file
extensions that normally correspond to certain data files:
    raw_filenames = systemfunc("cd " + raw_data_directory + " ; ls *.txt *.TXT *.ten *.TEN *.one *.ONE *.1sec *.10sec ")

It turns out that not all of the files with these extensions are data
files, however -- some are log files (e.g. 20060829H1_FDlog.txt). I'd
like to filter these out of the list, retaining just the files that are
actual data files. In effect, I'd like to set up one or more exclusion
conditions to screen these out, similar to how one might use the 'ind'
function for other types of logical tests.

Using 'ind' to do this negative filter operation turns out to be
unexpectedly difficult when strings are involved, since the only string
matching function that returns a boolean (i.e., isStrSubset) can only
accept one input string at a time. Luckily, NCL does now include a nice
'str_match_ind' function, but this doesn't seem to help me since this
function only returns the indices of the positive matches. What I'm
looking for is a way to do the inverse -- to get back all the strings
(or their indices) that do NOT match a specific pattern. In effect, I
need a 'str_no_match_ind' function.

Does anyone know of a way to do this within NCL without using loops
and/or complicated logic? I suppose the simplest hack would be to pipe
the original list of files into a "grep -v 'badpattern' " when it is
first created. I can imagine though that there might be other
circumstances where a 'str_no_match_ind' function might be very useful
-- for instance, when doing quality control on large arrays of strings.
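
For reference, the "grep -v" workaround mentioned above might look
roughly like the sketch below. It is an illustration only: the temporary
directory and the two data-file names are made up for the demo, and
'log' stands in for whatever 'badpattern' is appropriate. Only the log
file name and the list of extensions come from this message.

```shell
# Sketch only: the directory and the two "data" file names are hypothetical.
raw_data_directory=$(mktemp -d)
touch "$raw_data_directory/20060829H1_FDlog.txt" \
      "$raw_data_directory/20060829H1_A.txt" \
      "$raw_data_directory/20060829H1_B.ten"

# List the candidate files, then drop every name that contains 'log'.
# 2>/dev/null hides the ls complaints about extensions with no matches.
# Further exclusion patterns can be added with extra -e options.
filtered=$(cd "$raw_data_directory" && \
  ls *.txt *.TXT *.ten *.TEN *.one *.ONE *.1sec *.10sec 2>/dev/null | \
  grep -v -e 'log')

printf '%s\n' "$filtered"

# Clean up the demo directory.
rm -rf "$raw_data_directory"
```

Within NCL, the same "| grep -v" stage could presumably be appended to
the command string that is already passed to systemfunc, so that the
returned array never contains the unwanted names.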

If there's no easy way around this difficulty, I'd like to request that
a 'str_no_match_ind' be added to NCL, or alternatively, that the
'isStrSubset' function be modified to accept arrays of strings as input,
so that it can be used in regular 'ind' operations.

I apologize if I've overlooked something really simple in all this.


Jonathan Vigh
Project Scientist I, Joint Numerical Testbed
Research Applications Laboratory (RAL)
National Center for Atmospheric Research (NCAR)
P.O. Box 3000            tel: +1 303 497 8205
Boulder, CO 80307-3000   fax: +1 (303) 497-8171

Received on Tue Mar 5 17:33:02 2013
