Re: is there a way to filter out non-matching strings from an array of strings using an 'ind' type operation?

From: Wei Huang <huangwei_at_nyahnyahspammersnyahnyah>
Date: Tue Mar 05 2013 - 20:37:41 MST

Try:

raw_filenames = systemfunc("cd " + raw_data_directory + " ; ls *.txt *.TXT *.ten *.TEN *.one *.ONE *.1sec *.10sec | grep -v FDlog")

This will filter out any file whose name contains FDlog.
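To see what the pipeline does outside of NCL, here is a minimal shell sketch. The scratch directory and the file names in it are made up for illustration (only 20060829H1_FDlog.txt comes from the original question), and only two of the extensions are listed to keep it short. It is a sketch of the grep -v idea, not a tested replacement for the systemfunc call above.

```shell
# Hypothetical scratch directory and file names, for illustration only.
tmpdir=$(mktemp -d)
touch "$tmpdir/20060829H1.txt" "$tmpdir/20060829H1_FDlog.txt" "$tmpdir/20060829H1.ten"

# List the candidate files, then drop every line that contains "FDlog".
# grep -v inverts the match: it prints only the lines that do NOT match.
# 2>/dev/null hides ls complaints about extensions that match no files.
kept=$(cd "$tmpdir" && ls *.txt *.ten 2>/dev/null | grep -v FDlog)

# Show the surviving file names, one per line.
echo "$kept"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

To exclude more than one pattern, grep accepts several -e options, e.g. grep -v -e FDlog -e README (README being a hypothetical second pattern).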

Wei

huangwei@ucar.edu
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924

On Mar 5, 2013, at 5:32 PM, Jonathan Vigh <jvigh@ucar.edu> wrote:

> Greetings NCLers,
>
> I'm trying to filter out strings from an array of strings based on one or more patterns, but I'm running into difficulties in accomplishing this in NCL.
>
> To start, I am creating a list of filenames that match a number of file extensions that normally correspond to certain data files:
> raw_filenames = systemfunc("cd " + raw_data_directory + " ; ls *.txt *.TXT *.ten *.TEN *.one *.ONE *.1sec *.10sec ")
>
> It turns out that not all of the files with these extensions are data files, however -- some are log files (e.g. 20060829H1_FDlog.txt). I'd like to filter these out of the list, retaining just the actual data files. So in effect, I'd like to set up one or more exclusion conditions to screen these out, similar to how one might use the 'ind' function for other types of logical tests.
>
> Using 'ind' to do this negative filter operation turns out to be unexpectedly difficult when strings are involved, since the only string matching function that returns a boolean (i.e., isStrSubset) can only accept one input string at a time. Luckily, NCL does now include a nice 'str_match_ind' function, but this doesn't seem to help me since this function only returns the indices of the positive matches. What I'm looking for is a way to do the inverse -- to get back all the strings (or their indices) that do NOT match a specific pattern. In effect, I need a 'str_no_match_ind' function.
>
> Does anyone know of a way to do this within NCL without using loops and/or complicated logic? I suppose the simplest hack would be to pipe the original list of files into a "grep -v 'badpattern' " when it is first created. I can imagine though that there might be other circumstances where a 'str_no_match_ind' function might be very useful -- for instance, when doing quality control on large arrays of strings.
>
> If there's no easy way around this difficulty, I'd like to request that a 'str_no_match_ind' be added to NCL, or alternatively, that the 'isStrSubset' function be modified to accept arrays of strings as input, so that it can be used in regular 'ind' operations.
>
> I apologize if I've overlooked something really simple in all this.
>
> Thanks!
> Jonathan
>
>
> --
> Jonathan Vigh
> Project Scientist I, Joint Numerical Testbed
> Research Applications Laboratory (RAL)
> National Center for Atmospheric Research (NCAR)
> P.O. Box 3000 tel: +1 303 497 8205
> Boulder, CO 80307-3000 fax: +1 (303) 497-8171
> http://www.ral.ucar.edu/staff/jvigh/
> http://www.ral.ucar.edu/hurricanes/
> _______________________________________________
> ncl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/ncl-talk

Received on Tue Mar 5 20:38:23 2013

This archive was generated by hypermail 2.1.8 : Thu Mar 07 2013 - 08:55:58 MST