Re: is there a way to filter out non-matching strings from an array of strings using an 'ind' type operation?

From: Jonathan Vigh <jvigh_at_nyahnyahspammersnyahnyah>
Date: Wed Mar 06 2013 - 13:07:21 MST

Thanks Wei -- I was already aware of that solution (see paragraph #5 in
my e-mail). My purpose in sending the e-mail was to check whether there
is a more elegant way to do this within NCL and, if none exists, to
encourage the development of a function that would allow 'ind'-type
operations on the negation of a string match (i.e. return the indices
of all the non-matching strings in an array). Currently, there doesn't
seem to be any way to accomplish this in NCL, but it would be quite
useful for QC operations on large string arrays (for instance, to
return all the lines with good data by doing a negative match against
one or more 'bad' QC patterns). If isStrSubset could accept an array of
strings and return an array of True/False values, this would be
possible with 'ind', but at present it accepts only one string at a
time. For efficiency, it'd be nice to have a built-in way of doing this.
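
For the shell route in the meantime, here is a minimal sketch of a
negative match against several 'bad' patterns at once. The helper name
filter_out, the sample file names, and the second pattern 'QClog' are
made up for illustration and are not part of NCL or of the original
file list. It assumes the patterns are plain substrings, not regular
expressions, and that at least one pattern is given.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the lines of stdin that
# contain none of the fixed substrings given as arguments.
filter_out() {
    # grep accepts a newline-separated pattern list, so join the arguments
    # with newlines; $(...) strips the trailing newline, which matters because
    # an empty pattern would match (and therefore drop) every line.
    patterns=$(printf '%s\n' "$@")
    grep -v -F -e "$patterns"
}

# Usage: the file names are made up; 'QClog' is an assumed second pattern.
printf '%s\n' a.txt 20060829H1_FDlog.txt b.ten c_QClog.TXT | filter_out FDlog QClog
```

Inside an NCL systemfunc call the same effect should be available by
appending one -e option per pattern to the existing pipe, e.g.
"| grep -v -F -e FDlog -e QClog".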

     Jonathan

On 03/06/2013 03:37 AM, Wei Huang wrote:
> Try:
>
> raw_filenames = systemfunc("cd " + raw_data_directory + " ; ls *.txt *.TXT *.ten *.TEN *.one *.ONE *.1sec *.10sec | grep -v FDlog")
>
> Which will filter out any file whose name contains FDlog.
>
> Wei
>
> huangwei@ucar.edu
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
>
>
> On Mar 5, 2013, at 5:32 PM, Jonathan Vigh <jvigh@ucar.edu> wrote:
>
>> Greetings NCLers,
>>
>> I'm trying to /filter out/ strings from an array of strings based on
>> one or more patterns, but I'm running into difficulties in
>> accomplishing this in NCL.
>>
>> To start, I am creating a list of filenames that match a number of
>> file extensions that normally correspond to certain data files:
>> raw_filenames = systemfunc("cd " + raw_data_directory + " ; ls *.txt *.TXT *.ten *.TEN *.one *.ONE *.1sec *.10sec ")
>>
>> It turns out that not all of the files with these extensions are data
>> files, however -- some are log files (e.g.
>> 20060829H1_FDlog.txt). I'd like to filter these out of the list of
>> files, retaining just the files that are actual data files. So in
>> effect, I'd like to set up one or more exclusion conditions to screen
>> these out, similar to how one might use the 'ind' function for other
>> types of logical tests.
>>
>> Using 'ind' to do this negative filter operation turns out to be
>> unexpectedly difficult when strings are involved, since the only
>> string matching function that returns a boolean (i.e., isStrSubset)
>> can only accept one input string at a time. Luckily, NCL does now
>> include a nice 'str_match_ind' function, but this doesn't seem to
>> help me since this function only returns the indices of the positive
>> matches. What I'm looking for is a way to do the inverse -- to get
>> back all the strings (or their indices) that do NOT match a specific
>> pattern. In effect, I need a 'str_no_match_ind' function.
>>
>> Does anyone know of a way to do this within NCL without using loops
>> and/or complicated logic? I suppose the simplest hack would be to
>> pipe the original list of files into a "grep -v 'badpattern' " when
>> it is first created. I can imagine though that there might be other
>> circumstances where a 'str_no_match_ind' function might be very
>> useful -- for instance, when doing quality control on large arrays of
>> strings.
>>
>> If there's no easy way around this difficulty, I'd like to request
>> that a 'str_no_match_ind' be added to NCL, or alternatively, that the
>> 'isStrSubset' function be modified to accept arrays of strings as
>> input, so that it can be used in regular 'ind' operations.
>>
>> I apologize if I've overlooked something really simple in all this.
>>
>> Thanks!
>> Jonathan
>>
>>
>> --
>> Jonathan Vigh
>> Project Scientist I, Joint Numerical Testbed
>> Research Applications Laboratory (RAL)
>> National Center for Atmospheric Research (NCAR)
>> P.O. Box 3000 tel: +1 303 497 8205
>> Boulder, CO 80307-3000 fax: +1 (303) 497-8171
>> http://www.ral.ucar.edu/staff/jvigh/
>> http://www.ral.ucar.edu/hurricanes/
>> _______________________________________________
>> ncl-talk mailing list
>> List instructions, subscriber options, unsubscribe:
>> http://mailman.ucar.edu/mailman/listinfo/ncl-talk
>

-- 
Jonathan Vigh						
Project Scientist I, Joint Numerical Testbed	
Research Applications Laboratory (RAL)
National Center for Atmospheric Research (NCAR)
P.O. Box 3000            tel: +1 303 497 8205
Boulder, CO 80307-3000   fax: +1 (303) 497-8171
http://www.ral.ucar.edu/staff/jvigh/
http://www.ral.ucar.edu/hurricanes/

_______________________________________________
ncl-talk mailing list
List instructions, subscriber options, unsubscribe:
http://mailman.ucar.edu/mailman/listinfo/ncl-talk
Received on Wed Mar 6 13:07:41 2013