assistance with complex regex.

07-12-2009, 10:28 PM

I am trying to build a string from a set of words that are marked with brackets in another passage of text.

I have tried numerous approaches and am getting nowhere.

Any tips you can provide will be very welcome.

$keywords = 'The (quick) brown fox (jumped) over the (lazy) dog';

#$keywords =~ s/\([^()]\)+/$1/;

if ($keywords =~ m/^\([^()]\)+$/){ print qq( s1=$1 $2 $3); }

$keywords =~ s/([(]?[-\w\ ]+[)]?){0,8}/$1/;

The special variable (or rather the result) should look like this:
$1 = 'quick jumped lazy';


07-12-2009, 11:01 PM
OK, this gets me the (quick) split across $1 $2 $3, but not the other bracketed words.

if ($keywords =~ /(\()([\w\ ]+)(\))+/) { print qq( s1=$1 s2=$2 s3=$3 s4=$4 ); }


Shannon Blonk
07-13-2009, 02:53 AM
Your first try was pretty close -- just needed your repetition in the right spot.

@words= 'The (quick) brown fox (jumped) over the (lazy) dog'=~/\(([^)]+)\)/g;

or, with a non-greedy match:

@words= 'The (quick) brown fox (jumped) over the (lazy) dog'=~/\((.+?)\)/g;
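
For anyone following along, here is a minimal sketch of how a list-context match with /g can produce the single string asked for at the top of the thread. The variable names $text and $joined are illustrative only and do not come from the original posts.

```perl
use strict;
use warnings;

my $text = 'The (quick) brown fox (jumped) over the (lazy) dog';

# In list context, a /g match returns the captures from every match,
# so @words receives the contents of each pair of parentheses.
my @words = $text =~ /\(([^)]+)\)/g;

# Join the captured words with single spaces to build the keyword string.
my $joined = join ' ', @words;

print "$joined\n";    # prints "quick jumped lazy"
```

The repetition sits inside the capture group ([^)]+), which is why each match yields a whole word instead of a single character.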

07-13-2009, 03:41 AM
Thanks Shannon, I'll take a look at that closely.

In the meantime I had mushed this together, which works, but the regex is much tidier.

my $keywords = $description;
$keywords =~ s/'//g;
$description =~ s/\/\(//g;
$description =~ s/\)\///g;
my @search_terms;
my @array = split ( '/' , $keywords);
my $count=0;

foreach my $word (@array)
{
    #print qq( word = $word <br /> );
    if ($word =~ /^(\([-\w\'\ ]+\))$/ )
    {
        my ($keep,$discard) = split /\// , $word, 2;
        $keep =~ s/\(//;
        $keep =~ s/\)//;
        #print qq(keep = $keep);
        push (@search_terms, $keep);
    }
}

Shannon Blonk
07-13-2009, 04:34 AM

From that code it looks like the keywords are delimited by /( and )/, not plain parentheses? You kill all the single quotes in $keywords then allow them in the $word match? Then split on a slash after a match that excludes slashes?

I'm all confusified now.

07-13-2009, 04:48 AM
o-o-h the stripped out bit is sloppy. :(

I'm going back to the regex idea so only the () will be used and not the /( and )/.

Thanks for your answer.


07-18-2009, 03:31 AM
confused myself now. again :(

this first one works technically except that it prevents me from using brackets in normal text.

my @words = $keywords =~ /\(([^)]+)\)/g;

If I change it to this, it does not work, but it does not raise an error either:

my @words = $keywords =~ /\({[^}]+}\)/g;

what I really want is to be able to use a symbol which would not be used in English text. Something like |

my @words = $keywords =~ /\(|[^|]+|\)/g;

that doesn't capture the words; instead, it captures the whole paragraph.
And this captures the whole paragraph too.

my @words = $keywords =~ /\(|[^|]+\)/g;

so here is my breakdown of the regex.

/ = regex limiter
\( = regex boundary
| = first item to match on
[ = start of char class
^ = find the first occurrence of the following char
| = the following char to match on (mentioned in the above point)
] = end of char class
+ = 1 or more occurrences
\) = end regex boundary
/ = regex end limiter
g = make it all global
; = end of line

Any pointers most welcome.


Shannon Blonk
07-18-2009, 04:27 AM
my @words = $keywords =~ /\(|[^|]+\)/g;

would break down as

/ = regex delimiter
\( = open parenthesis (the backslash escapes the begin-capture meaning)
| = alternation (or)
[ = begin character class
^ = invert character class
| = literal pipe character
] = end character class
[^|] = anything but a pipe character
+ = repeat 1 or more times, greedily
\) = literal close paren.
/ = end regex

So, that would match an open paren OR a group of one or more non-pipe characters followed by a close paren.
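
A small sketch of that behaviour, assuming the sample sentence from earlier in the thread as input (the variable names are illustrative). The second half shows one possible way to treat the pipes as literal markers by escaping them and adding a capture group.

```perl
use strict;
use warnings;

my $parens = 'The (quick) brown fox (jumped) over the (lazy) dog';

# Unescaped | is alternation: this reads as "\(" OR "[^|]+\)".
# With no capture group, a list-context /g match returns whole matches.
my @whole = $parens =~ /\(|[^|]+\)/g;
print scalar(@whole), " match: $whole[0]\n";
# prints "1 match: The (quick) brown fox (jumped) over the (lazy)"

my $pipes = 'The |quick| brown fox |jumped| over the |lazy| dog';

# Escaping the pipes makes them literal, and the capture group
# picks out just the text between each pair.
my @words = $pipes =~ /\|([^|]+)\|/g;
print "@words\n";    # prints "quick jumped lazy"
```

Because the input contains no pipe characters, the greedy [^|]+ runs to the end of the string and backs up to the last close paren, which is why nearly the whole paragraph comes back as one match.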

To use your choice of keyword marker:
my ($beginKey,$endKey)=qw{ | | };
my $s = "The |quick| brown fox |jumped| over the |lazy| dog";
# \Q...\E quotes the markers so a character like | is matched literally
my @keywords= $s=~/\Q$beginKey\E(.*?)\Q$endKey\E/g;

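# if you want to strip the keyword delimiter out of the original string
($beginKey,$endKey)=qw( -={ }=- );
$s = "The -={quick}=- brown fox -={jumped}=- over the -={lazy}=- dog";
# assumed final step: replace each delimited keyword with the bare keyword
$s =~ s/\Q$beginKey\E(.*?)\Q$endKey\E/$1/g;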
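
As a rough sketch of how the two steps (collecting the keywords and stripping the markers) might be wrapped up together: extract_keywords is a hypothetical helper name, not something from this thread, and \Q...\E is used on the assumption that the markers should always be matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to the list of keywords found
# between $begin and $end, plus a copy of the text with the markers removed.
sub extract_keywords {
    my ($text, $begin, $end) = @_;

    # \Q...\E quotes regex metacharacters such as | { } so the
    # markers are matched literally.
    my @keywords = $text =~ /\Q$begin\E(.*?)\Q$end\E/g;

    # Copy the text, then replace each marked keyword with the bare keyword.
    (my $stripped = $text) =~ s/\Q$begin\E(.*?)\Q$end\E/$1/g;

    return (\@keywords, $stripped);
}

my ($keywords, $stripped) = extract_keywords(
    'The -={quick}=- brown fox -={jumped}=- over the -={lazy}=- dog',
    '-={', '}=-',
);

print "@$keywords\n";    # prints "quick jumped lazy"
print "$stripped\n";     # prints "The quick brown fox jumped over the lazy dog"
```

Quoting the markers matters most for a choice like |, which would otherwise be read as alternation once interpolated into the pattern.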

07-18-2009, 06:00 AM
Blimey, I was way off.

Thanks for your response. I'll study it to try to make sense of it.