  1. #1
    Master Coder
    Join Date
    Apr 2003
    Location
    in my house
    Posts
    5,211
    Thanks
    39
    Thanked 201 Times in 197 Posts

    assistance with complex regex.

    Hi,

    I am trying to build a string from a set of words that appear in another passage of text, where each wanted word is marked with brackets.

    I have tried numerous ways and am getting nowhere.

    Any tips you can provide will be very welcome.

    Code:
    $keywords = 'The (quick) brown fox (jumped) over the (lazy) dog';
      
    #$keywords =~ s/\([^()]\)+/$1/;
    
      if ($keywords =~ m/^\([^()]\)+$/){ print qq( s1=$1 $2 $3); }
      
    $keywords =~ s/([(]?[-\w\ ]+[)]?){0,8}/$1/;
    The special variable (or rather, the result) should look like this:
    $1 = 'quick jumped lazy';


    bazz
    "The day you stop learning is the day you become obsolete"! - my late Dad.

    Why do some people say "I don't know for sure"? If they don't know for sure then, they don't know!
    Useful MySQL resource
    Useful MySQL link

  • #2
    OK, this gets me the (quick) split across $1, $2 and $3, but not the other bracketed words.

    Code:
    if ($keywords =~ /(\()([\w\ ]+)(\))+/) { print qq( s1=$1 s2=$2 s3=$3 s4=$4 ); }

  • #3
    New Coder
    Join Date
    Mar 2009
    Location
    Fabric Covered Box
    Posts
    69
    Thanks
    1
    Thanked 16 Times in 14 Posts
    Your first try was pretty close -- just needed your repetition in the right spot.

    @words= 'The (quick) brown fox (jumped) over the (lazy) dog'=~/\(([^)]+)\)/g;

    or

    @words= 'The (quick) brown fox (jumped) over the (lazy) dog'=~/\((.*?)\)/g;
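
    For anyone following along, here is a minimal sketch of how the first pattern above can produce the single string asked for in the original post. The variable names $text and $joined are only illustrative and do not come from the thread.

```perl
use strict;
use warnings;

my $text = 'The (quick) brown fox (jumped) over the (lazy) dog';

# In list context, a /g match returns every capture-group hit at once.
my @words = $text =~ /\(([^)]+)\)/g;

# Join the captured words into one space-separated string.
my $joined = join ' ', @words;

print "$joined\n";    # prints "quick jumped lazy"
```

    The important part is the list assignment: in scalar context the same match would only report success or failure, and $1 would hold just one word at a time.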

  • #4
    Thanks Shannon, I'll take a look at that closely.

    In the meantime I had mushed this together, which works, but your regex is much tidier.
    Code:
    my $keywords = $description;
    $keywords =~ s/'//g;
    $description =~ s/\/\(//g;
    $description =~ s/\)\///g;
    my @search_terms;
    my @array = split ( '/' , $keywords);
    my $count=0;

    foreach my $word (@array)
    {
      $count++;

      #print qq( word = $word <br /> );
      if ($word =~ /^(\([-\w\'\ ]+\))$/ )
      {
        my ($keep,$discard) = split /\// , $word, 2;
        $keep =~ s/\(//;
        $keep =~ s/\)//;
        #print qq(keep = $keep);
        push (@search_terms, $keep);
      }
    }
    bazz

  • #5
    nerh?

    From that code it looks like the keywords are delimited by /( and )/, not plain parentheses? You kill all the single quotes in $keywords then allow them in the $word match? Then split on a slash after a match that excludes slashes?

    I'm all confusified now.

  • #6
    o-o-h the stripped out bit is sloppy.

    I'm going back to the regex idea so only the () will be used and not the /( and )/.

    Thanks for your answer.

    bazz

  • #7
    I've confused myself again.

    This first one technically works, except that it prevents me from using brackets in normal text.

    Code:
    my @words = $keywords =~ /\(([^)]+)\)/g;
    If I change it to this, it does not work, but it does not raise an error either:
    Code:
    my @words = $keywords =~ /\({[^}]+}\)/g;
    What I really want is to be able to use a symbol which would not be used in English text. Something like |

    Code:
    my @words = $keywords =~ /\(|[^|]+|\)/g;
    That doesn't capture the words; instead, it captures the whole paragraph.
    And this captures the whole paragraph too.

    Code:
    my @words = $keywords =~ /\(|[^|]+\)/g;
    so here is my breakdown of the regex.

    / = regex limiter
    \( = regex boundary
    | = first item to match on
    [ = start of char class
    ^ = find the first occurrence of the following char
    | = the following char to match on (mentioned in the above point)
    ] = end of char class
    + = 1 or more occurrences
    \) = end regex boundary
    / = regex end limiter
    g = make it all global
    ; = end of line


    Any pointers most welcome.

    bazz
    Last edited by bazz; 07-18-2009 at 02:36 AM.

  • #8
    my @words = $keywords =~ /\(|[^|]+\)/g;

    would break down as

    / = regex delimiter
    \( = open parenthesis (the backslash escapes the begin-capture meaning)
    | = alternation (or)
    [ = begin character class
    ^ = invert character class
    | = literal pipe character
    ] = end character class
    [^|] = anything but a pipe character
    + = repeat 1 or more times, greedily
    \) = literal close paren.
    / = end regex

    So, that would match an open paren OR a group of one or more non-pipe characters followed by a close paren.
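
    A quick way to see this breakdown in action is the sketch below. It assumes the keywords are wrapped in plain pipes, as in |quick|, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $plain = 'The (quick) brown fox (jumped) over the (lazy) dog';

# Unescaped | is alternation: "\(" OR "[^|]+\)". With no pipes in the text,
# the second branch greedily swallows everything up to the last close paren.
my @greedy = $plain =~ /\(|[^|]+\)/g;

my $piped = 'The |quick| brown fox |jumped| over the |lazy| dog';

# Escaped pipes are literal delimiters; the capture group keeps only the
# text between each pair.
my @words = $piped =~ /\|([^|]+)\|/g;

print "$_\n" for @greedy;
print "@words\n";
```

    The first print shows one long match running from the start of the sentence to the final close paren, which is the "whole paragraph" effect described earlier. The second shows only the three marked words.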

    To use your choice of keyword marker:
    Code:
    my ($beginKey,$endKey)=qw{ | | };
    my $s = "The |quick| brown fox |jumped| over the |lazy| dog";
    # \Q...\E quotes the delimiters so | is matched literally rather than as alternation
    my @keywords= $s=~/\Q$beginKey\E(.*?)\Q$endKey\E/g;
    
    # if you want to strip the keyword delimiter out of the original string
    ($beginKey,$endKey)=qw( -={ }=- );
    $s = "The -={quick}=- brown fox -={jumped}=- over the -={lazy}=- dog";
    @keywords=();
    $s=~s/\Q$beginKey\E(.*?)\Q$endKey\E/push(@keywords,$1),$1/eg;
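
    One thing worth double-checking when delimiters are interpolated into a pattern: characters such as | and { are regex metacharacters, so they generally need quoting with quotemeta (or \Q...\E) to be matched literally. Below is a minimal sketch that wraps the idea in a reusable sub. The name extract_keywords is hypothetical and not from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to the list of keywords found
# between $begin and $end, plus a copy of the text with delimiters removed.
sub extract_keywords {
    my ($text, $begin, $end) = @_;

    # quotemeta backslash-escapes regex metacharacters so that delimiters
    # such as | are matched literally.
    my $open  = quotemeta($begin);
    my $close = quotemeta($end);

    my @found;
    (my $stripped = $text) =~ s/$open(.*?)$close/push(@found, $1), $1/eg;

    return (\@found, $stripped);
}

my ($found, $stripped) = extract_keywords(
    'The |quick| brown fox |jumped| over the |lazy| dog', '|', '|');

print "@$found\n";      # prints "quick jumped lazy"
print "$stripped\n";    # prints "The quick brown fox jumped over the lazy dog"
```

    The same call should work with multi-character markers such as -={ and }=-, since quotemeta escapes each of those characters as well.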

  • #9
    Blimey, I was way off.

    Thanks for your response. I'll study it to try to make sense of it.


    bazz