...

Trickier regex - definitely stuck



RabidMango
07-20-2009, 05:18 PM
Okay, my first experiment self-lesson worked fine...



$_="do re mi";
print if /^([^ ]+) +(\1)/;

$_="ra ra ra";
print if /^([^ ]+) +(\1)/;


prints out ra ra ra, but not do re mi
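
A minimal sketch of the same experiment wrapped in a helper, so it is easier to try on other strings. The name starts_with_repeat is made up for illustration; the pattern is the one above with the second pair of brackets dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the first word is immediately repeated.
sub starts_with_repeat {
    my ($s) = @_;
    # ([^ ]+) captures the first word; \1 must then match that same text again.
    return $s =~ /^([^ ]+) +\1/ ? 1 : 0;
}

for my $s ("do re mi", "ra ra ra") {
    print "$s\n" if starts_with_repeat($s);   # prints only: ra ra ra
}
```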

But what I want is the opposite for the third word: if it is "b b c", print it, and if it is "k k k", don't print it...


I tried this and it buggered up...




$_="b b c";
print if /^([^ ]+) +(\1) +([^\1])/;


$_="k k k";
print if /^([^ ]+) +(\1) +([^\1])/;



the output was...



perl woo3
b b ck k k


So how can I achieve what I want, i.e. make it so that it has to be

sheep sheep cow

and not

sheep sheep sheep?

Presumably there's got to be a way of using the \1 to solve it? Anyone?

this code


$_="b b c";
print if /^([^ ]+) +(\1) +[^\1]/;


$_="k k k";
print if /^([^ ]+) +(\1) +[^\1]/;

had the same result, so I guess the extra brackets were superfluous to my failed attempt and can be left out. So can anyone correct this latest piece of code? How can I preclude k k k and allow b b c?

I see why it's wrong: ^ inside square brackets apparently only says no to a single character, nothing more than that.

So is there a way to say no to the entire word contained in \1?

KevinADC
07-20-2009, 08:51 PM
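A negated character class won't do it. Inside a character class, \1 is not a backreference at all: Perl reads it as an octal escape (the character with code 1), so [^\1] just means "any single character except that one", which is why both of your strings matched. Even if it did hold the captured text, a character class only ever matches one character out of a set, with no notion of order, so it could not exclude a whole "word". The same goes for the shortcut character classes such as \w, \W and \s: each matches a single character.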
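
A small sketch of why the [^\1] attempt accepted both strings. The helper name attempt_matches and the third test string are made up for illustration; the pattern is the one from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the failed attempt from the first post.
sub attempt_matches {
    my ($s) = @_;
    # Inside [...] the \1 is read as the octal escape for the character
    # with code 1, not as a backreference, so [^\1] accepts any other
    # single character.
    return $s =~ /^([^ ]+) +(\1) +[^\1]/ ? 1 : 0;
}

for my $s ("b b c", "k k k", "k k " . chr(1)) {
    (my $shown = $s) =~ s/\x01/\\x01/;   # make the control character visible
    printf "%-10s %s\n", $shown, attempt_matches($s) ? "matches" : "no match";
}
```

Only the last string is rejected, because its third "word" really is the character with code 1.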

What you want is a look ahead assertion, more formally called a zero-width look ahead assertion. I will provide you a link instead of some code because it looks like you are good at solving your own questions:

http://perldoc.perl.org/perlretut.html#Looking-ahead-and-looking-behind
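
For anyone skimming, a minimal sketch of what "zero-width" means, using made-up strings rather than the ones from this thread: the lookahead checks what comes next but does not become part of the match.

```perl
use strict;
use warnings;

my $s = "foobar";

# Positive lookahead: "bar" must follow, but it is not part of the match.
if ($s =~ /foo(?=bar)/) {
    print "matched '$&'\n";    # matched 'foo'
}

# Negative lookahead: succeeds only when "bar" does NOT follow.
print "foobar: ", ($s       =~ /foo(?!bar)/ ? "match" : "no match"), "\n";  # no match
print "foobaz: ", ("foobaz" =~ /foo(?!bar)/ ? "match" : "no match"), "\n";  # match
```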

RabidMango
07-20-2009, 10:46 PM
Thanks for that. I went there, read what you suggested, had a go with a ?= and found a way to do it...



$_="b b c";
print if /^([^ ]+) +(\1) (?=\1)/;


$_="k k k";
print if /^([^ ]+) +(\1) (?=\1)/;



produces just kkk (and i wanted bbc) so...



$_="b b c";
print unless /^([^ ]+) +(\1) (?=\1)/;


$_="k k k";
print unless /^([^ ]+) +(\1) (?=\1)/;


should do the trick...

I'll just test it (if it fails I'll edit the whole post, so this isn't really live, I'm just going through the motions)...



perl woo6
b b c


nice, it worked.

RabidMango
07-21-2009, 01:16 AM
$_="b b c";
print unless /^([^ ]+) +(\1) \1/;


$_="k k k";
print unless /^([^ ]+) +(\1) \1/;

(produces b b c as the only output)

also works fine, and is clearly preferable... looks like I should hurry up and get back to reading about what in davy crocket ?= actually is for, since whatever I tried to do with it was pointless, it works entirely the same with just the \1
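
A short sketch of where the two versions do differ. As a yes/no test they agree, but the lookahead does not consume the third word, which shows up in the matched text ($&).

```perl
use strict;
use warnings;

my $s = "k k k";

# With (?=\1) the third word is only peeked at, not consumed.
if ($s =~ /^([^ ]+) +(\1) (?=\1)/) {
    print "lookahead matched:     '$&'\n";   # 'k k '
}

# With a plain \1 the third word is part of the match.
if ($s =~ /^([^ ]+) +(\1) \1/) {
    print "backreference matched: '$&'\n";   # 'k k k'
}
```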

in fact i reckon i can clean it up some more...



$_="b b c";
print unless /^([^ ]+) +(\1) \1/;


$_="k k k";
print unless /^([^ ]+) +(\1) \1/;



wait a minute, my solution is wrong...

it shouldn't match a b c, but it does, alas
obviously it's the way i've used unless - i've turned the universe on its head and everything's fallen down

here's the right solution



$_="a b c";
if (/^([^ ]+) +(\1)/){
    print unless /^([^ ]+) +(\1) \1/;
}

$_="b b c";
if (/^([^ ]+) +(\1)/){
    print unless /^([^ ]+) +(\1) \1/;
}

$_="k k k";
if (/^([^ ]+) +(\1)/){
    print unless /^([^ ]+) +(\1) \1/;
}



for now - only one line longer (the replication of the process 3 times is just for the experimentation, in practice that would not be there)



$_="a b c";
if (/^([^ ]+) \1/){
    print unless /^([^ ]+) \1 \1/;
}

$_="b b c";
if (/^([^ ]+) \1/){
    print unless /^([^ ]+) \1 \1/;
}

$_="k k k";
if (/^([^ ]+) \1/){
    print unless /^([^ ]+) \1 \1/;
}


there it is, cleaned up even more and now one more clean to minimize the space used...



$_="a b c";
if (/^(\w+) \1/){
    print unless /^(\w+) \1 \1/;
}

$_="b b c";
if (/^(\w+) \1/){
    print unless /^(\w+) \1 \1/;
}

$_="k k k";
if (/^(\w+) \1/){
    print unless /^(\w+) \1 \1/;
}
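
A sketch of the same two-step check written once as a function instead of being repeated per string. The name two_same_then_different is invented here; the two patterns are the ones from the last version above.

```perl
use strict;
use warnings;

# Hypothetical helper: first two words equal, third word different.
sub two_same_then_different {
    my ($s) = @_;
    return 0 unless $s =~ /^(\w+) \1/;      # first two words must be the same
    return 0 if     $s =~ /^(\w+) \1 \1/;   # ...but the third must not repeat too
    return 1;
}

for my $s ("a b c", "b b c", "k k k") {
    print "$s\n" if two_same_then_different($s);   # prints only: b b c
}
```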

RabidMango
07-21-2009, 01:31 AM
I haven't understood how to use the lookahead and lookbehind wotsits yet, clearly. Will have to deal with that tomorrow. At least I found a way to do what I wanted, though. Shame it took two lines of perl instead of one. Maybe I'll figure out how to do it in one soon.

RabidMango
07-21-2009, 02:58 PM
Someone showed me the one I wanted, I thought I'd tried it, but I'd obviously got it a bit wrong...



$_="a b c";
print if /^([^ ]+) +(\1) +(?!\1)/;

$_="b b c";
print if /^([^ ]+) +(\1) +(?!\1)/;

$_="k k k";
print if /^([^ ]+) +(\1) +(?!\1)/;



yep the output worked as I wanted it. nice.

I've neatened it up to this, and it still works fine (and only outputs b b c)...


$_="a b c";
print if /^([^ ]+) \1 (?!\1)/;

$_="b b c";
print if /^([^ ]+) \1 (?!\1)/;

$_="k k k";
print if /^([^ ]+) \1 (?!\1)/;


(or indeed /^(\w+) \1 (?!\1)/ which I tested and it works fine)
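
One caveat, sketched below with made-up test strings: (?!\1) only checks whether the third word starts with the first one, and with extra spaces the lookahead can end up looking at a space instead of the word. The stricter pattern is just one possible variant, not something from this thread: it asks that \1 is not followed by a space or the end of the string, and that a real non-space character comes next.

```perl
use strict;
use warnings;

# Pattern from the post above (capture brackets around the second \1 dropped).
sub loose_match {
    my ($s) = @_;
    return $s =~ /^([^ ]+) +\1 +(?!\1)/ ? 1 : 0;
}

# Hypothetical stricter variant: compares the whole third word, and the
# trailing [^ ] stops the lookahead from being satisfied by a space.
sub strict_match {
    my ($s) = @_;
    return $s =~ /^([^ ]+) +\1 +(?!\1(?: |\z))[^ ]/ ? 1 : 0;
}

for my $s ("b b c", "k k k", "k k  k", "sheep sheep sheepdog") {
    printf "%-22s loose=%d strict=%d\n", "'$s'", loose_match($s), strict_match($s);
}
```

The two agree on "b b c" and "k k k". They differ on "k k  k" (two spaces before the last k), which the loose pattern accepts, and on "sheep sheep sheepdog", which the loose pattern rejects.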

Thanks for all assistance

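NB I am told the solution with if/unless can be better in terms of processing efficiency: lookahead is said to be "computationally expensive", since it can get expensive when it has to be tried at every character position in a large string. The patterns here are anchored with ^, so the lookahead is only attempted from the start of the string, but bear it in mind for unanchored patterns, anyone.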
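
Rather than take the efficiency claim on trust, it can be measured. A rough sketch using the core Benchmark module; the sub names and the test strings are made up, and the numbers will depend on the perl build and the input, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @strings = ("a b c", "b b c", "k k k");

# Two-step if/unless approach from earlier in the thread.
sub with_two_steps {
    my ($s) = @_;
    return ($s =~ /^(\w+) \1/ && $s !~ /^(\w+) \1 \1/) ? 1 : 0;
}

# Single pattern with a negative lookahead.
sub with_lookahead {
    my ($s) = @_;
    return $s =~ /^(\w+) \1 (?!\1)/ ? 1 : 0;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    two_steps => sub { with_two_steps($_) for @strings },
    lookahead => sub { with_lookahead($_) for @strings },
});
```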


