{PCRE} Nested [quote]s to be replaced with <blockquote>s



Vin0rz
05-04-2006, 06:35 PM
Hey, I'd like my visitors to be able to quote each other's replies. Right now they have the [quote] tag available, and it is replaced by <blockquote> using code like the following:




$somereply = 'Hey, I feel like
quoting someone that is [quote]quoting
someone, cool, eh?.';

$somereply = preg_replace('[\[quote\](.*)\[/quote\]]iUs',
'<blockquote>$1</blockquote>',
$somereply);



Now, this basically works for normal quotes. However, when quotes are nested as in the above example, the output is totally messed up. How can I fix this?

Thanks in advance.

ralph l mayo
05-04-2006, 08:30 PM
afaik you can't do arbitrary level recursion purely with a regular expression, at least as it's implemented here. You'll need to loop over your already correct expression.



$regex = '/\[quote\](.*)\[\/quote\]/usi';
for (; preg_match($regex, $somereply); $somereply = preg_replace($regex, '<blockquote>$1</blockquote>', $somereply));


If you don't like the for abuse you can do the same thing with while.

trib4lmaniac
05-05-2006, 09:54 AM
$regex = '/\[quote\](.*)\[\/quote\]/usi';

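This would match the whole of the following as a single quote, because the .* is greedy:

[quote]First quote[/quote]
some text
[quote]Second quote[/quote]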

ralph l mayo
05-05-2006, 03:57 PM
You're right, it needs a lazy quantifier.

$regex = '/\[quote\](.*?)\[\/quote\]/usi';

Vin0rz
05-05-2006, 06:28 PM
Isn't that just the same as

'/\[quote\](.*)\[\/quote\]/si';

fci
05-05-2006, 06:35 PM
vin0rz,

By default, the quantifiers are "greedy", that is, they match as much as possible (up to the maximum number of permitted times), without causing the rest of the pattern to fail. The classic example of where this gives problems is in trying to match comments in C programs. These appear between the sequences /* and */ and within the sequence, individual * and / characters may appear. An attempt to match C comments by applying the pattern /\*.*\*/ to the string /* first comment */ not comment /* second comment */ fails, because it matches the entire string due to the greediness of the .* item.

However, if a quantifier is followed by a question mark, then it ceases to be greedy, and instead matches the minimum number of times possible, so the pattern /\*.*?\*/ does the right thing with the C comments. The meaning of the various quantifiers is not otherwise changed, just the preferred number of matches. Do not confuse this use of question mark with its use as a quantifier in its own right. Because it has two uses, it can sometimes appear doubled, as in \d??\d which matches one digit by preference, but can match two if that is the only way the rest of the pattern matches.

http://us2.php.net/manual/en/reference.pcre.pattern.syntax.php
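
As a quick illustration of the manual's C-comment example, here is a minimal PHP sketch comparing the greedy and lazy forms. The ~ delimiter and the variable names are choices made for this example, not something taken from the manual.

```php
<?php
$source = '/* first comment */ not comment /* second comment */';

// Greedy: .* runs on to the last */ in the string, so there is one long match.
preg_match_all('~/\*.*\*/~', $source, $greedy);

// Lazy: .*? stops at the first */ it can reach, so each comment matches on its own.
preg_match_all('~/\*.*?\*/~', $source, $lazy);

print_r($greedy[0]); // a single element holding the entire string
print_r($lazy[0]);   // two elements, one per comment
```

The same difference applies to [quote]...[/quote]: a greedy .* spans from the first opening tag to the last closing tag.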

Vin0rz
05-05-2006, 06:46 PM
I know, but you're using the "U" modifier, which turns it around.

Edit: Sorry, you don't use the capital u.

ralph l mayo
05-05-2006, 11:40 PM
I don't even know what u/U is supposed to do, I just copied it without thinking I guess. But then I changed it. Who knows. Get rid of it and use the ?, I say.
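
For what it's worth, the two modifiers are unrelated: U (PCRE_UNGREEDY) inverts the greediness of quantifiers, while u (PCRE_UTF8) makes PCRE treat the pattern and subject as UTF-8. A small sketch, with a made-up sample string and variable names, assuming a reasonably recent PHP:

```php
<?php
$text = '[quote]a[/quote] b [quote]c[/quote]';

// U alone: plain .* behaves lazily.
preg_match('/\[quote\](.*)\[\/quote\]/U', $text, $ungreedy);
// No U, but a ? after the quantifier: also lazy.
preg_match('/\[quote\](.*?)\[\/quote\]/', $text, $lazy);
// U and ? together cancel out, so the match is greedy again.
preg_match('/\[quote\](.*?)\[\/quote\]/U', $text, $both);

echo $ungreedy[1], "\n"; // a
echo $lazy[1], "\n";     // a
echo $both[1], "\n";     // a[/quote] b [quote]c

// u: the dot matches one UTF-8 character instead of one byte.
$e = "\xC3\xA9"; // "é" encoded as two UTF-8 bytes
var_dump(preg_match('/^.$/', $e));  // int(0): two bytes, so one dot is not enough
var_dump(preg_match('/^.$/u', $e)); // int(1): one character
```

So '/.../U' with (.*) and '/.../' with (.*?) pick the same matches, and lowercase u changes nothing about greediness.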

trib4lmaniac
05-07-2006, 06:13 PM
Sadly, this still wouldn't work for capturing the outer quote:

{quote}{quote}Inner quote.{/quote}Outer quote.{/quote}

In case you're wondering, I don't have a solution :D
I once posted a similar problem on these boards (a year or two ago) for parsing nested {LIST} items. They're a little trickier than quotes, but it's the same principle.

ralph l mayo
05-08-2006, 03:05 AM
Sure it does, but the loop is needed.



$somereply = '[quote][quote]Inner quote.[/quote]Outer quote.[/quote]';
$regex = '/\[quote\](.*?)\[\/quote\]/i';
for (; preg_match($regex, $somereply); $somereply = preg_replace($regex, '<blockquote>$1</blockquote>', $somereply));


As I said, I don't think it's possible to do this correctly over N levels without looping the expression.
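
The for loop above can also be written as a plain while loop. The sketch below is one way to do that. The function name convertQuotes() and the $maxPasses safety cap are illustrative additions, not part of the code posted in this thread, and the type declarations assume PHP 7 or later.

```php
<?php
// Sketch only: convertQuotes() and $maxPasses are hypothetical names.
function convertQuotes(string $text, int $maxPasses = 50): string
{
    // Lazy (.*?) pairs each opening tag with the nearest closing tag;
    // s lets the dot span line breaks, i ignores case.
    $regex = '/\[quote\](.*?)\[\/quote\]/si';

    // Each pass converts one layer of tags; stop once no pair is left
    // or the cap is reached.
    while ($maxPasses-- > 0 && preg_match($regex, $text)) {
        $text = preg_replace($regex, '<blockquote>$1</blockquote>', $text);
    }

    return $text;
}

echo convertQuotes('[quote][quote]Inner quote.[/quote]Outer quote.[/quote]');
// <blockquote><blockquote>Inner quote.</blockquote>Outer quote.</blockquote>
```

The cap is only a guard against a runaway loop; with the pattern above every pass removes at least one tag pair, so the loop ends on its own.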

marek_mar
05-08-2006, 11:53 AM
It still would fail or match the open/close tags out of order. I'm pretty sure you need something more than just regex for this kind of stuff.
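
One alternative that does not appear in this thread: PCRE supports recursive patterns via (?R), which can match a balanced outer pair in one go. The sketch below combines that with preg_replace_callback() so that inner pairs are converted too. nestQuotes() is a hypothetical name, the code assumes PHP 7 or later, and long posts may run into PCRE's backtracking or recursion limits, so treat it as an experiment rather than a drop-in answer.

```php
<?php
// Sketch only: nestQuotes() is a hypothetical name.
function nestQuotes(string $text): string
{
    // (?R) recurses into the whole pattern, so a balanced inner pair is
    // consumed as one unit; the lookahead stops the dot from eating a tag.
    $pattern = '~\[quote\]((?:(?R)|(?!\[/?quote\]).)*)\[/quote\]~is';

    $result = preg_replace_callback(
        $pattern,
        // $m[1] is the body of an outermost pair; convert its inner pairs too.
        function (array $m): string {
            return '<blockquote>' . nestQuotes($m[1]) . '</blockquote>';
        },
        $text
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo nestQuotes('[quote][quote]Inner quote.[/quote]Outer quote.[/quote]');
// <blockquote><blockquote>Inner quote.</blockquote>Outer quote.</blockquote>
```

With this approach only balanced pairs are touched, so an unmatched [quote] or [/quote] stays in the output as literal text instead of being paired with the wrong partner.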

Vin0rz
05-08-2006, 02:35 PM
Looping worked, I omitted the ? and replaced the u with U. Thanks a lot!

trib4lmaniac
05-08-2006, 02:50 PM
You have matched the tags out of order. Simple, yet effective ;)

Match 1: {quote}{quote}Inner quote.{/quote}Outer quote.{/quote}
Replacement: <blockquote>{quote}Inner quote.</blockquote>Outer quote.{/quote}

Match 2: <blockquote>{quote}Inner quote.</blockquote>Outer quote.{/quote}
Replacement: <blockquote><blockquote>Inner quote.</blockquote>Outer quote.</blockquote>

I wonder if there are any situations when this would fail.

Vin0rz
05-08-2006, 03:00 PM
Yeah, I had figured that out too, and wondered about it as well. I think there might be a problem in a case like quote}Here comes the{quote}Inner quote{/quote}but I has a typo in the opening{/quote}
But I couldn't care less, as typos are a problem anyway.

marek_mar
05-08-2006, 03:43 PM
If there is an equal number of open/close tags it should work, but if there are missing opening/closing tags the wrong tags might be replaced.
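
A cheap way to detect the missing-tag case before replacing anything is to count the tags. A minimal sketch, assuming PHP 7 or later; quoteTagsBalanced() is a hypothetical helper, not code from this thread.

```php
<?php
// Sketch only: quoteTagsBalanced() is a hypothetical name.
function quoteTagsBalanced(string $text): bool
{
    // preg_match_all() returns the number of full matches it found.
    $open  = preg_match_all('/\[quote\]/i', $text);
    $close = preg_match_all('/\[\/quote\]/i', $text);

    return $open === $close;
}

var_dump(quoteTagsBalanced('[quote]a[quote]b[/quote]c[/quote]')); // bool(true)
var_dump(quoteTagsBalanced('quote]a[quote]b[/quote]c[/quote]'));  // bool(false)
```

Equal counts only say that nothing is missing. They do not prove the tags are in a sensible order, so '[/quote]text[quote]' would still pass this check.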

ralph l mayo
05-08-2006, 04:06 PM
If there [QUOTE=marek_mar]is an equal [QUOTE=marek_mar]number or open/close [QUOTE=marek_mar]tags it should work, [QUOTE=marek_mar]but if there are missing [QUOTE=marek_mar]opening/closing tags [QUOTE=marek_mar]the wrong tags might be replaced.

/\ Looks like this forum uses something different, but I think our version is preferable; at least it would all be in a quote box. Anyway, handling cases with mismatched tags is a job for artificial intelligence, not regex :]

marek_mar
05-08-2006, 04:34 PM
... as long as you consider a code parser AI...

ralph l mayo
05-08-2006, 04:43 PM
How are you going to know where the user meant to place the missing tag without intelligent inference? If you're referring to something that just tacks missing tags on at the beginning/end, that's pretty trivial but it probably doesn't represent the user's intent anyway. Plz to be posting code parser tia~

marek_mar
05-08-2006, 05:36 PM
You missed my point. I was expecting something like the forum's syntax.


