problem with a simple regular expression in php



student
02-19-2010, 06:28 AM
hello,
I have this example code :

$text='Business business? business. business? business, abusinessman business woman <b>business</b> [b ]business[/ b] business.';

$new=preg_replace('/business/i', 'biziness', $text);

echo $new.'<br>';

This replaces every occurrence of business with biziness.

However, I don't want it to replace "abusinessman", <b>business</b> and [b ]business[/ b].

I don't want to replace the word business if it is followed or preceded by another alphanumeric character, or by < > or [ ].

I want to replace it only if it is followed or preceded by a space, comma, full stop, quotation mark, new line, tab, etc.


How can I do this?
I am new to php regular expressions.

Thank you very much for reading and replying!

bdl
02-19-2010, 06:38 AM
I'd use a "word boundary" assertion, e.g.


$regex= '/\b(business)\b/i';
$new= preg_replace($regex, 'biziness', $text);


PCRE (http://www.pcre.org/pcre.txt)
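
A minimal sketch of what the word boundary does and does not exclude, assuming a shortened version of the sample string with the BBCode tags written without the forum-escaping spaces. \b only separates word characters (letters, digits, underscore) from everything else, so it protects "abusinessman" but treats > and ] as ordinary boundaries:

```php
<?php
// Shortened sample; [b]...[/b] is assumed to be the real form of the "[b ]...[/ b]" in the post.
$text = 'Business abusinessman <b>business</b> [b]business[/b] business.';

// \b matches between a word character and a non-word character (or the string edge).
$new = preg_replace('/\bbusiness\b/i', 'biziness', $text);

echo $new, "\n";
// biziness abusinessman <b>biziness</b> [b]biziness[/b] biziness.
```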

student
02-19-2010, 06:48 AM
hello,
Thank you for your reply.
I have tried your code and it converted every 'business' to 'biziness' except 'abusinessman'.
So, it also replaced <b>business</b> and [ b ]business[ /b]

So, I think you have missed something in the code.
Please check again.

Thank you.

SKDevelopment
02-19-2010, 07:28 AM
Then, specifically for this case, you could use the following. I have added a negative lookbehind assertion (?<! ).


<?php
$text='Business business? business. business? business, abusinessman business woman <b>business</b> [b ]business[/ b] business.';

$new=preg_replace('/(?<!>|\])\bbusiness\b/i', 'biziness', $text);

echo $new.'<br>';
?>

Please note: for more general cases you would need more complicated regular expressions. This regexp is suggested only for the string you have shown. If you need something more general, please ask.
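
As a rough illustration of how narrow this is, the sketch below (the shortened test string is an assumption) compares the alternation form with an equivalent character-class form and shows that a "business" preceded by a space inside a tag is still replaced:

```php
<?php
// Shortened, hypothetical test string with un-escaped BBCode tags.
$text = 'business, abusinessman <b>business</b> [b]business[/b] <b>online business</b>';

// Alternation form from the post: not preceded by ">" or "]".
$a = preg_replace('/(?<!>|\])\bbusiness\b/i', 'biziness', $text);

// Equivalent character-class form of the same lookbehind.
$b = preg_replace('/(?<![>\]])\bbusiness\b/i', 'biziness', $text);

var_dump($a === $b); // bool(true)
echo $a, "\n";
// biziness, abusinessman <b>business</b> [b]business[/b] <b>online biziness</b>
```

The lookbehind only inspects the single character immediately before the word, which is why the last case slips through.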

student
02-19-2010, 07:38 AM
Thanks a lot.
It seems to work.
I will reply if I find any problem with the code.

student
02-19-2010, 07:47 AM
Hi,
I have just discovered that the code you gave above also converts the following cases:

<b>online business</b> or [b ]online business[ /b]
or
<b>business online</b> or [b ]business online[ /b]

I don't want the code to modify text in the above cases.

Please check and suggest the correct code.

Thank you

SKDevelopment
02-19-2010, 08:28 AM
As I said, it was only for the string you had shown. The new regexp below also uses a negative lookahead assertion (?! ) to take into account the additional cases you have listed. But again, it is only for the particular cases found in the new string:


<?php
$text='Business business? business. business? business, abusinessman business woman <b>business</b> [b ]business[/ b] <b>online business</b> or [b ]online business[ /b] or <b>business online</b> or [b ]business online[ /b] business.';

$new=preg_replace('/(?<!>|\])\bbusiness\b(?!<|\[)/i', 'biziness', $text);

echo $new.'<br>';
?>

For a more general case you may need a more complicated regexp. So please ask if you need anything more complicated.

For reference: You could also read about the assertions used in the Manual here (http://www.php.net/manual/en/regexp.reference.assertions.php).
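
A small case-by-case check of the pattern above, as a sketch only. The list of cases is hypothetical and the BBCode tags are written without the forum-escaping spaces:

```php
<?php
$pattern = '/(?<!>|\])\bbusiness\b(?!<|\[)/i';

// Hypothetical cases; the last one has "business" in the middle of the tagged text.
$cases = [
    'business.',
    'abusinessman',
    '<b>business</b>',
    '[b]online business[/b]',
    '<b>business online</b>',
    '<b>any business online</b>',
];

$results = [];
foreach ($cases as $case) {
    $results[$case] = preg_replace($pattern, 'biziness', $case);
    printf("%-28s => %s\n", $case, $results[$case]);
}
```

Only the first and the last case come back changed: the lookarounds check just the characters directly next to the word, so a "business" with ordinary words on both sides is still replaced even when it sits inside a tag.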

DaiWelsh
02-19-2010, 08:38 AM
Try


$new=preg_replace('/(^|\W)?business(\W|$)/i', '$1biziness$2', $text);

which is basically

start of data or non word-character(optional)
business
end of data or non word-character
ignore case

Making the bit before the word optional is necessary where the patterns would overlap, e.g. "business business", where the space would be consumed as part of the first match and so would not be available to the second match (there may be another way to achieve this, but...).

If some of the characters you want to treat as non-word don't match \W, you can replace it with a character class, e.g. [\W\-_], and if \W includes something you don't want as a word boundary, then roll your own, e.g. [\s<\[\.,:;]
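
One possible way to "roll your own" boundary without consuming the neighbouring characters is to put the custom class inside lookarounds. This is a sketch, not the pattern from the post above: the allowed set (whitespace, comma, full stop, question mark, exclamation mark, quotes) is an assumption based on the original request, and the test string is shortened:

```php
<?php
// Assumed set of characters that may sit next to the word: whitespace . , ? ! " '
// The double negative (not next to a character outside the set) also accepts the
// start and end of the string, where there is no neighbouring character at all.
$pattern = '/(?<![^\s.,?!"\'])business(?![^\s.,?!"\'])/i';

$text = 'Business business? "business" abusinessman <b>business</b> [b]business[/b] business.';

$new = preg_replace($pattern, 'biziness', $text);

echo $new, "\n";
// biziness biziness? "biziness" abusinessman <b>business</b> [b]business[/b] biziness.
```

Because lookarounds match without consuming anything, the space in "business business" can serve both matches and no optional prefix group is needed.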

Regards,

Dai

SKDevelopment
02-19-2010, 08:50 AM
To DaiWelsh:



$new=preg_replace('/(^|\W)?business(\W|$)/i', '$1biziness$2', $text);
No, this would replace "business" anywhere except in "abusinessman", including inside the tags. Sorry, I am not providing a modification to your solution...

To student: the solution in post #7 has been checked by me and works fine.

DaiWelsh
02-19-2010, 09:04 AM
Apologies, you are right, I did not read the original request properly and did what seemed logical, my bad :(

DaiWelsh
02-19-2010, 09:08 AM
It seems to me the request is very odd now that I re-read it. My guess is that the next thing will be that they don't want it to replace <b>any business online</b>, as that seems to be the logical extension (unless this is an arbitrary classroom exercise, of course :)

Anyway, your solution is correct for the problem as currently stated, awaiting the next installment.. :rolleyes:

SKDevelopment
02-19-2010, 09:26 AM
To DaiWelsh: Yes, you are right. For <b>any business online</b> and other similar cases we would of course need a more complicated regexp. I hope the OP will tell us if this is necessary ... I think it could be (this is why I ask for more questions in every post). You are clearly an experienced programmer, and of course you can see where this problem is heading.

I thought it could be a learning exercise (which is fine for forums, of course). So I answered without taking into account unlisted cases that could be more complicated ... Just to avoid any possible confusion... Simple patterns are easier to learn and understand ...

Yes, of course you are right. I am afraid we will have to wait for an answer from the OP before providing more help ...

To student: I am sorry for this post, which is slightly off-topic in your thread. If there are more cases that need to be taken into account, could you please ask?

student
02-19-2010, 09:47 AM
Hello bdl, SKDevelopment and DaiWelsh,
Thank you for your replies.

According to my present requirement, the code presented by SKDevelopment worked well.

However, I think I may also need to avoid converting the 'business' inside tags like this:

<b>any business online</b>
or
[ b]any business online [ /b]

Can you please suggest modified code ?

Thanks a lot.

DaiWelsh
02-19-2010, 11:06 AM
lol, was that a deliberate troll? If not I think you need to describe what your real objective is rather than drip-feeding test cases like this as you are wasting everyone's time.

Regards,

Dai

student
02-19-2010, 11:28 AM
Hello Dai,
I am extremely sorry for wasting your time.

Here is the purpose :
I have a phpbb3 forum.

I am trying to convert every occurrence of some keywords in the posts into links.

I have a business forum.
So, whenever the word 'business' is used in a post, I would like to link it to one of my website's pages.

So, it may be good to avoid replacing the text inside bbcode.

Let me know if you would like to know more information.

Since I am writing the code for the function myself, I could not foresee its requirements.

When I read your reply, I thought that I may need to implement your suggestion.

So, I replied again requesting modification in the code.

Thank you.

DaiWelsh
02-19-2010, 11:58 AM
Ok, that is more like it. We don't mind you asking the question; we all choose to be here and help. It is just always better to say what you are trying to achieve than to reduce the question to something too specific without context.

Given this scenario there are many more things to consider: what about "business" inside a link URL, or as a CSS class name, or.... For this scenario you will really need to parse the document as HTML (or possibly as XML if it validates as such) and then replace the word only if it is inside a suitable node (i.e. not in an attribute or similar).

Have you tried googling for this? It seems like a common requirement, so I would expect there to be some ready-made code samples out there.

This can also be done with JavaScript as an alternative (I am not saying I recommend this, just that you may find it easier to find existing code for that, not sure).

Just out of curiosity, in this scenario, why did you not want it substituting links inside [b] tags?
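
A rough sketch of that idea using PHP's DOM extension (DOMDocument and DOMXPath), assuming the post has already been rendered to HTML. The sample markup, the wrapper div and the choice to skip <b> and <a> are assumptions for illustration only:

```php
<?php
// Hypothetical rendered post fragment.
$html = '<p>business <b>any business online</b> <a href="/business">link</a> business.</p>';

$doc = new DOMDocument();
// Wrap the fragment in a div so there is one known element to serialise afterwards.
$doc->loadHTML('<div>' . $html . '</div>');
$xpath = new DOMXPath($doc);

// Select text nodes only, so attributes such as href or class are never touched,
// and skip any text that sits inside a <b> or <a> element.
foreach ($xpath->query('//text()[not(ancestor::b) and not(ancestor::a)]') as $node) {
    $node->nodeValue = preg_replace('/\bbusiness\b/i', 'biziness', $node->nodeValue);
}

// Serialise the children of the wrapper div back to a string.
$out = '';
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

echo $out, "\n";
```

This only applies to HTML; BBCode stored as plain text would need a different approach, because a DOM parser does not know about [b] tags.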

Regards,

Dai

P.S.

http://www.phpclasses.org/browse/package/3625.html is the first link on Google for "php linking keywords"; you can either use that or, if you still want to do it yourself, check their code to see how they do it.

student
02-27-2010, 03:29 AM
hello DaiWelsh,
Thank you for your reply.

I will search online for related scripts or code.

You wrote :
"Just out of curiosity, in this scenario, why did you not want it substituting links inside [ b ] tags?"

The code I am using at present to convert 'business' to 'biziness' inside the $text is :
$new=preg_replace('/(?<!\+|>|\])business\b(?!\+|<|\[)/i', 'biziness', $text);

So, this already does not convert 'business' directly between ']' and '[', as in
[ b]business[ /b]

My problem actually is how to make it not convert the 'business' inside
[ b]online business guide[ /b]
or
< b>online business guide< /b>

If anybody knows how to do this, please reply.
Thank you.
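
One possible approach, sketched under assumptions rather than offered as a tested phpBB solution: split the text on whole [b]...[/b] and <b>...</b> spans, and run the replacement only on the pieces in between. The function name replace_outside_bold is hypothetical, the tags are assumed to be un-nested and written without the forum-escaping spaces, and \b is used in place of the + checks from the post above:

```php
<?php
// Hypothetical helper: replace "business" everywhere except inside bold spans.
function replace_outside_bold(string $text): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured bold spans are kept in the result,
    // so the array alternates: outside text, bold span, outside text, ...
    $parts = preg_split(
        '~(\[b\].*?\[/b\]|<b>.*?</b>)~is',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        // Even indexes are text outside bold spans; odd indexes are the spans themselves.
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace('/\bbusiness\b/i', 'biziness', $part);
        }
    }

    return implode('', $parts);
}

echo replace_outside_bold(
    'business [b]online business guide[/b] <b>online business guide</b> business.'
), "\n";
// biziness [b]online business guide[/b] <b>online business guide</b> biziness.
```

Tags with attributes, nested tags or other BBCode such as [url] are not handled by this sketch and would need extra alternatives in the split pattern.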


