regular expressions aka argh!



hiremdm
03-23-2007, 02:57 AM
Okay, I want to find within a string any URLs, like ...

http://myworld.ebay.co.uk/pipwish

and turn them into this ...

<a href="http://myworld.ebay.co.uk/pipwish" target="_blank">[ Link to ebay.co.uk ]</a>

I almost have it, but am tripping up on showing JUST the domain inside the "[ Link to ... ]" brackets.

Any help???

iLLin
03-23-2007, 03:22 AM
What code are you using and what output are you getting?

hiremdm
03-23-2007, 06:06 AM
Here's what I have so far ...


$Text="I like http://cgi.ebay.co.uk/F-ck-Graphic-Design-T-shirt-white-SMALL-mens-cool_W0QQitemZ220094569717QQcategoryZ313QQssPageNameZWDVWQQrdZ1QQcmdZViewItem, http://www.google.com, and I connect through FTP to my site through ftp://www.mysite.com.";
$strText = preg_replace( '/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', '<a href="\0">[ Link to \4 ]</a>', $Text );
echo $strText;

What you'll notice in the output is that the link that's put into the <a> tag includes the punctuation (comma, period) following the original link. This may be a case of wanting to have my cake and eat it too ... but it seems to me I ought to be able to truncate that last character or at least disregard a comma or period, which should not end a URL (though they may be inside one, I guess) ...
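One way to keep the trailing comma or period out of the link is to match greedily and then trim the punctuation off in a callback. This is only a rough sketch, not a tested fix for the code above: the function name linkify() and the list of characters to trim are assumptions, and it uses modern PHP syntax.

```php
<?php
// Hypothetical helper (name is an assumption): wraps http/https/ftp URLs in <a> tags.
function linkify(string $text): string
{
    $result = preg_replace_callback(
        '~\b(?:https?|ftp)://\S+~i',   // grab everything up to the next whitespace
        function (array $m): string {
            // Trailing punctuation is assumed to belong to the sentence, not the URL.
            $url   = rtrim($m[0], '.,;:!?');
            $trail = substr($m[0], strlen($url));
            // parse_url() pulls out the host; fall back to the whole URL if it cannot.
            $host  = parse_url($url, PHP_URL_HOST) ?: $url;
            return '<a href="' . htmlspecialchars($url) . '" target="_blank">[ Link to '
                . htmlspecialchars($host) . ' ]</a>' . $trail;
        },
        $text
    );
    return $result ?? $text;   // preg_replace_callback() returns null on error
}

echo linkify('See http://www.google.com, and ftp://www.mysite.com.');
```

This shows the full host (www.google.com). Cutting myworld.ebay.co.uk down to ebay.co.uk reliably would need a list of public suffixes such as co.uk, which a regex alone does not know about.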

ugh

hiremdm
03-23-2007, 05:19 PM
Just bumping my thread for any fresh ideas ... I know this is simple stuff, but I'm just not a regex guy ... any help is greatly appreciated!

aedrin
03-23-2007, 06:25 PM
I'm not an expert at reading regular expressions ;) If I look at mine after a week or two, I'll go crazy!

Anyway.

Pass in a $matches variable to preg_match(), print_r() it and see what it puts out. Maybe you'll see that you're using a different backreference than the one you meant?

You'll want to have separate captures for each section (| marks the section):

http:// | www.domain.com | /path/to/file.php

Then it should be relatively easy. ;)
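Something along these lines would show which group number holds which section. It is a minimal sketch: the pattern and the function name split_url() are made up for illustration, not taken from the code earlier in the thread.

```php
<?php
// Hypothetical helper: splits a URL into scheme | host | path with three captures.
function split_url(string $url): array
{
    $pattern = '~^(\w+://)([^/\s]+)(/\S*)?$~';
    // $matches[0] is the whole match, [1] the scheme, [2] the host, [3] the path.
    if (preg_match($pattern, $url, $matches)) {
        return $matches;
    }
    return [];
}

print_r(split_url('http://www.domain.com/path/to/file.php'));
```

With the host in its own group, the replacement string only needs that group's number for the "Link to" text.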

ralph l mayo
03-24-2007, 02:15 AM
Pilfered from CPAN:



(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)


Remove the first ?: to capture the match
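A small sketch of what the ?: changes, using a deliberately tiny pattern and a placeholder URL (both are assumptions for illustration, not the CPAN expression):

```php
<?php
$text = 'see http://example.com now';

// (?:...) groups without capturing: only index 0 (the whole match) is filled.
preg_match('~(?:http://\S+)~', $text, $nonCapturing);

// (...) captures: index 1 holds the group as well.
preg_match('~(http://\S+)~', $text, $capturing);

echo count($nonCapturing), ' vs ', count($capturing), "\n";
```

In a preg_replace() replacement string, \0 always refers to the whole match, so the extra capture only matters when a numbered group is needed.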

kaisellgren
03-24-2007, 12:25 PM
ralph that looks way too complicated and slow imo :/

hiremdm
03-24-2007, 11:20 PM
Okay ... this ...


(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?

works flawlessly in this on-line validator when using the Javascript option ... but does nothing with preg or ereg ...

http://www.regextester.com/

Here's my test string ...

I like http://cgi.ebay.co.uk/F-ck-Graphic-Design-T-shirt-white-SMALL-mens-cool_W0QQitemZ220094569717QQcategoryZ313QQssPageNameZWDVWQQrdZ1QQcmdZViewItem, http://www.google.com, and I connect through FTP to my site through ftp://www.mysite.com.
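A likely reason it does nothing with preg (a guess, since the calling code isn't shown) is the missing delimiters: the preg functions expect the pattern wrapped in a delimiter character, and this pattern already contains /, # and ~, so none of those can be used unescaped. The ereg functions use POSIX syntax, which does not understand \d or \w. Here is a rough sketch with ! as the delimiter. The function name find_urls() is an assumption, and the hex class is written a-fA-F.

```php
<?php
// Hypothetical helper: returns every substring the posted pattern matches.
function find_urls(string $text): array
{
    $pattern =
        '(([\w]+:)?//)?'                                                // scheme
        . '(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?' // user:pass@
        . '([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}'                     // host and TLD
        . '(:[\d]+)?'                                                    // port
        . '(/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*'                          // path
        . '(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?'                   // query
        . '(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?';                         // fragment

    // "!" does not occur in the pattern, so it is safe as the delimiter.
    preg_match_all('!' . $pattern . '!', $text, $m);
    return $m[0];
}

print_r(find_urls('Try http://www.google.com, then ftp://www.mysite.com.'));
```

Because the path part only allows a fixed set of characters, the comma and the final period fall outside the match here.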

ralph l mayo
03-25-2007, 06:09 AM
Originally Posted by kaisellgren: ralph that looks way too complicated and slow imo :/

“For every complex problem, there is a solution that is simple, elegant and wrong.”

That monstrosity matches per the RFCs. I'd use it unless profiling of the actual application indicated a faster naive solution would be worth the potential for error.


