Regex Help


NancyJ
07-03-2006, 12:02 PM
I'm creating an email marketing manager for work, and we want to be able to track links in from the email. I want to find links in the plain-text and HTML parts of an email and append ?m=emailID&l=linkid, or &m=emailID&l=linkid if the link already has a query string, and preferably not add them at all if they're already there. When I first thought about it, it seemed simple, but now that I'm thinking about the smaller details it seems much more difficult.

marek_mar
07-03-2006, 12:55 PM
<?php
$subject = '
sdfgsdgf
http://www.codingforums.com/index.php

http://www.foo.com/index.php?something=aeg&asfas=degfs

ftp://no.com/dl.php?m=emailID&l=linkid
';

$regex = '~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"<>|\n]*[a-z0-9])*/?(?:(\?)[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:&(?:(l)|[a-z0-9_.%]+)(?:=[a-z0-9_.%:/+-]*)?)*)?~i';

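// $matches[1] is the "?" that starts an existing query string;
// $matches[2] is the "l" of an existing "l" parameter.
function append_querystring($matches)
{
    // Already tagged: leave the link alone.
    if(isset($matches[2]))
    {
        return $matches[0];
    }
    // Existing query string: append with "&".
    if(isset($matches[1]))
    {
        return $matches[0] . '&m=emailID&l=linkid';
    }
    // No query string yet: start one with "?".
    return $matches[0] . '?m=emailID&l=linkid';
}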

print preg_replace_callback($regex, 'append_querystring', $subject);
?>
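
For anyone who finds the capture groups hard to follow, here is a minimal sketch of the same three-way decision (already tagged, existing query string, no query string) for a single URL, using parse_url() and parse_str() instead of regex groups. The function name add_tracking() and its parameters are made up for illustration, it assumes PHP 7 or later, and unlike the callback above it checks for both m and l rather than just l. It does not handle URLs with a #fragment.

```php
<?php
// Hypothetical helper, not from the thread: tag one URL with tracking parameters.
function add_tracking(string $url, string $emailId, string $linkId): string
{
    $tag = 'm=' . rawurlencode($emailId) . '&l=' . rawurlencode($linkId);

    // parse_url() returns null here when the URL has no query string.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $url . '?' . $tag;
    }

    // Already tagged: both parameters are present, so leave the URL alone.
    parse_str($query, $params);
    if (isset($params['m'], $params['l'])) {
        return $url;
    }

    // Existing query string without the tracking parameters: append with "&".
    return $url . '&' . $tag;
}

print add_tracking('http://www.codingforums.com/index.php', 'emailID', 'linkid') . "\n";
print add_tracking('http://www.foo.com/index.php?something=aeg&asfas=degfs', 'emailID', 'linkid') . "\n";
print add_tracking('ftp://no.com/dl.php?m=emailID&l=linkid', 'emailID', 'linkid') . "\n";
```

A helper like this could be called from inside the preg_replace_callback() callback with $matches[0], so the regex only has to find the links and not inspect their query strings.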

NancyJ
07-03-2006, 01:13 PM
...wow. Not tried it yet but wow. Cheers.

NancyJ
07-06-2006, 11:46 AM
OK, so this is working great, but I need to make a slight modification. What does the ~ character mean? I've looked through loads of regex resources and can't find any reference to it.

fci
07-06-2006, 01:03 PM
The code is using ~$pattern~ instead of /$pattern/, that is all. There is probably a clearer explanation somewhere, but you can enclose the pattern in characters other than forward slashes.

GJay
07-06-2006, 01:16 PM
It's just the delimiter, as you'd use / or # or % or whatever.

NancyJ
07-06-2006, 01:16 PM
cheers :) That explains it perfectly

marek_mar
07-06-2006, 01:52 PM
I used "~" because I was sure it didn't come up in the regex. If I'd used some character that is inside the regex, I'd have had to escape it...
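
To make the delimiter point concrete, here is a small sketch with a made-up, much shorter pattern (not the one from this thread). It shows the same expression written once with "/" and once with "~" as the delimiter, plus preg_quote(), which takes the delimiter as its second argument so that it gets escaped too.

```php
<?php
$url = 'http://www.codingforums.com/index.php';

// With "/" as the delimiter, every literal slash in the pattern needs a backslash.
$withSlashes = '/^[a-z]+:\/\/www\./i';

// With "~" as the delimiter, the slashes can stay as they are.
$withTildes = '~^[a-z]+://www\.~i';

var_dump(preg_match($withSlashes, $url));
var_dump(preg_match($withTildes, $url));

// preg_quote() escapes regex metacharacters in a literal string; passing the
// delimiter as the second argument makes it escape that character as well.
$literal = preg_quote('http://www.', '~');
var_dump(preg_match('~^' . $literal . '~', $url));
```

Both patterns describe exactly the same match; only the amount of escaping differs.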

NancyJ
07-06-2006, 02:39 PM
OK, I'm going to have to admit defeat here. I'm totally boggled by this regex; I'm trying to modify it, but I'm going cross-eyed trying to take it all in.
Basically, the problem is that I want to change it so that http://www.domain.com matches and http://domain.com matches, but www.domain.com doesn't.
I also want <a href = "http://www.domain.com"> to be matched but <img src = "http://www.domain.com/image.jpg"> not to be, while a plain http://www.domain.com still works.

This is giving me a headache.

There is also a little problem: the guy who writes the emails uses &amp; rather than & in the query strings.

I don't expect you to do this for me - you've already done more than I ever expected - but if you could maybe explain the regex a little, then I can fix it myself.
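
For the HTML part only, one possible way to get the href-but-not-src behaviour described above is to anchor a much simpler regex on the href attribute of <a> tags and require an explicit http:// or https:// scheme. This is a rough sketch, not the regex from this thread: the sample HTML, the pattern and the function name tag_href() are all made up for illustration, it only handles double-quoted attributes, and the plain-text part would still need its own pass. The &amp; issue is handled by decoding the attribute value before the check and encoding it again afterwards.

```php
<?php
// Hypothetical sample of the HTML part of an email.
$html = '<a href = "http://www.domain.com">Home</a> '
      . '<img src = "http://www.domain.com/image.jpg"> '
      . '<a href="http://domain.com/page.php?a=1&amp;b=2">Page</a> '
      . '<a href="www.domain.com">No scheme</a>';

// Group 1: everything from "<a" up to the opening quote of its href.
// Group 2: the URL, which must start with http:// or https://.
$hrefRegex = '~(<a\b[^>]*?\bhref\s*=\s*")(https?://[^"]*)~i';

function tag_href(array $m): string
{
    // Turn &amp; back into & so the query string can be inspected.
    $url = html_entity_decode($m[2], ENT_QUOTES);

    // Only append when there is no "l" parameter yet.
    if (!preg_match('~[?&]l=~', $url)) {
        $sep = (strpos($url, '?') === false) ? '?' : '&';
        $url .= $sep . 'm=emailID&l=linkid';
    }

    // Re-encode for use inside an HTML attribute.
    return $m[1] . htmlspecialchars($url, ENT_QUOTES);
}

$out = preg_replace_callback($hrefRegex, 'tag_href', $html);
print $out;
```

Because the pattern starts at "<a", the src of an <img> tag is never considered, and the scheme requirement leaves a bare www.domain.com untouched.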

Mwnciau
07-06-2006, 02:52 PM
I found this (http://regular-expressions.info/) a good website to explain regex. Hope it helps.

NancyJ
07-06-2006, 02:55 PM
I've read through that site many times and on many occasions, but it doesn't cover anything quite as advanced/mammoth as this.

marek_mar
07-06-2006, 05:24 PM
Well, I haven't written this regex especially for you. :) I took the regex I posted in the regex thread (http://www.codingforums.com/showthread.php?t=76949) (which may explain why it is so complex) and modified it so it would check for the things you wanted. I stripped a feature off of it (matching the anchors), but I hope you won't mind. I can include that if you need it.
Anyway, I've edited it so that it won't match URLs to *.jpg, *.gif or *.png files.

<?php
$regex = '~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"<>|\n]*[a-z0-9])*/[^\\/:?*"<>|\n]+\.(?!jpg|gif|png)[a-z0-9]+/?(?:(\?)[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:(?:&(?:amp;)?(l)|[a-z0-9_.%]+)(?:=[a-z0-9_.%:/+-]*)?)*)?~i';
?>

I hope I didn't break it :)
I did break it...

marek_mar
07-06-2006, 05:55 PM
OK, doing it with regex was too annoying, so the callback now checks the file extension instead. The regex won't match a few valid URLs (those that have paths like "/my.folder.has.many.dots/file.php"), but it should work on all others.

<?php
$subject = '
sdfgsdgf
http://www.codingforums.com/index.php

http://www.foo.com/index.php?something=aeg&asfas=degfs

http://www.foo.com/image.gif

ftp://no.com/dl.php?m=emailID&l=linkid
';

$regex = '~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"<>.|\n]*(\.[a-z0-9]+)?)*/?(?:(\?)[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:(?:&(?:amp;)?(l)|[a-z0-9_.%]+)(?:=[a-z0-9_.%:/+-]*)?)*)?~i';

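// $matches[1]: a file extension captured from the path (e.g. ".gif"),
// $matches[2]: the "?" that starts an existing query string,
// $matches[3]: the "l" of an existing "l" parameter.
function append_querystring($matches)
{
    if(isset($matches[1]))
    {
        // The regex is case-insensitive, so normalise before comparing.
        switch(strtolower($matches[1]))
        {
            case '.gif':
            case '.jpg':
            case '.png':
                // Image link: leave it untouched.
                return $matches[0];
            default:
                break;
        }
    }
    // Already tagged: leave the link alone.
    if(isset($matches[3]))
    {
        return $matches[0];
    }
    // Existing query string: append with "&".
    if(isset($matches[2]))
    {
        return $matches[0] . '&m=emailID&l=linkid';
    }
    // No query string yet: start one with "?".
    return $matches[0] . '?m=emailID&l=linkid';
}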

print preg_replace_callback($regex, 'append_querystring', $subject);
?>
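
As a possible alternative to capturing the extension in the regex, the image check could be done on the matched URL with parse_url() and pathinfo(), which is not bothered by dots in folder names. This is only a sketch: the function name is_image_url() is made up, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true when the URL's path ends in an image extension.
function is_image_url(string $url): bool
{
    // Look only at the path, so dots in the host or query string are ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    // pathinfo() returns the part after the last dot of the last path segment.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['jpg', 'gif', 'png'], true);
}

var_dump(is_image_url('http://www.foo.com/image.gif'));
var_dump(is_image_url('http://www.foo.com/my.folder.has.many.dots/file.php'));
var_dump(is_image_url('http://www.foo.com/index.php?img=a.png'));
```

Inside the callback, a call such as is_image_url($matches[0]) could then replace the switch on the captured extension.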

Brandoe85
07-06-2006, 05:59 PM
I hope I didn't break it :)
I did break it...

Heheh :D