Regex help please

09-17-2003, 05:14 AM
I am regex challenged. There, I said it, that feels better :D

<tAblE bgcolor=BLUE BORDER=0 width=100 >

I am parsing what should be valid XHTML, but it isn't always. I can handle most things apart from unquoted attributes. One attribute I can handle (albeit in an amateurish way):
preg_replace("|<(.*) (.*)=(.*)>|U", "<$1 $2=\"$3\">", $html);
but I don't have a clue how to handle multiple instances in the same string. E.g. with the tag above, the output should be:

<table bgcolor="blue" border="0" width="100">

Can this be handled in one regex statement, or should I be using preg_replace_callback etc.?

anyone want to share ? :D

Also, an explanation of how to handle optional bits would help, like the fact that there may or may not be whitespace before the closing bracket: <table blah=blah >
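
On the optional whitespace: a minimal sketch, assuming the only goal is to drop any whitespace sitting directly before the closing ">". The quantifier * means "zero or more", so \s* matches whether or not the space is there. The function name closeTag is made up for illustration.

```php
<?php
// Hypothetical helper: removes optional whitespace before a closing ">".
function closeTag($tag) {
    // \s* matches zero or more whitespace characters, so the pattern
    // succeeds both with and without a space before ">".
    return preg_replace('/\s*>/', '>', $tag);
}

print closeTag('<table blah=blah >') . "\n"; // <table blah=blah>
print closeTag('<table blah=blah>') . "\n";  // <table blah=blah>
```

The same idea applies to any optional piece: ? makes a single item optional, * allows zero or more of it.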

I am getting through it to a point, but with a lot of str_replace()ing and messy stuff, which in this case is probably less efficient than a decent bit of regex.

09-17-2003, 10:35 AM
Writing parsers seems to be popular at the moment. I would use preg_replace_callback() to do the hard work for you, as it provides you with more flexibility regarding the replacement part. Here's my suggestion:

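$html = '<tAblE bgcolor=BLUE BORDER=0 width=100 > ';

// Called once per match. A tag-name match yields 2 tokens,
// an attribute match yields 4 (token 1 is then empty).
function tidyAttribute($tokens) {
    $out = '';
    switch (count($tokens)) {
        case 2:
            $out .= '<' . strtolower($tokens[1]);
            break;
        case 4:
            $out .= " " . strtolower($tokens[2]) . '="';
            $out .= strtolower($tokens[3]) . '"';
            break;
    }
    return $out;
}

// The x modifier allows whitespace and # comments inside the pattern.
$html = preg_replace_callback(
    '/<(\w+)\s        # tagname
    |(?:(?:\s*)       # starting whitespace
    (\S+?)=           # attribute name
    ([^>\s"\']+)      # attribute value
    (?:\s*))          # trailing whitespace
    /x',
    'tidyAttribute',
    $html);

print $html;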
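
One caveat with an alternation like the one in that pattern: the attribute branch is not tied to a tag, so it can also match name=value text that sits outside any tag. A possible variation, offered only as a sketch, is to match each opening tag first and then quote the attributes inside it. The function names quoteAttribute, quoteTag and tidyTags are made up for illustration, and the sketch assumes attribute values never contain whitespace, quotes or ">".

```php
<?php
// Hypothetical helper: rewrites one unquoted attribute as name="value".
function quoteAttribute($a) {
    // $a[1] = attribute name, $a[2] = unquoted value
    return ' ' . strtolower($a[1]) . '="' . strtolower($a[2]) . '"';
}

// Hypothetical helper: lowercases one tag and quotes its unquoted attributes.
function quoteTag($m) {
    // $m[1] = tag name, $m[2] = everything between the name and ">"
    $attrs = preg_replace_callback(
        '/\s+([^\s=]+)=([^\s"\'>]+)/',
        'quoteAttribute',
        $m[2]
    );
    // rtrim() drops optional whitespace before the closing ">".
    return '<' . strtolower($m[1]) . rtrim($attrs) . '>';
}

// Only text inside <name ...> is passed to quoteTag().
function tidyTags($html) {
    return preg_replace_callback('/<(\w+)([^>]*)>/', 'quoteTag', $html);
}

print tidyTags('<tAblE bgcolor=BLUE BORDER=0 width=100 >');
// <table bgcolor="blue" border="0" width="100">
```

Attributes that are already quoted are skipped because the value part refuses to start with a quote, although a quoted value that itself contains something like b=c would still be altered.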

09-17-2003, 02:37 PM
Mordred, that's excellent, thank you!

I would love to add that I have learnt from the experience and would be able to do it myself tomorrow...

I would be fibbing :D

(But I am trying.) Cheers again.