RegEx - Make HTML Lower Case Except Attribute Values

02-24-2007, 08:23 PM
I'm in the middle of writing a function that uses a series of regular expressions to clean up some dynamically generated HTML from a WYSIWYG editor, but I'm having a problem getting the HTML to convert to lower case correctly.

I have the following expression to convert the tags to lower case:

$strHTML = preg_replace("/(<[^>]+>)/ies", "strtolower('$1')", $strHTML);
Of course, this changes everything about the tag to lower case: the tag itself, the attributes and the attribute values. That's all good, except that I don't want it to convert the attribute values to lower case.

For instance if I have the following tag:

<abbr title="What You See Is What You Get">WYSIWYG</abbr>
I'd need the abbr opening and closing tags as well as the title attribute to be lower case, but the actual value of the title attribute, "What You See Is What You Get", to remain as it is without being touched by the expression. That's what's giving me a headache, since it needs to work for any tag and attribute.
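
For anyone wanting to reproduce the problem, here is a minimal sketch. It assumes the editor emits upper-case markup (the sample input below is made up for illustration) and it uses preg_replace_callback with an anonymous function (PHP 5.3+) rather than the /e modifier, since /e was removed in PHP 7. It is only meant to show the unwanted behaviour, not a fix.

```php
<?php
// Hypothetical input: the kind of upper-case markup a WYSIWYG editor might emit.
$strHTML = '<ABBR TITLE="What You See Is What You Get">WYSIWYG</ABBR>';

// Lower-case every tag as a whole, which is what the expression above does.
$lowered = preg_replace_callback(
    '/<[^>]+>/',
    function ($m) {
        return strtolower($m[0]); // also hits the attribute value
    },
    $strHTML
);

echo $lowered, "\n";
// <abbr title="what you see is what you get">WYSIWYG</abbr>
```

The tag and attribute names come out right, but the title value has lost its capitals, which is exactly the part that should be left alone.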

Been Googling this for hours and I can't find anything that helps. Anyone have any ideas?

Thanks in advance.

02-25-2007, 01:52 AM
Actually, never mind, I've figured it out. I modified the regular expression line to use preg_replace_callback:

$strHTML = preg_replace_callback("/(<[^>]+>)/i", "lowerCaseHTML", $strHTML);
And then I wrote the following function:

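function lowerCaseHTML($Matches) {
    // Tag with an attribute: lower-case everything before the last attribute's "=" and keep what follows as-is.
    if (preg_match("/<([^>]+)(\s\w+)=([^>]+)>/i", $Matches[1], $NewMatch)) {
        return "<" . strtolower($NewMatch[1]) . strtolower($NewMatch[2]) . "=" . $NewMatch[3] . ">";
    } else {
        // No attributes: lower-case the whole tag.
        return strtolower($Matches[1]);
    }
}
Seems to be working fine.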
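
One caveat worth noting: because the first group in that pattern is greedy, a tag with more than one attribute will probably still have its earlier attribute values lower-cased, and only the value after the last "=" is protected. A possible alternative, sketched below under some assumptions, is to split each tag into quoted and unquoted runs and lower-case only the unquoted ones. The function name lowerCaseTagsKeepValues is made up for this example, it needs PHP 5.3+ for the anonymous functions, and it assumes attribute values are quoted (unquoted values would still be lower-cased).

```php
<?php
// Hypothetical helper, not from the original post: lower-cases tag and
// attribute names while leaving quoted attribute values untouched.
function lowerCaseTagsKeepValues($html)
{
    // Match opening/closing tags only (comments and doctypes are skipped),
    // and allow ">" to appear inside a quoted value.
    $tagPattern = '/<\/?[a-z](?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i';

    return preg_replace_callback($tagPattern, function ($tag) {
        // Walk the tag as alternating quoted and unquoted runs.
        return preg_replace_callback(
            '/"[^"]*"|\'[^\']*\'|[^"\']+/',
            function ($part) {
                $first = $part[0][0];
                // Quoted run: keep as-is. Anything else: lower-case it.
                return ($first === '"' || $first === "'")
                    ? $part[0]
                    : strtolower($part[0]);
            },
            $tag[0]
        );
    }, $html);
}

echo lowerCaseTagsKeepValues('<A HREF="Page.html" TITLE="My Page">Link</A>'), "\n";
// <a href="Page.html" title="My Page">Link</a>
```

This is still regex-based cleanup, so it will not cope with every kind of malformed markup. For anything beyond tidying editor output, a real HTML parser is likely the safer route.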