Encoding HTML elements inside CODE tags

10-10-2004, 04:40 AM
I've been working on an RSS parser for a couple of months now. Everything is peachy so far, except for two issues. I'll discuss one in this thread, and the other one in another thread.

When I parse through certain feeds, I come across <code> blocks, where the HTML code was written as HTML instead of being encoded. For example:

<code><span class="highlight">text</span></code>

... instead of the correct ...

<code>&lt;span class="highlight"&gt;text&lt;/span&gt;</code>

Assuming that there is one or more of these <code> blocks in a feed, how would I go about converting all code inside the <code> tags to their encoded values? (&lt; rather than <)

I've tried the following already:

$output_rss = preg_replace("/<code>(.|\s)*<\/code>/i", "<code>". htmlencode("\\0") ."</code>");

... but it keeps throwing errors. Any ideas? Any working code that I can deconstruct and learn from?

10-10-2004, 05:55 AM
when you want to perform actions on your matches try preg_replace_callback, it's actually easier than it may at first seem.

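$str = '
<code><span class="highlight">text innit</span></code>
<code><span class="highlight">more text innit</span></code>
';

// callback: $regs[0] is the whole match, $regs[1] the text between the tags
function encode( $regs ){
return '<code>' . htmlspecialchars( $regs[1] ) . '</code>' ;
}

// U = ungreedy, i = case-insensitive, s = dot also matches newlines
echo preg_replace_callback( "|<code>(.*)<\/code>|Uis" , 'encode' , $str ) ;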

that does not work for nested <code> tags but otherwise it's ok
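
For the nested case, one possible approach is a recursive pattern. This is only a sketch under a few assumptions: the function name encode_code_blocks is made up for illustration, the tags are bare <code> tags without attributes, nesting is balanced, and ENT_NOQUOTES is used so that quotes stay as they are, like in the example in the first post.

```php
<?php
// Hypothetical helper (not from the thread): encodes everything inside each
// outermost <code> block, treating nested <code>...</code> pairs as content.
function encode_code_blocks(string $html): string
{
    // (?R) recurses into the whole pattern, so nested pairs stay balanced.
    // (?!</?code>). consumes any character that does not start a code tag.
    $pattern = '~<code>((?:(?!</?code>).|(?R))*)</code>~is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] is the content of the outermost block, nested tags included.
            return '<code>' . htmlspecialchars($m[1], ENT_NOQUOTES) . '</code>';
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error (e.g. backtrack limit);
    // in that case hand back the input unchanged.
    return $result ?? $html;
}

echo encode_code_blocks('<code><b>a</b><code>x</code></code> <i>t</i>'), "\n";
```

Text outside the <code> blocks is left alone, and the rebuilt tags always come out lowercase even if the feed used <CODE>.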

10-10-2004, 06:28 AM
what about...

$the_code = "<code><span class=\"highlight\">some text</span></code> and some more text <code> and some <b>more code</b></code>";

// .*? is non-greedy, so each <code> block is matched on its own
$find = "/<code>(.*?)<\/code>/si";

$output_rss = preg_replace_callback($find,"call_back_function",$the_code);

function call_back_function($matches) {
// encode the whole match, then restore the <code> tags themselves
$output = htmlentities($matches[0]);
$temp = str_ireplace("&lt;code&gt;","<code>",$output);
$temp = str_ireplace("&lt;/code&gt;","</code>",$temp);

return $temp;
}

print $output_rss;



10-10-2004, 06:29 AM
PLEH :p to fp. same thing. :D