Parsing issue - unencoded ampersand

01-13-2010, 03:25 PM
Hi all,

I apologize in advance because this is going to be a bit vague at first.

My problem is with validating a page - the trip-up is the "unencoded ampersand". I know how to encode it as &amp;, but something odd is happening that strips it back to the basic &.

I am using a php news updater (now defunct PHPNews) to insert the code for a php poll script. I think that this updater is the culprit here...I use other updater/blog types of content management and this does not occur.

I go into the admin of the updater and change all instances of & to &amp; - when I post it, the encoding gets stripped and only the & is left. When I return to the edit screen of the updater I see that this is indeed the case: &amp; has become & again.

What could be going on with the updater script - and could this be modified to leave the text as is? I'm sorry I don't know what kind of file or code bit you would need but will be happy to include it here for reference.

Thank you so much for any advice on this.


01-13-2010, 03:59 PM
Do you have any real issue with this apart from the display in your editor, like errors from validator?

01-13-2010, 04:27 PM
Hi abduraooft,

Yes - that is what is prompting me to try to resolve this. Because the ampersands are not remaining encoded I am getting the following validation errors:

Line 1082, Column 130: general entity "pollid" not defined and no default entity


This is usually a cascading error caused by a an undefined entity reference or use of an unencoded ampersand (&) in an URL or body text. See the previous message for further details.

Like I said - I have tried repeatedly to encode the ampersands but as soon as I hit "post" the updater strips 'em ;/
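
For context, the "pollid" error typically appears when a query string such as ?action=results&pollid=3 sits inside a link: the parser reads &pollid as the start of an entity reference. A minimal sketch, assuming a hypothetical poll URL (the real script's URL and parameters are not shown in this thread), of escaping the URL with htmlspecialchars() before it is written into the markup:

```php
<?php
// Hypothetical poll link; the actual URL used by the poll script is not shown in the thread.
$url = 'poll.php?action=results&pollid=3';

// htmlspecialchars() turns the bare "&" into "&amp;", so the validator no longer
// reads "&pollid" as an undefined entity reference.
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $href . '">View results</a>', "\n";
```

The browser decodes &amp; back to & when the link is followed, so the script still receives pollid as a normal parameter.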


01-13-2010, 04:32 PM
Sounds like your 'updater' is crap. Try the numeric entity, &#38;, instead of &amp;, and see if it's stripping that too. What is generating that validation error, btw?

01-13-2010, 04:34 PM
If you have access to the code where it echoes the contents to the page, you could add a function htmlentities() there.

01-13-2010, 04:41 PM
Hi Matt,

I just tried &#38; and it does the same thing.

I know the updater is really not up to speed on quite a few things and we will be moving to another in the future.

I was just wondering if there was something that could be added or removed in the posting code that would allow for the encoding.

Not sure if this is what you need but I'm posting the "post.php" file here. Maybe you will see something - or I can post another file if needed.


01-13-2010, 04:56 PM
"If you have access to the code where it echoes the contents to the page, you could add a function htmlentities() there."

Okay - I will see what I can do with that. Thanks!

01-13-2010, 05:20 PM
Okay - partial success! :thumbsup:

I found, in the ListingFunctions.php file this bit:

echo $Contents;

and replaced with:

echo htmlentities($Contents);

I then went back in and edited the post with the ampersands - replacing with

&#38;

And it worked!!

But... when I opened the post for edit, then reposted (which I will have to do daily) it stripped them again. :rolleyes:

Do you think there may be a place in the admin section (edit box) that could use the htmlentities() as well?

Thanks so much - I'm glad to both get a little progress and learn a bit at the same time!
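
One likely cause of the stripping on re-edit: if the stored text is written into the edit box's textarea without escaping, the browser decodes &amp; to & before the form is submitted again. A minimal sketch of escaping at that point, assuming a hypothetical renderEditBox() helper and field name (PHPNews's actual edit-form code is not shown in this thread):

```php
<?php
// Hypothetical helper: PHPNews's real edit-form code is not shown in the thread.
function renderEditBox(string $contents): string
{
    // Escape for display inside the textarea. The browser decodes "&amp;amp;" back
    // to "&amp;" for editing, and "&amp;" is then what gets submitted on save.
    $escaped = htmlspecialchars($contents, ENT_QUOTES, 'UTF-8');
    return '<textarea name="contents">' . $escaped . '</textarea>';
}

echo renderEditBox('<a href="poll.php?a=1&amp;pollid=3">Vote</a>'), "\n";
```

Here the escaping is applied only to the form field, not to the public page output, so the stored HTML is still rendered as HTML for visitors.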

01-13-2010, 05:35 PM
Update - I didn't check the other output areas - apparently this caused all the associated HTML to display along with the news... oh well. :)

01-14-2010, 01:52 PM
Instead of htmlentities try using this:

echo str_replace("&","&amp;",$Contents);

It's a blunt tool rather than a precision instrument but it should do the trick well enough.
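
The bluntness matters in one case: a plain str_replace() also re-encodes ampersands that already belong to an entity, so &amp; would become &amp;amp;. A slightly more precise sketch, assuming a hypothetical encodeBareAmpersands() helper (not part of PHPNews), that leaves existing entity references alone:

```php
<?php
// Encode only ampersands that do not already begin an entity reference.
// encodeBareAmpersands() is a hypothetical helper name, not part of PHPNews.
function encodeBareAmpersands(string $html): string
{
    // The negative lookahead skips named (&amp;), decimal (&#38;) and hex (&#x26;) references.
    $pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/';
    return preg_replace($pattern, '&amp;', $html);
}

// Sample content with one bare ampersand and one that is already encoded.
$Contents = '<a href="poll.php?a=1&pollid=3">Tom &amp; Jerry</a>';
echo encodeBareAmpersands($Contents), "\n";
```

Because only the ampersands are touched, the surrounding HTML tags still render as markup, unlike with htmlentities() on the whole string.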

01-14-2010, 02:00 PM
If htmlentities() didn't do it, then this will do just the same, since both make identical changes to the ampersand to begin with.

01-14-2010, 02:04 PM
Yes, but the OP said that it worked in one location but displayed ALL associated HTML. This would just patch up the "&" characters. This at least gets us back to the "partial success" stage. Now the OP would just need to find the other spot that is causing issues.

01-14-2010, 06:03 PM
Hi Rowsdower!

That did in fact fix both the encoding and visible html - thanks!

But the problem of it reverting to a plain "&" with each edit remains.

Bummer, but that's okay. We will just bide our time until I get the new updater in place.

Thanks to all for the replies!