DavidRTG
11-17-2004, 05:19 PM
I'm new here and I'm not sure if this problem can be solved on this forum or not. I'm setting up an RSS feed, but I'm having problems with the text that is being submitted. The text comes from a word processor and it's not valid XML.
Here is the link to the Feed Validator:
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.texasauctionguide.com%2FRSS%2FRSSTexasFeed.xml
I've tried using <?xml version="1.0" encoding="iso-8859-1"?> and <?xml version="1.0" encoding="utf-8"?>. As you can see from the validator, with iso-8859-1 the problem text is converted to &#150; characters. With utf-8 it shows up as question marks.
I'm using Perl to create the RSS feed so I have the option of encoding the text before it is printed to the file. Is there any way to get around these invalid characters?
Thanks,
David
Skyzyx
11-17-2004, 07:45 PM
ASCII: This set contains a-z, A-Z, numbers, and basic symbols. This set begins at zero and goes up through &#127;.
ISO-8859-1: Once we realized we needed more characters than ASCII provided, several "extended" character sets emerged. The one often called "Extended ASCII" lives in MS-DOS. Windows, Mac Classic, Mac OS X, Linux, Unix, etc., all support the International Organization for Standardization's character set for Western European languages (of which English is one). ISO-8859-1 is the extended character set that is supported cross-platform, and it is what we like to use for RSS/XML. Its printable additions begin at &#160; (the non-breaking space) and go through &#255;.
It should be noted that &#128; through &#159; is an extended character "black hole". There are NO printable ASCII or ISO-8859-1 characters there, because ISO-8859-1 reserves that range for control codes. Windows-1252 fills the same range with printable characters, including the popular trademark (byte 153), em dash (byte 151), and ellipsis (byte 133). These will generally not render properly in XML: a numeric reference such as &#150; always names a Unicode code point, not a Windows-1252 byte, so it points at a control code rather than a dash, and feed validators tend to flag it.
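To make the "black hole" concrete, here is a minimal sketch in Python (not the Perl used for the feed) that decodes the same byte under both encodings. The `SAMPLES` table and the `describe` helper are names made up for this illustration; only the standard codecs are assumed.

```python
# Bytes that a Windows word processor commonly emits for "smart" punctuation.
# The selection is illustrative, not exhaustive.
SAMPLES = {
    0x85: "ellipsis",
    0x96: "en dash",
    0x97: "em dash",
    0x99: "trademark",
}

def describe(byte_value):
    """Return the code points one byte maps to under each encoding."""
    raw = bytes([byte_value])
    as_latin1 = raw.decode("iso-8859-1")    # byte N always becomes U+00NN
    as_cp1252 = raw.decode("windows-1252")  # 128-159 become printable characters
    return ord(as_latin1), ord(as_cp1252)

for value, name in SAMPLES.items():
    latin1_cp, cp1252_cp = describe(value)
    print(f"byte {value} ({name}): ISO-8859-1 -> U+{latin1_cp:04X}, "
          f"Windows-1252 -> U+{cp1252_cp:04X}")
```

Under ISO-8859-1 each of these bytes lands in the U+0080 to U+009F control range, while under Windows-1252 it maps to a printable Unicode character well outside that range.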
Windows-1252: This is Microsoft's favorite character set. It matches ISO-8859-1 everywhere except 128 through 159, where Microsoft placed printable characters instead of control codes. This is where the popular trademark (byte 153), em dash (byte 151), and ellipsis (byte 133) characters come from. These bytes only display as intended when the document is read as Windows-1252. Now, I know that most of the world uses Windows, but software on Mac, Linux, Unix, or BeOS boxes that reads the bytes as ISO-8859-1 will get control codes -- which look nothing like what Windows-1252 displays, so the text will appear to be wrong. Instead of relying on these bytes, we should use the Unicode numeric references: &#8482; for trademark, &#8212; for em dash, and &#8230; for ellipsis.
So, there are two solutions here. Either (A) save and serve your feed as Windows-1252, or (B) convert every character between 128 and 159 into its Unicode equivalent and serve the feed as UTF-8 -- which is actually the more compatible solution.
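Option (B) could be sketched roughly as follows. This is written in Python rather than Perl, and it assumes the submitted text really arrives as Windows-1252 bytes; the function names `cp1252_to_utf8` and `cp1252_to_ascii_xml` are hypothetical. Perl's Encode module offers comparable decode and encode calls.

```python
from xml.sax.saxutils import escape

def cp1252_to_utf8(raw):
    """Re-encode Windows-1252 bytes as UTF-8 for a feed declared encoding="utf-8"."""
    # Bytes with no Windows-1252 meaning (e.g. 0x81) become U+FFFD.
    text = raw.decode("windows-1252", errors="replace")
    return text.encode("utf-8")

def cp1252_to_ascii_xml(raw):
    """Escape markup characters and emit every non-ASCII character as &#NNNN;.

    The result is plain ASCII, so it is safe under any XML encoding declaration.
    """
    text = raw.decode("windows-1252", errors="replace")
    return escape(text).encode("ascii", errors="xmlcharrefreplace")

# Sample input: en dash (0x96), trademark (0x99), ellipsis (0x85).
sample = b"Land \x96 40 acres\x99 & more\x85"
print(cp1252_to_utf8(sample))
print(cp1252_to_ascii_xml(sample))
```

The second function trades readability of the raw file for robustness: because the output contains only ASCII, it survives even if the declared encoding is later changed.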
DavidRTG
11-17-2004, 08:08 PM
Thanks for the help and information, Skyzyx! I changed it to <?xml version="1.0" encoding="windows-1252"?> and the feed validates and works fine.