View Full Version : encoding type


maes
09-09-2003, 01:57 PM
What encoding type should I use if I want my XML file to contain the path of a file that the user chooses?

I can't control what's in the filename, so it should be able to contain all characters that a filename can have (all chars except: \ / | * ? " < > )

Also, I can't substitute characters in the xml-file, because other applications might use it and they won't know that I changed anything.

The only character I encountered that gave a problem is the '&'. Are there any other chars that a filepath can contain but that aren't allowed in an xml file?

I tried:
<?xml version="1.0" encoding="windows-1252"?>
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-16"?>
Are there any other encoding types? (Not according to w3schools.)

I think putting a filepath in an xml-file isn't uncommon, how do people solve this problem :confused:


I also have another question. I once downloaded an XML validator from Microsoft. It was very handy: you opened an xml-file in IE, right-clicked with your mouse, and in the popup menu you could choose to validate (offline). Now I can't seem to find that program anymore. I thought it was msxml, so I downloaded and installed it, but it didn't give me the option to validate when I right-clicked.
Does anyone know which program it was?

Thanks

--Maes

liorean
09-09-2003, 05:21 PM
The characters & " / \ | ? < > are not at all disallowed in all file systems. On my Mac, I can find files whose names contain, among others, the characters ", ?, <, >, /. & is even allowed in the FAT, FAT32, and NTFS file systems. The most generally disallowed character, in fact, is :.

What we do is encode them using standard url encoding. This means, for ASCII characters, that you replace each one with %nn, where nn is the character's code as a two-digit hexadecimal number.

Have a look at <RFC 1738 (http://www.w3.org/Addressing/rfc1738.txt) (URL)>, <RFC 2396 (http://gbiv.com/protocols/uri/rfc/rfc2396.txt) (URI)>.
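As a rough illustration of the %nn format, here is a minimal Python sketch using the standard library's urllib.parse. The function names and the sample path are made up for the example, and passing safe="" (so that "/" is encoded too) is my own choice, not something the RFCs require.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode everything except letters, digits and _ . - ~"""
    # safe="" means "/" is encoded as well; non-ASCII characters are
    # encoded as their UTF-8 bytes (quote's default).
    return quote(path, safe="")


def decode_path(encoded: str) -> str:
    """Reverse encode_path: turn each %nn back into the original character."""
    return unquote(encoded)


# Hypothetical Windows path, used only for illustration.
original = r"C:\My Docs\Tom & Jerry.txt"
encoded = encode_path(original)
print(encoded)               # C%3A%5CMy%20Docs%5CTom%20%26%20Jerry.txt
print(decode_path(encoded))  # C:\My Docs\Tom & Jerry.txt
```

Note that anything reading the value back has to know it was url-encoded and decode it again.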

No, the largest problem with file names and paths is when they happen to include either one of the reserved url characters, one of the SGML/XML control characters, or characters that conflict with the target file system or transfer method. For legal file names or urls that include SGML/XML control characters, we simply escape them: " --> &quot;, & --> &amp;, > --> &gt;, < --> &lt;. As for characters that conflict with target file systems, autorenaming (character replacement) or simply avoiding such filenames are the best ways. Oh, and for the transfer method problems or url reserved characters, that's where we use the %nn format.
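The entity escaping above can be sketched in Python with the standard library. The element name "file", the helper names, and the sample path are assumptions for illustration only. The round trip at the end shows that a conforming XML parser turns the entities back into the original characters, so another application reading the file sees the unmodified path.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the double quote is added here
# because it matters inside attribute values.
QUOTE_ENTITY = {'"': "&quot;"}


def escape_for_xml(text: str) -> str:
    """Escape the XML markup characters by hand."""
    return escape(text, QUOTE_ENTITY)


def build_document(path: str) -> str:
    """Let ElementTree do the escaping; "file" is a hypothetical element name."""
    root = ET.Element("file")
    root.text = path
    return ET.tostring(root, encoding="unicode")


def read_path(xml_text: str) -> str:
    """Parse the document; the parser unescapes the entities again."""
    return ET.fromstring(xml_text).text


path = r"C:\Docs\Tom & Jerry.txt"  # hypothetical path
doc = build_document(path)
print(escape_for_xml(path))  # C:\Docs\Tom &amp; Jerry.txt
print(doc)                   # <file>C:\Docs\Tom &amp; Jerry.txt</file>
print(read_path(doc))        # C:\Docs\Tom & Jerry.txt
```

Whichever of the encodings in the XML declaration you pick, these escapes are still needed, since & and < are markup characters in every encoding.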

maes
09-09-2003, 06:16 PM
>>The characters & " / \ | ? < > are not at all disallowed in all file systems
Oops, forgot to tell you: I'm making a Windows application. But nevertheless, you made a good point and I'll check for all characters.

I'll escape the characters (or use the hex equivalent) as you said.

Thanks liorean :)

Maes