
XML entity list - don't understand?


stfc_boy
11-12-2009, 09:11 AM
Hi,
I have an XML file I'm parsing into a database, and I'm getting several characters like these in the database:


&#8232;
&#8232;
&#8195;
&#65533;
&#8486;
&#61630;
&#8729;


Now these are html characters, but I want to get rid of them and replace them with their non-html character. Someone mentioned creating an XML entity list, which I presume runs before the XML file is read and replaces these characters. Is that right?

If so, can anyone point me in the right direction with some simple tutorials on how this works, or post an example? I've tried Googling, but nothing I've read really answers my question.

Or tell me if I'm completely wrong and there is another way to replace these characters?

Many Thanks
Chris

abduraooft
11-12-2009, 09:26 AM
You could store them as Unicode characters by changing the character set (and collation) of the DB table/field.
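A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for whatever database the original poster uses (an assumption; the table and column names are hypothetical). SQLite stores TEXT as Unicode by default, so there is no character set to change: decode the references first, then store the real characters.

```python
import html
import sqlite3

# Stand-in database: an in-memory SQLite DB with a hypothetical "items" table.
# SQLite TEXT columns hold Unicode, so decoded characters round-trip intact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (description TEXT)")

raw = "Hello&#8195;World"       # text as it arrives, containing a reference
decoded = html.unescape(raw)    # reference replaced by the real EM SPACE (U+2003)
conn.execute("INSERT INTO items (description) VALUES (?)", (decoded,))

stored = conn.execute("SELECT description FROM items").fetchone()[0]
print(stored == "Hello\u2003World")  # True
conn.close()
```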

stfc_boy
11-12-2009, 12:10 PM
Thanks, but I'm trying to understand how it works before trying to fix the issues. Any examples, please?

Arbitrator
11-16-2009, 06:19 AM
Quote:
Now these are html characters, but I want to get rid of them and replace them with their non-html character.

It isn't clear what issue you're experiencing.

It would help if you used correct terminology though; those aren't "characters". They're decimal-based numeric character references that get converted to characters. They also aren't specific to HTML; you can use them in XML (and generic SGML?) too.
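To illustrate the distinction, here is a rough Python sketch that converts decimal and hexadecimal numeric character references into the characters they name, using only the standard re module. The function name decode_ncrs is made up for this example, and it does no validation (an out-of-range code point makes chr() raise ValueError).

```python
import re

# Matches decimal (&#8195;) and hexadecimal (&#x2003;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def decode_ncrs(text: str) -> str:
    """Replace each numeric character reference with the character it names."""
    def to_char(m: re.Match) -> str:
        hex_digits, dec_digits = m.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)  # no range check: chr() raises ValueError if invalid
    return NCR.sub(to_char, text)

for ref in ["&#8232;", "&#8195;", "&#x2003;", "&#8486;"]:
    ch = decode_ncrs(ref)
    print(ref, "->", f"U+{ord(ch):04X}")
# &#8232; -> U+2028
# &#8195; -> U+2003
# &#x2003; -> U+2003
# &#8486; -> U+2126
```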

Quote:
Now someone mentioned creating an XML entity list which I presume runs before the XML file is read and replaces these characters. Is that right?

You would only use such an ENTITY list if you wanted to create named character references. Below is an example of a named character reference created via an XML document's internal subset:

<!-- Decimal Character 8195 = U+2003 EM SPACE -->
<!DOCTYPE root_element [
<!ENTITY EM_SPACE "&#x2003;">
]>

(I used a hexadecimal-based numeric character reference since this forum will process them as code if they are decimal-based.)

Then you could reference the data with:

Hello&EM_SPACE;World!
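As a quick check of how a parser treats that internal subset, here is a Python sketch using the standard xml.etree.ElementTree module (the choice of parser is an assumption; a conforming XML parser should behave the same way). The parser expands &EM_SPACE; to the EM SPACE character, so the application reading the document never sees the entity name.

```python
import xml.etree.ElementTree as ET

# Same internal subset as the example above: EM_SPACE is a named entity
# whose replacement text is the numeric reference for U+2003.
doc = """<!DOCTYPE root_element [
<!ENTITY EM_SPACE "&#x2003;">
]>
<root_element>Hello&EM_SPACE;World!</root_element>"""

root = ET.fromstring(doc)
# Both the named entity and the numeric reference are expanded by the parser,
# so the element text holds the real character rather than "&EM_SPACE;".
print(root.text == "Hello\u2003World!")  # True
```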

I don't think this is what you're trying to do though.

oesxyl
11-16-2009, 06:46 AM
I guess every language used in web development has some way to convert entities to characters and back. For example, PHP has html_entity_decode:

http://www.php.net/manual/en/function.html-entity-decode.php
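For comparison, Python's closest standard-library counterpart is html.unescape. The sketch below pairs it with a small cleanup table for a few of the characters from the original post; the table and the function name decode_and_clean are hypothetical, and the mappings are only an assumption about what "non-html character" should mean (for example, turning LINE SEPARATOR into a newline and dropping U+FFFD, the replacement character).

```python
import html

# Hypothetical cleanup table mapping a few decoded characters to plain
# equivalents; U+FFFD usually means data was already lost upstream, so drop it.
CLEANUP = {
    "\u2028": "\n",  # LINE SEPARATOR -> newline
    "\u2003": " ",   # EM SPACE -> ordinary space
    "\ufffd": "",    # REPLACEMENT CHARACTER -> removed
}

def decode_and_clean(text: str) -> str:
    decoded = html.unescape(text)  # handles &#8195;, &#x2003;, &amp;, etc.
    return decoded.translate(str.maketrans(CLEANUP))

print(repr(decode_and_clean("Line one&#8232;Line&#8195;two&#65533;")))
# 'Line one\nLine two'
```

Characters not in the table (such as &#8486;, OHM SIGN) are simply decoded and left as they are.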

I guess this is what you want. :)

best regards