Multi-Byte Character Problem


Shadar
09-01-2006, 02:20 PM
I am pulling my hair out over this. I think it has something to do with the DOM; if not, please point me at the correct forum to post in.

I have a PHP script that generates a table which can be sorted using JavaScript.
When I echo the PHP variable straight into the HTML I get the correct Japanese characters, but when I put it into an array and build the table using the DOM I don't.

Here are some code snippets:

Data:

var jsData = new Array();
jsData[0] = {title:"Another Title", time:"10 &#20998"};


Generating the table cell:

td = document.createElement("td");
txt = document.createTextNode(jsData[i].time);
td.appendChild(txt);
tr.appendChild(td);


Now for some reason, instead of the character (分), it outputs the raw code &#20998 as literal text.

One more piece of information that may be helpful: when I view source in Firefox, it highlights the numerical part of the code in red outside the JavaScript, where it is displayed correctly, but doesn't highlight it inside the script, where it is output raw.

Questions:
Do I need to encode multi-byte characters in some way for them to work in Javascript arrays?
Do I have to tell createTextNode() that I am giving it something that may contain multi-byte characters?

Any help muchly appreciated,

Shadar

liorean
09-01-2006, 02:50 PM
Okay, this answer is purely based on assumptions, so I might be dead wrong. But anyways:

My guess is that the PHP outputs not the actual character but an HTML escape. This HTML escape, in fact: &#20998. Now, that escape should really look like this, &#20998; (with the trailing semicolon), to be proper. But anyway, that wasn't the problem, just my assumption.

JavaScript is not HTML. JavaScript has its own escaping mechanism, which looks like this: \uxxxx, where xxxx is a 16-bit (two-byte) UCS-2 code unit written as four hexadecimal digits. 20998 in decimal is 5206 in hexadecimal, so the escape for this character is \u5206. So, make it output a proper JavaScript escape instead of an HTML escape and see if that fixes it.
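
To make the difference between the two escape styles concrete, here is a rough sketch. The function names decodeNumericEntities and toJsEscapes are made up for illustration and are not from the original script; it only handles decimal escapes for characters up to U+FFFF.

```javascript
// Hypothetical helper: turn decimal HTML escapes such as "&#20998;" into
// the characters they name. The trailing semicolon is optional in the
// pattern so that a sloppy "&#20998" is handled as well.
// String.fromCharCode only covers code points up to 0xFFFF.
function decodeNumericEntities(text) {
  return text.replace(/&#(\d+);?/g, function (match, decimal) {
    return String.fromCharCode(parseInt(decimal, 10));
  });
}

// Hypothetical helper: write every non-ASCII UTF-16 code unit as a
// JavaScript \uXXXX escape (four hex digits, zero padded).
function toJsEscapes(text) {
  return text.replace(/[^\x00-\x7f]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

var raw = "10 &#20998;";                  // what the HTML escape looks like
var decoded = decodeNumericEntities(raw); // same string as "10 \u5206"
var escaped = toJsEscapes(decoded);       // the text 10 \u5206, backslash included
```

Passing something like decoded to createTextNode() should then show the character itself, since createTextNode() takes its argument literally and never parses HTML escapes.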


Another possible solution is to include the character in raw UTF-8 format and change your HTML and JavaScript documents to use that charset in the HTTP headers. (You need to change both documents, because some browsers use the HTML charset also for JavaScript documents even when those documents have been sent using a different charset.)
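
As a small illustration of why the declared charset matters: in UTF-8 this one character is stored as three bytes, so a browser that assumes a single-byte charset would read it as three unrelated characters. The sketch below assumes an environment that provides TextEncoder and TextDecoder (current browsers, Node.js); the variable names are only for illustration.

```javascript
// TextEncoder always produces UTF-8, so this lists the bytes that a
// UTF-8 encoded file would contain for the character U+5206.
var utf8Bytes = Array.from(new TextEncoder().encode("\u5206"));
var hexBytes = utf8Bytes.map(function (b) {
  return b.toString(16);
});                                        // ["e5", "88", "86"]

// Reading the same three bytes back as UTF-8 restores the one character.
var roundTrip = new TextDecoder("utf-8").decode(new Uint8Array(utf8Bytes));
```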

Shadar
09-01-2006, 04:27 PM
Thanks for the quick reply,

You seem to be right in your assumption, which is nice.

I have managed to cobble together a function that takes the ASCII escape and returns the UTF-8 form, with much help from Google & beer. The only problem is that I have to force the browser onto the UTF-8 character set. (I can do that through the DOM, can't I?)

I'll have a bash at converting that to a 2-byte Unicode converter and see how that goes; that would be a better way of doing it. I found one in JavaScript over here (http://www.hot-tips.co.uk/useful/unicode_converter.HTML),
but since the solution to my problem looks like it is going to be in PHP rather than JS I'll shut up on this thread.

Thanks again,

Shadar...