Character codes and bits


krycek
11-16-2002, 06:27 PM
Ok, I am currently having to use character codes quite heavily. I am not sure, however, as to the range of codes supported...?

I know standard ASCII goes from 0 to 127, and some extended sets go up to 255, but that's only 8-bit. I also know that Unicode is the latest standard, but how many bits is it based on? My system seems to cope with 32-bit character codes with no problem (4 billion combinations, though of course there aren't that many characters), but I am not sure what is going on behind the scenes.

If I code a character such as String.fromCharCode(1749827), is that actually using that character, or is something else going on? Plus, how is that stored/transferred? Does it stay as a 32-bit byte, or is it re-represented as a double-byte word, or what?

I need to know the maximum number of bits I can use when addressing these characters, so that I can safely transfer information between various servers and clients. I am not worried about really old systems, but I would like this to work on Win95+, with IE4+, and as many server technologies as possible. Oh, and does HTTP support this?

I can't find this info anywhere on the net so I hope someone here knows!

...plus another question... what is the maximum addressable length of a string in JavaScript? i.e. are JS string lengths 32-bit (4.2 billion chars max), or are they longer, shorter, or what?

It is important to me that the method I use works with PHP too, so if you could take that into account I would be most grateful!

::] krycek [::

joh6nn
11-16-2002, 06:43 PM
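JavaScript character codes are 16-bit (Unicode itself defines code points beyond that range, but String.fromCharCode and charCodeAt work in 16-bit units), and to the best of my knowledge there are no restrictions on the length of a string beyond what your machine can handle. If you need a definite answer, check out the NS DevEdge and MSDN Library docs for the respective implementations.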

My thoughts on String.fromCharCode(1749827): you won't get character 1749827 back. The argument is cut down to 16 bits (and that's 2 bytes, btw, not a 16-bit byte), so you end up with some other character entirely, and Unicode is far from having anything assigned that high anyway. I think.
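
Here is a quick sketch of what seems to be going on, assuming an ECMAScript-conformant engine where String.fromCharCode reduces each argument to 16 bits. The variable names are just for illustration.

```javascript
// String.fromCharCode keeps only the low 16 bits of each argument
// (assumption: a spec-conformant engine).
var requested = 1749827;
var ch = String.fromCharCode(requested);

// Read back what actually ended up in the string.
var stored = ch.charCodeAt(0);

// The same value obtained by masking off everything above 16 bits.
var masked = requested & 0xFFFF;

console.log(ch.length);          // a single 16-bit unit
console.log(stored === masked);  // the high bits were discarded
console.log(stored);             // the code that was really stored
```

So a 32-bit value does not survive the round trip; only its low 16 bits do.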

krycek
11-16-2002, 07:15 PM
16-bit = not too bad :) Not as many as I would like, but 65,536 values is ok I guess. Better than 256, anyway!

I am not intending to display the characters, rather, I am using them in a compression routine.
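
Roughly the kind of thing I have in mind, as a minimal sketch only: packBytes and unpackBytes are made-up helper names, and it assumes each character holds exactly one 16-bit value. One caveat I would still need to check is that codes from 0xD800 to 0xDFFF are reserved for surrogates, so they may not survive being re-encoded (as UTF-8, say) on the way to a server.

```javascript
// Hypothetical helper: pack pairs of byte values (0-255) into 16-bit char codes.
function packBytes(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i += 2) {
    var hi = bytes[i];
    var lo = (i + 1 < bytes.length) ? bytes[i + 1] : 0; // pad odd length with 0
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

// Hypothetical helper: reverse of packBytes. byteCount trims the padding byte.
function unpackBytes(str, byteCount) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    bytes.push(code >> 8, code & 0xFF);
  }
  return bytes.slice(0, byteCount);
}

var original = [72, 105, 33];
var packed = packBytes(original);                    // two characters for three bytes
var restored = unpackBytes(packed, original.length); // same values as original
```

The original byte count has to travel alongside the packed string, otherwise the padding byte cannot be told apart from real data.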

Thanks for the help! :D

::] krycek [::