é, à, è (etc.) get weird once I add a DOCTYPE.


jjshell
01-22-2005, 10:59 PM
hi...
i'm facing a very strange problem and i would really be interested in listening to what you have to say about it.
i'm working on a website. it's in french. fine. when i display chars like "é è à ü" everything is ok.

once i add a dtd at the top of my page, and when i tell the language used (<html lang="fr" xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr"> ) the chars get all funky.

if i remove the lines that i've just added, the browsers (firefox as well as mozilla) stay high (i mean, they keep on behaving weird), as if the problem was now cached.

the strangest thing is that even when the text is displayed in a form field, it's problematic. i would have expected it to be displayed properly.

if someone has had the same experience, i'm really curious... how did you solve it.

just a little thing: please only post if you think that you can be helpful. no need to point out that i'm a noob or i don't know what. no need to tell me to use google or the search engine. i tried. if you feel like helping me, post. if you feel like being nitpicky and sarcastic, well... whatever... it's just that i've seen a lot of 'purist rubbish posts' around here, and it can get annoying sometimes.

:)

jjshell
01-23-2005, 02:03 PM
ok it has to do with this line:



<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />


can i use this instead? what are the differences? what does the charset attribute do?


<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

gsnedders
01-23-2005, 02:16 PM
The charset parameter tells the browser which character encoding to use when decoding the page source. If you want to type those characters in directly, you'll be better off with utf-8, as long as the file is actually saved as utf-8.
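
To make the mismatch concrete, here is a minimal Python sketch (not from the thread; the sample string is an assumption chosen to match the characters in the question). It shows what a browser ends up with when the declared charset does not match the bytes the editor actually saved, in both directions:

```python
# Sample text (an assumption for illustration): the characters from the question.
text = "é è à ü"

# Case 1: file saved as UTF-8, but the page is decoded as ISO-8859-1.
# Each accented letter is two bytes in UTF-8, so it shows up as two characters.
saved_as_utf8 = text.encode("utf-8")
misread = saved_as_utf8.decode("iso-8859-1")

# Case 2: file saved as ISO-8859-1, but the page declares charset=utf-8.
# The single bytes are not valid UTF-8, so they become replacement characters.
saved_as_latin1 = text.encode("iso-8859-1")
garbled = saved_as_latin1.decode("utf-8", errors="replace")

# Matching declaration and file encoding: the text survives intact.
correct = saved_as_utf8.decode("utf-8")

print(misread)
print(garbled)
print(correct)
```

Case 2 is most likely what happened in the first post: the editor was saving ISO-8859-1 (or similar) while the new meta element claimed utf-8.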

jjshell
01-23-2005, 06:44 PM
with utf-8, how are html entities interpreted?

whackaxe
01-23-2005, 08:31 PM
what about using the ampersand prefixed codes? http://www.w3.org/MarkUp/html-spec/html-spec_13.html

French sucks for the net :(

rmedek
01-24-2005, 01:12 AM
I believe (I could be wrong) that as long as you are serving "text/html", any browser capable of doing so will interpret entities correctly (i.e., &amp; will be &, &lt; will be <, etc.), no matter what the encoding.

I do know that if you are serving your code as XML (application/xhtml+xml), most named entities will NOT be recognized, and it's best to use UTF-8 and put the characters in the file directly.

Does that make sense? I just started reading about this here (http://www.456bereastreet.com/archive/200501/the_perils_of_using_xhtml_properly.html) and I'm a little confused myself :D.

jjshell
01-24-2005, 11:53 PM
Do I have to use htmlentities with utf-8? It seems not, but I'm not sure.

rmedek
01-24-2005, 11:56 PM
Again, I'm not too sure (I'm still messing with this myself), but I believe that whatever is displayed in your text editor, if saved as UTF-8, will display the same way in your browser (if delivered as UTF-8).

This is the part where Liorean or someone chimes in with facts, not just speculation :D:D
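
A small Python sketch of the point above, under the assumption that the page is both saved and served as UTF-8: accented letters can stay as literal characters, and only the characters that are significant to markup need escaping. `html.escape` is used here as a rough analogue of PHP's `htmlspecialchars` (not `htmlentities`), and the sample string is made up for illustration.

```python
import html

# Hypothetical user-supplied text containing accents and markup characters.
raw = 'Prix réduit: "été" < 5 € & plus'

# html.escape only rewrites &, < and > (plus quotes when quote=True).
escaped = html.escape(raw, quote=True)

# Encoding as UTF-8 keeps the accented letters and the euro sign as literal characters.
page_bytes = escaped.encode("utf-8")

print(escaped)
print(page_bytes.decode("utf-8") == escaped)
```

The accented letters pass through untouched, so converting them to named entities is optional once the encoding is declared consistently.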

Graft-Creative
01-25-2005, 12:21 AM
just a little thing: please only post if you think that you can be helpful. no need to point out that i'm a noob or i don't know what. no need to tell me to use google or the search engine. i tried. if you feel like helping me, post. if you feel like being nitpicky and sarcastic, well... whatever... it's just that i've seen a lot of 'purist rubbish posts' around here, and it can get annoying sometimes.
:)

As an aside: I'm a little worried that this community gives out that impression and thus puts people on the defensive. It certainly wasn't my experience when I first started posting. It was people's "bend over backwards to help" attitude that kept me coming back. Maybe things have changed?

"The gentler gamester is the soonest winner", to quote Shakespeare... as a *purist* myself, I try to keep that in mind.

Gary

liorean
01-27-2005, 09:53 PM
Well, this is where I chime in:

- If you have a meta element specifying 'Content-Type' to be 'text/html; charset=encoding' as http-equiv, make sure that value matches what you are using when editing the document. If you're using what Microsoft likes to call "ANSI", make sure that the encoding is 'iso-8859-1'. If you're using what Microsoft likes to call 'Unicode', make sure that the encoding is 'UTF-16'. If you're using what Microsoft likes to call 'UTF-8' (which is actually the most common Unicode encoding), make sure that the encoding is 'UTF-8'.
- Make sure that the HTTP header 'Content-Type' is sent with the same value. If you're using an Apache server, this is typically done by changes in the .htaccess file.
- If you go with entity references (named characters, eg. &nbsp;) this works the same in all browsers given that you're using the 'text/html' content type. If the browser doesn't support an entity reference, it will lack that support independent of what actual encoding you're sending the file as.
- There's a method with better support though (but less readable). If you use character references (numbered characters as either &#decimalnumber; or &#xhexadecimalnumber;, eg. &#160; or &#xa0;) you will have full support for all characters in all browsers independent of character encoding used/served as/stated or content type.
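
The last point in the list can be sketched in Python. This is an illustration, not a browser test: it assumes ASCII-compatible encodings only, and it uses `html.unescape` as a stand-in for how an HTML parser resolves character references. Because the reference itself is plain ASCII, the bytes are the same whichever of these encodings the file is saved in.

```python
import html

# "é" written as a decimal and as a hexadecimal character reference.
source = "caf&#233; / caf&#xe9;"

# The markup is pure ASCII, so these three encodings produce identical bytes.
as_bytes = {enc: source.encode(enc) for enc in ("ascii", "iso-8859-1", "utf-8")}

# Decode with each encoding, then resolve the references as an HTML parser would.
resolved = {enc: html.unescape(data.decode(enc)) for enc, data in as_bytes.items()}

print(resolved)
```

A mislabelled charset therefore cannot corrupt such a page, at the cost of source text that is harder to read and edit.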

In general, the best choice is to make sure your editor/generator is outputting UTF-8, that the server is sending the page as UTF-8, and that the meta element, if you use one to state the content type, also says UTF-8. UTF-8 is the character encoding of the future. It's suitable for over half the global population, while UTF-16 is the best choice for almost all of the rest.
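
As a rough way to check that the three places agree, here is a hedged Python sketch. The helper names `declared_charset` and `check_page` are hypothetical, the regular expression is a simplification and not a real HTML parser, and the sample page is made up. It compares the charset in the HTTP `Content-Type` value with the one in the meta element, then tries to decode the body with each declared charset.

```python
import re

# Matches "charset=..." in either an HTTP header value or a meta element.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def declared_charset(text):
    """Return the lowercased charset named in text, or None if absent."""
    match = CHARSET_RE.search(text)
    return match.group(1).lower() if match else None

def check_page(body, http_content_type):
    """Hypothetical helper: list the encoding problems found in one response."""
    problems = []
    header_cs = declared_charset(http_content_type)
    # The meta element is ASCII, so a lossless latin-1 decode is enough to find it.
    meta_cs = declared_charset(body.decode("iso-8859-1"))
    if header_cs and meta_cs and header_cs != meta_cs:
        problems.append("header says %s, meta says %s" % (header_cs, meta_cs))
    for cs in sorted({header_cs, meta_cs} - {None}):
        try:
            body.decode(cs)
        except (UnicodeDecodeError, LookupError):
            problems.append("body is not valid %s" % cs)
    return problems

page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><p>é</p>'
good = check_page(page.encode("utf-8"), "text/html; charset=utf-8")
bad = check_page(page.encode("iso-8859-1"), "text/html; charset=utf-8")
print(good)
print(bad)
```

The first call reports no problems; the second reports that the body is not valid utf-8, which is the situation from the first post. Note the limit of this check: any byte sequence decodes as iso-8859-1, so it cannot detect a UTF-8 file that is declared as iso-8859-1.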

rmedek
01-28-2005, 07:58 AM
I knew sooner or later Liorean would dig me out of the hole I dug for myself. Thanks for the advice... This one bit is a good point, as well:
- If you go with entity references (named characters, eg. &nbsp;) this works the same in all browsers given that you're using the 'text/html' content type. If the browser doesn't support an entity reference, it will lack that support independent of what actual encoding you're sending the file as.
- There's a method with better support though (but less readable). If you use character references (numbered characters as either &#decimalnumber; or &#xhexadecimalnumber;, eg. &#160; or &#xa0;) you will have full support for all characters in all browsers independent of character encoding used/served as/stated or content type.
... the same article I mentioned earlier has a good point about that; only the five basic named entities are supported in application/xhtml+xml, so it's a good habit to either use the numeric references or depend on your text editor to encode the characters.
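
A quick way to see the difference is to feed the same fragments to a generic XML parser. This Python sketch uses the standard library's `xml.etree.ElementTree`, which does not load the XHTML DTD; that is an assumption standing in for strict XML handling, not a claim about what any particular browser does. The helper name `parses_as_xml` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup):
    """Return True if a plain XML parser (no DTD loaded) accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

predefined = parses_as_xml("<p>a &amp; b</p>")   # one of the five predefined entities
numeric = parses_as_xml("<p>a&#160;b</p>")       # numeric character reference
html_only = parses_as_xml("<p>a&nbsp;b</p>")     # named entity defined only by the HTML DTD

print(predefined, numeric, html_only)
```

The first two fragments parse, while the one using &nbsp; is rejected as an undefined entity, which is why numeric references or literal UTF-8 characters are the safer habit for XML.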

As an aside: I'm a little worried that this community gives out that impression and thus puts people on the defensive. It certainly wasn't my experience when I first started posting. It was people's "bend over backwards to help" attitude that kept me coming back. Maybe things have changed?

<rant class="mild">I think most of the attitude (I should know, I give it out sometimes :)) comes not from answering a "newbie"'s question, but from dealing with posts that may as well be translated to this:

"Hey, I don't want to take the time to actually learn anything, but I want to have a really killer web site, so could you more experienced guys code it for me for free? Thanx!"

I know that before I posted (about a Dynamic Drive script on my table-based site, no less) I lurked around in the forums a bit, searched the previous posts, read the rules, posted nicely and descriptively, and once I got an answer I tried to follow up accordingly. Not that I'm some angel, but I can see how some people get frustrated when the opposite occurs. If I read a post where someone didn't bother to even be polite about asking for help, they're getting the w3schools link and nothing much else. Especially because I'm no html/css expert; a lot of times I don't know the answer either, and I try to research it a bit so I learn a little, too.

Anyhoo. Just my two cents...
</rant>