
How can I make the browser recognize the encoding of my XHTML pages?



Zagor
Nov 24th, 2006, 02:19 AM
Hello everyone,

I seem to have a little problem. I'm currently localizing a web site from English to Serbian. For Serbian I use the windows-1250 charset (utf-8 can be used too), but some specific characters are not displayed correctly in the browser. Here is how my header looks:

<?xml version="1.0" encoding="windows-1250"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sr" lang="sr">
<head>
<meta http-equiv="content-type" content="text/html; charset=windows-1250" />
My files are saved as ASCII in my code editor.
The strangest thing is that the files are displayed properly when viewed locally from my hard drive, but as soon as I upload them to my server and view them in a browser (any browser), the page is displayed using the standard ISO-8859-1 charset.

How can I make the browser recognize the encoding of my XHTML pages?

Any ideas?
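
A minimal Python sketch of the symptom described above, assuming the bytes on disk really are windows-1250 and the browser decodes them as ISO-8859-1. The helper name `misread` and the sample string are illustrative only and do not come from the thread.

```python
# Serbian Latin letters that exist in windows-1250 but not in ISO-8859-1.
SAMPLE = "čćđ"


def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text the way the editor saved it, then decode it the way the browser guessed."""
    return text.encode(saved_as).decode(read_as)


if __name__ == "__main__":
    # Same bytes, wrong label: each letter turns into a different Latin-1 character.
    print(misread(SAMPLE, "cp1250", "iso-8859-1"))  # prints "èæð"
    # Same bytes, right label: the text survives the round trip.
    print(misread(SAMPLE, "cp1250", "cp1250"))      # prints "čćđ"
```

The bytes never change in this sketch; only the label used to interpret them does, which is why the same file can look right locally and wrong when served.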

BonRouge
Nov 24th, 2006, 07:56 AM
You might need to change the way you save it. Get EmEditor (http://www.emeditor.com/modules/download2/). It can save files with all sorts of different encodings.

dAEk
Nov 24th, 2006, 03:29 PM
Have you set the HTTP headers on the server side?

There's really no need to set the charset with the meta element unless the page is opened locally from disk. The meta element is ignored if the HTTP header is set.
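
A rough Python sketch of that rule, assuming only two inputs: the value of the HTTP Content-Type header and the charset named in the meta element. The function names `charset_from_content_type` and `effective_charset` are made up for illustration, and real browsers apply more steps than this.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, or None if it has none."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def effective_charset(header_value: str, meta_charset: Optional[str]) -> Optional[str]:
    """The HTTP header wins when it names a charset; otherwise fall back to the meta element."""
    return charset_from_content_type(header_value) or meta_charset


if __name__ == "__main__":
    # A server that labels everything ISO-8859-1 overrides the page's own meta element.
    print(effective_charset("text/html; charset=ISO-8859-1", "windows-1250"))
    # With no charset in the header, the meta element is what remains.
    print(effective_charset("text/html", "windows-1250"))
```

To see what a server actually sends, the header value can be read from any HTTP client's response headers and passed to `charset_from_content_type`.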

Zagor
Nov 24th, 2006, 08:07 PM
Thank you for the tip. I completely forgot that the server can send its own charset header, overriding the one in the markup.

I'll look into it in a couple of days when I upload the site to a new server, because I don't have the necessary permissions on the current one.

Thanks, BonRouge, for EmEditor. It's sad, really, that UltraEdit, the most powerful text editor, doesn't support that many encodings.

dAEk
Nov 24th, 2006, 11:43 PM
Somehow I forgot to mention that you also need to save the documents with the appropriate encoding. I noticed you got help at another forum, but I think it needs to be mentioned here too.

May I ask why you don't save your documents with utf8 encoding? Most editors have support for it and besides, it gives you more flexibility in the long run.

mrdantownsend
Nov 25th, 2006, 12:17 AM
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />

Enter that in the head part of your document.

Zagor
Nov 25th, 2006, 12:25 AM
As you can see, I said in my first post that regardless of the encoding I set (windows-1250 or utf-8), the browser uses the default one. As it turns out, the choice makes no difference for my issue, but you are definitely right about utf-8.

There was a moment when I was deciding which encoding to go with, and I simply don't know why I picked 1250 over utf-8. But it is excellent that someone here was good enough to remind me. I think that a forum, besides being a place to learn, serves as a constant reminder of standards so that designers don't stray into bad practice.

Thanks Again.

Arbitrator
Nov 25th, 2006, 06:54 AM
As you can see I said in my first post that regardless the encoding I set (windows-1250 or utf-8) browser uses the default one.
I’m guessing “Windows” here refers to Microsoft Windows, the operating system, so that would be one reason not to use it: it isn’t neutral, because not everyone uses Windows. There are Linux, Macintosh, and other users to consider.

UTF‐8 will work for just about anything. To display your page with that particular encoding, you need to do three things:

1. Save the file as a UTF‐8–encoded file. Make sure that you don’t save it with a byte-order mark (as Microsoft Notepad forces you to do, for example) if you intend to use BOM‐incompatible technologies such as PHP or have your document viewed by older browsers.
2. Add a meta element or XML declaration that identifies the document as UTF‐8–encoded to a browser.
3. Configure your server to display the file as UTF‐8–encoded. Server settings are more important than the previous method of declaring the encoding and will override them. The document should still have its encoding declared using the previous methods, however, for various reasons: automated validation (W3C Validator) and so that the correct encoding is still used when the document is separated from the server (for example, for local testing).
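
The first step above can be sketched in Python roughly as follows, assuming the existing files are windows-1250 as in the original post. The function names `has_utf8_bom` and `convert_to_utf8` and the default source encoding are illustrative assumptions.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the raw bytes start with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1250") -> None:
    """Re-save a file as UTF-8 without a byte-order mark."""
    text = src.read_text(encoding=source_encoding)
    # Python's "utf-8" codec never writes a BOM; "utf-8-sig" is the variant that does.
    dst.write_text(text, encoding="utf-8")
```

This only changes the bytes on disk. The meta element or XML declaration inside the file, and the server's Content-Type header, still have to be updated separately so that all three agree.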

dAEk
Nov 25th, 2006, 09:51 AM
#2 is not a need per se. W3C recommends that you use <?xml version="1.0" encoding="utf-8"?> but it's not really necessary. By default, XML parsers will use utf8 if the XML declaration is missing. It should only be used when serving pages as true xhtml, i.e. not text/html, because if you send documents with the XML declaration and as text/html, IE goes into quirks mode. And that's not always what you would want.
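
A simplified Python sketch of the default described above: read the encoding from the XML declaration if there is one, otherwise assume UTF-8. It assumes an ASCII-compatible encoding and ignores byte-order marks and UTF-16, so it is not a full XML encoding detector; the name `xml_encoding_or_default` is hypothetical.

```python
import re

# Matches an XML declaration at the very start of the document and captures
# the optional encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def xml_encoding_or_default(data: bytes) -> str:
    """Return the declared encoding in lowercase, or "utf-8" when none is declared."""
    match = _XML_DECL.match(data)
    if match and match.group(1):
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

Applied to the header from the first post, this would report windows-1250; applied to a document with no declaration, it would fall back to utf-8.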

Arbitrator
Nov 25th, 2006, 12:47 PM
By default, XML parsers will use utf8 if the XML declaration is missing. It should only be used when serving pages as true xhtml, ie not text/html, because if you send documents with the XML declaration and as text/html, IE goes into quirks mode. And that's not always what you would want.
Yes, but I’m guessing that this person, like everyone else out there, will be serving their XHTML as HTML, in which case UTF‐8 is not the assumed default since an XML parser won’t be used. Then again, it might be for browsers that understand true XHTML; Internet Explorer, however, is not among them. Thus, it’s still a good idea to use at least a meta element as per #2. And if you display your XHTML document as encoded by anything other than UTF, an XML declaration is no longer optional as per Appendix C (http://www.w3.org/TR/xhtml1/#C_1) of the XHTML 1 specification.

Note also that Internet Explorer 7 now recognizes the XML declaration when used with XHTML and so does not enter backward‐compatibility (quirks) mode when such a declaration is used. If Internet Explorer 6 and prior are not issues, then an XML declaration may be used freely without worrying about the problems caused by the quirks mode renderer.

I guess I should also clarify point #3 too: the XML declaration takes precedence over all else when the page is rendered as true XHTML.

gsnedders
Nov 25th, 2006, 12:50 PM
Remember that the Content-Type HTTP/1.1 header will take priority over pretty much anything else.

gsnedders
Nov 25th, 2006, 01:01 PM
I guess I should also clarify point #3 now; the XML declaration takes precedence over all else when the page is rendered as true XHTML.

(The following assumes the data is being transmitted over HTTP)

Not under RFC 3023 (XML MIME types):

application/xml (or subtypes) use, in order:
1. HTTP/1.1 Content-Type
2. XML prolog
3. BOM
4. UTF-8

text/xml (or subtypes) use, in order:
1. HTTP/1.1 Content-Type
2. BOM
3. US-ASCII

If we have any other text/* group MIME type, then we fall back to RFC 2616 (HTTP/1.1):
1. HTTP/1.1 Content-Type
2. ISO-8859-1
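
The precedence lists above can be sketched in Python as follows. This follows the order exactly as given in this post rather than an independent reading of the RFCs. The function name `pick_encoding`, its parameters, and the treatment of "+xml" suffixes as "subtypes" are assumptions made for illustration.

```python
from typing import Optional


def pick_encoding(
    mime_type: str,
    header_charset: Optional[str] = None,     # charset from the HTTP Content-Type header
    xml_decl_encoding: Optional[str] = None,  # encoding named in the XML prolog
    bom_encoding: Optional[str] = None,       # encoding implied by a byte-order mark
) -> str:
    """Pick an encoding using the fallback order listed in the post above."""
    mime_type = mime_type.lower()
    # In every list, a charset in the Content-Type header comes first.
    if header_charset:
        return header_charset
    is_xml_suffix = mime_type.endswith("+xml")  # assumption: this is what "subtypes" means
    if mime_type == "application/xml" or (mime_type.startswith("application/") and is_xml_suffix):
        return xml_decl_encoding or bom_encoding or "utf-8"
    if mime_type == "text/xml" or (mime_type.startswith("text/") and is_xml_suffix):
        return bom_encoding or "us-ascii"
    if mime_type.startswith("text/"):
        return "iso-8859-1"
    raise ValueError(f"no rule listed for MIME type {mime_type!r}")
```

Under these rules, XHTML served as text/html with no charset in the header ends up as ISO-8859-1 regardless of its XML declaration, which matches the behaviour described in the first post.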

Arbitrator
Nov 25th, 2006, 01:56 PM
Hmm… You seem to be right, since the paragraph in XHTML 1 (C)(9) (http://www.w3.org/TR/xhtml1/#C_9) is a bit ambiguous, and assuming that this is the standard for application/xhtml+xml:

8.20 INCONSISTENT EXAMPLE: Text/xml with UTF-8 Charset

Content-type: text/xml; charset="utf-8"

<?xml version="1.0" encoding="iso-8859-1"?>

Since the charset parameter is provided in the Content-Type header, MIME and XML processors MUST treat the enclosed entity as UTF-8 encoded. That is, the "iso-8859-1" encoding MUST be ignored.

Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.

RFC 3023 seems to confirm that you need an XML declaration if the character set is anything other than UTF:

Unless the charset is UTF-8 or UTF-16, the recipient SHOULD also persistently store information about the charset, perhaps by embedding a correct XML encoding declaration within the XML MIME entity.

And that the encoding is not assumed to be UTF‐8 when XHTML is displayed as HTML:

However, MIME processors that are not XML processors SHOULD NOT assume a default charset if the charset parameter is omitted from an application/xml entity.

gsnedders
Nov 25th, 2006, 02:57 PM
Hmm… You seem to be right, since the paragraph in XHTML 1 (C)(9) (http://www.w3.org/TR/xhtml1/#C_9) is a bit ambiguous, and assuming that this is the standard for application/xhtml+xml
If you actually read appendix C section three (which is informative, and not part of the normative specification) it's quite clear it's just extending the XML standard with the <meta> element.


RFC 3023 seems to confirm that you need an XML declaration if the character set is anything other than UTF
Again, even reading just the quotation:
the recipient SHOULD also persistently store information about the charset
The recipient, not the publisher; and SHOULD (under the RFC 2119 definition), not need.


And that the encoding is not assumed to be UTF-8 when XHTML is displayed as HTML
… and combining RFC 2616 and RFC 3023 means you should be issuing a warning (under the RFC 2616 definition) if you send:
Content-Type: application/xml
This therefore doesn't apply to XHTML served as HTML, as by that very statement it is served as text/html and not application/xml.