RE: encoding in XHTML

From: Paul Deuter <Paul.Deuter@plumtree.com>
Date: Mon, 11 Nov 2002 21:49:58 -0800
Message-ID: <C7F00D7948B8E4468BB330152C6BA4E003F0C905@cstaex03.USIPLUMTREE.AD>
To: "Chris Lilley" <chris@w3.org>, <www-international@w3.org>, <Peter_Constable@sil.org>

The HTML default charset *is* ISO-8859-1, but IE will
render octets in the range 0x80-0x9F as Windows-1252.
So I guess that means that, for IE, the effective default is Windows-1252.
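
To make the difference concrete, here is a minimal Python sketch that decodes
the same octets both ways. The sample bytes are made up for illustration; the
codec names are Python's standard ones, and nothing here models IE itself.

```python
# Sketch: the same octets decoded under the two encodings discussed above.
# The sample bytes are an illustrative assumption, not taken from a real page.
raw = b"\x93quoted\x94 costs \x80 5"

# In ISO-8859-1, octets 0x80-0x9F map to invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# In Windows-1252, the same octets map to printable punctuation and symbols.
as_cp1252 = raw.decode("cp1252")

# Show the code points of the non-ASCII characters under each reading.
print([hex(ord(ch)) for ch in as_latin1 if ord(ch) >= 0x80])
print([hex(ord(ch)) for ch in as_cp1252 if ord(ch) >= 0x80])
```

Under the first reading the three high octets stay in the C1 range; under the
second they become curly quotes and the euro sign, which is what a page author
using a Windows editor most likely intended.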

-Paul

-----Original Message-----
From: Chris Lilley [mailto:chris@w3.org]
Sent: Monday, November 11, 2002 6:52 PM
To: www-international@w3.org; Peter_Constable@sil.org
Subject: Re: encoding in XHTML



On Sunday, November 3, 2002, 1:34:35 PM, Peter wrote:


Pso> This is said to be informative, yet the quoted text says, "...a
Pso> document that wants to set its character encoding explicitly
Pso> *must* include both the XML declaration an encoding declaration
Pso> and a meta http-equiv statement..." (emphasis added). How can an
Pso> informative portion of the document say that something *must* be
Pso> done?

Good point. (Sorry for the late response; I was in a week-long
face-to-face meeting, so email got behind.)

Pso> The bigger question is what really should or does happen. This
Pso> issue was brought to my attention when I discovered that IE 6
Pso> would not interpret a certain xhtml doc in terms of UTF-8 unless
Pso> we added the http-equiv statement, even though UTF-8 was
Pso> explicitly declared as the encoding in the XML declaration.

Which tells you that IE is ignoring the XML declaration and treating
the document as 'traditional HTML' tag soup. You could make the
document not well-formed and IE would not behave any differently.

Pso> (It was assuming either 8859-1 or cp1252, I forget which.)

What was the server saying? The default for text/html without an
explicit charset sent over HTTP is 8859-1, no?
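
As a rough illustration of that fallback, the following Python sketch reads
the charset parameter from a Content-Type header value and falls back to
ISO-8859-1 when none is given. The function name and the constant are
hypothetical; the fallback value assumes the HTTP/1.1 rule for text/* media
types, and this is not a description of what any particular browser does.

```python
from email.message import Message

# Assumption: HTTP/1.1 default for text/* when no charset parameter is sent.
HTTP_DEFAULT_CHARSET = "iso-8859-1"


def charset_from_content_type(header_value):
    """Return the charset named in a Content-Type value, or the HTTP default.

    Hypothetical helper for illustration; it reuses the standard library's
    MIME header parsing rather than a hand-written parser.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns failobj if absent.
    return msg.get_content_charset(failobj=HTTP_DEFAULT_CHARSET)


print(charset_from_content_type("text/html"))
print(charset_from_content_type("text/html; charset=UTF-8"))
```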

Pso> It seems to me that this was a bug on the part of IE -- if it's
Pso> interpreting an XML doc, it should pay attention to the encoding
Pso> declared in the XML declaration.

Yes; big emphasis on the *if it's interpreting*.

Pso> In general, it seems to me that stronger statements should be
Pso> made in the spec: XHTML is an XML application, and thus user
Pso> agents must conform to the XML spec, implying that an encoding
Pso> specified in the XML declaration *must* be observed -- and that
Pso> this statement can be made normatively rather than just
Pso> informatively. Am I missing something?

Yes. XHTML 1.x (unadvisedly in my view) tried to pretend that
browsers could move to XML gradually and that the same content could
be served up to HTML and XHTML user agents, with the same mime type.
This resulted in a bunch of new sniffing technologies being developed
- different in each browser of course - and 60% of all XHTML documents
not even being well formed.

Hence the belt and braces approach of saying things twice.
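
For illustration, "saying things twice" looks roughly like the sample document
in the Python sketch below, which also pulls out both declarations and checks
that they agree. The sample markup, the function names and the regular
expressions are assumptions made for this example; the patterns are a crude
approximation and not a substitute for a real XML or HTML parser.

```python
import re

# Illustrative XHTML 1.0 document that states its encoding twice:
# once in the XML declaration and once in a meta http-equiv element.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>
"""

# Hypothetical patterns: good enough for the sample, not for arbitrary input.
XML_DECL = re.compile(r"""<\?xml[^>]*encoding\s*=\s*["']([\w.:-]+)["']""")
META_DECL = re.compile(
    r"""<meta[^>]*http-equiv\s*=\s*["']content-type["'][^>]*"""
    r"""content\s*=\s*["'][^"'>]*charset\s*=\s*([\w.:-]+)""",
    re.IGNORECASE,
)


def declared_encodings(text):
    """Return (xml_declaration_encoding, meta_charset); None where absent."""
    xml = XML_DECL.search(text)
    meta = META_DECL.search(text)
    return (xml.group(1) if xml else None, meta.group(1) if meta else None)


def declarations_agree(text):
    """True only if both declarations are present and name the same encoding."""
    xml, meta = declared_encodings(text)
    return xml is not None and meta is not None and xml.lower() == meta.lower()


print(declared_encodings(SAMPLE))
print(declarations_agree(SAMPLE))
```

A tag-soup user agent would only ever look at the second declaration, and an
XML processor only at the first, which is why the two have to be kept in sync
by hand.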


Pso> Or is this being worked on further in the draft for version 2?

Version 2 is supposed to be really XML. Bugwards compatibility is
supposed to be no longer an issue. It uses a different mime type,
application/xhtml+xml, so a single browser can do both the buggy old
stuff and the shiny new stuff without sniffing.

Thus, XHTML 2.0 should not need any meta charset stuff. Just the XML
encoding declaration.
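
The idea of switching on the MIME type instead of sniffing can be sketched in
a few lines of Python. The function name and the mode labels are hypothetical
and only meant to show the dispatch; they do not describe how any existing
browser is structured.

```python
def processing_mode(content_type):
    """Pick a processing mode from the media type alone, with no sniffing.

    Hypothetical helper: 'xml' means a real XML parser that honours the XML
    encoding declaration; 'tag-soup' means legacy HTML handling.
    """
    # Drop any parameters such as charset and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return "xml"
    if media_type == "text/html":
        return "tag-soup"
    return "unknown"


print(processing_mode("application/xhtml+xml"))
print(processing_mode("text/html; charset=ISO-8859-1"))
```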


-- 
 Chris                            mailto:chris@w3.org
Received on Tuesday, 12 November 2002 00:48:54 GMT
