RE: encoding in XHTML

Sorry, I stand corrected...
For "HTTP", the default encoding is ISO-8859-1.  There
is no default for "HTML".  User agents are free to make
their best guess.

-Paul

-----Original Message-----
From: Paul Deuter 
Sent: Monday, November 11, 2002 9:50 PM
To: 'Chris Lilley'; www-international@w3.org; Peter_Constable@sil.org
Subject: RE: encoding in XHTML


The HTML default charset *is* 8859-1 but IE will
render octets in the range 0x80-0x9F as Windows-1252.
So I guess that says that for IE: the default is 1252.
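
To make the 0x80-0x9F difference concrete, here is a minimal Python sketch. The sample bytes are made up for illustration; it only shows how the same octets decode under the two charsets and says nothing about IE internals.

```python
# Octets 0x80-0x9F are C1 control characters in ISO-8859-1, but mostly
# printable characters (curly quotes, euro sign, dashes) in Windows-1252.
data = b"\x93quoted\x94 \x80 5"  # illustrative sample, not from a real page

as_latin1 = data.decode("iso-8859-1")  # keeps them as invisible controls
as_cp1252 = data.decode("cp1252")      # yields curly quotes and a euro sign

print(repr(as_latin1))
print(repr(as_cp1252))
```

A page that declares (or defaults to) 8859-1 but was authored on Windows will therefore only show its curly quotes if the browser quietly decodes as 1252.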

-Paul

-----Original Message-----
From: Chris Lilley [mailto:chris@w3.org]
Sent: Monday, November 11, 2002 6:52 PM
To: www-international@w3.org; Peter_Constable@sil.org
Subject: Re: encoding in XHTML



On Sunday, November 3, 2002, 1:34:35 PM, Peter wrote:


Pso> This is said to be informative, yet the quoted text says, "...a
Pso> document that wants to set its character encoding explicitly
Pso> *must* include both the XML declaration an encoding declaration
Pso> and a meta http-equiv statement..." (emphasis added). How can an
Pso> informative portion of the document say that something *must* be
Pso> done?

Good point. (Sorry for the late response, I was in a week-long
face-to-face meeting, so email got behind.)

Pso> The bigger question is what really should or does happen. This
Pso> issue was brought to my attention when I discovered that IE 6
Pso> would not interpret a certain xhtml doc in terms of UTF-8 unless
Pso> we added the http-equiv statement, even though UTF-8 was
Pso> explicitly declared as the encoding in the XML declaration.

Which tells you that it is ignoring any XML and treating it as
'traditional HTML' tag soup. You could make the document not well
formed and IE would not behave any differently.
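
A rough Python illustration of that difference, using the standard library's XML parser and its lenient HTML parser as stand-ins (an assumption for the sketch; neither is what any browser actually runs). The document and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical XHTML-looking document with an unclosed <p>: not well formed.
broken = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<html><body><p>unclosed</body></html>')


def is_well_formed(document):
    """Return True if a conforming XML parser accepts the bytes."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient 'tag soup' style parse: just records start tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(broken.decode("utf-8"))

print(is_well_formed(broken))  # the XML parser rejects the document
print(collector.tags)          # the lenient parser carries on regardless
```

The lenient parser never complains, so a user agent working this way gives no sign of whether it looked at the XML declaration at all.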

Pso> (It was assuming either 8859-1 or cp1252, I forget which.)

What was the server saying? The default for text/html without an
explicit charset sent over HTTP is 8859-1, no?
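
As a sketch of that fallback rule in Python: the function name is hypothetical, and the ISO-8859-1 default is the HTTP-level one for text types discussed in this thread, not a statement about what any particular browser does.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lower-cased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_charset() or default


print(charset_from_content_type("text/html"))
print(charset_from_content_type("text/html; charset=UTF-8"))
```

So the first thing to check in a case like Peter's is the actual Content-Type header the server sent.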

Pso> It seems to me that this was a bug on the part of IE -- if it's
Pso> interpreting an XML doc, it should pay attention to the encoding
Pso> declared in the XML declaration.

Yes; big emphasis on the *if it's interpreting*.

Pso> In general, it seems to me that stronger statements should be
Pso> made in the spec: XHTML is an XML application, and thus user
Pso> agents must conform to the XML spec, implying that an encoding
Pso> specified in the XML declaration *must* be observed -- and that
Pso> this statement can be made normatively rather than just
Pso> informatively. Am I missing something?

Yes. XHTML 1.x (unadvisedly in my view) tried to pretend that
browsers could move to XML gradually and that the same content could
be served up to HTML and XHTML user agents, with the same MIME type.
This resulted in a bunch of new sniffing technologies being developed
- different in each browser of course - and 60% of all XHTML documents
not even being well formed.

Hence the belt and braces approach of saying things twice.


Pso> Or is this being worked on further in the draft for version 2?

Version 2 is supposed to be really XML. Bugwards compatibility is
supposed to be no longer an issue. It uses a different MIME type,
application/xhtml+xml, so a single browser can do both the buggy old
stuff and the shiny new stuff without sniffing.
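
A minimal Python sketch of that idea, choosing the processing mode from the declared media type alone. The function name and the returned labels are hypothetical.

```python
def choose_parser(content_type):
    """Pick a processing mode from the Content-Type value, with no sniffing."""
    # Drop any parameters such as charset and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return "xml"       # strict XML processing
    if media_type == "text/html":
        return "tag-soup"  # legacy lenient HTML processing
    return "unknown"


print(choose_parser("application/xhtml+xml; charset=utf-8"))
print(choose_parser("text/html"))
```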

Thus, XHTML 2.0 should not need any meta charset stuff. Just the XML
encoding declaration.
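
For illustration, a rough Python sketch of reading the encoding from the XML declaration. It assumes an ASCII-compatible encoding with no byte order mark, so it does not handle UTF-16, and it falls back to UTF-8 when nothing is declared. The function name is hypothetical and this is not a full implementation of XML encoding detection.

```python
import re

# Matches an XML declaration at the very start of the document and
# captures the value of its optional encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def xml_declared_encoding(document, default="utf-8"):
    """Return the lower-cased declared encoding, or the default if absent."""
    match = _XML_DECL.match(document)
    if match and match.group(1):
        return match.group(1).decode("ascii").lower()
    return default


print(xml_declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><html/>'))
print(xml_declared_encoding(b'<html/>'))
```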


-- 
 Chris                            mailto:chris@w3.org

Received on Tuesday, 12 November 2002 12:35:31 UTC