- From: Paul Deuter <Paul.Deuter@plumtree.com>
- Date: Tue, 12 Nov 2002 09:36:30 -0800
- To: "Chris Lilley" <chris@w3.org>, <www-international@w3.org>, <Peter_Constable@sil.org>
Sorry, I stand corrected... For "HTTP", the default encoding is ISO-8859-1. There is no default for "HTML". User agents are free to make their best guess.

-Paul

-----Original Message-----
From: Paul Deuter
Sent: Monday, November 11, 2002 9:50 PM
To: 'Chris Lilley'; www-international@w3.org; Peter_Constable@sil.org
Subject: RE: encoding in XHTML

The HTML default charset *is* 8859-1 but IE will render octets in the range 0x80-0x9F as Windows-1252. So I guess that says that for IE: the default is 1252.

-Paul

-----Original Message-----
From: Chris Lilley [mailto:chris@w3.org]
Sent: Monday, November 11, 2002 6:52 PM
To: www-international@w3.org; Peter_Constable@sil.org
Subject: Re: encoding in XHTML

On Sunday, November 3, 2002, 1:34:35 PM, Peter wrote:

Pso> This is said to be informative, yet the quoted text says, "...a
Pso> document that wants to set its character encoding explicitly
Pso> *must* include both the XML declaration an encoding declaration
Pso> and a meta http-equiv statement..." (emphasis added). How can an
Pso> informative portion of the document say that something *must* be
Pso> done?

Good point. (Sorry for the late response, I was in a week-long face to face meeting so email got behind).

Pso> The bigger question is what really should or does happen. This
Pso> issue was brought to my attention when I discovered that IE 6
Pso> would not interpret a certain xhtml doc in terms of UTF-8 unless
Pso> we added the http-equiv statement, even though UTF-8 was
Pso> explicitly declared as the encoding in the XML declaration.

Which tells you that it is ignoring any XML and treating it as 'traditional HTML' tag soup. You could make the document not well formed and IE would not behave any differently.

Pso> (It was assuming either 8859-1 or cp1252, I forget which.)

What was the server saying? The default for text/html without an explicit charset sent over HTTP is 8859-1, no?
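[Illustration, not part of the original message.] The 8859-1 versus cp1252 question only matters for octets in the range 0x80-0x9F, since the two encodings agree everywhere else. A minimal Python sketch, assuming Python's standard `iso-8859-1` and `cp1252` codecs and a made-up sample byte string, shows how the same octets decode under each guess:

```python
# Octets 0x80-0x9F are C1 control characters in ISO-8859-1 but mostly
# printable characters (curly quotes, the euro sign) in Windows-1252.
# The sample bytes below are invented for illustration.
raw = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80])

as_latin1 = raw.decode("iso-8859-1")  # strict 8859-1 reading
as_cp1252 = raw.decode("cp1252")      # the reading IE is described as using

print(repr(as_latin1))
print(repr(as_cp1252))
```

Under 8859-1 the octets 0x93, 0x94 and 0x80 become invisible control characters; under cp1252 they become curly quotes and a euro sign, which is why a browser's choice between the two is visible on real pages.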
Pso> It seems to me that this was a bug on the part of IE -- if it's
Pso> interpreting an XML doc, it should pay attention to the encoding
Pso> declared in the XML declaration.

Yes; big emphasis on the *if it's interpreting*.

Pso> In general, it seems to me that stronger statements should be
Pso> made in the spec: XHTML is an XML application, and thus user
Pso> agents must conform to the XML spec, implying that an encoding
Pso> specified in the XML declaration *must* be observed -- and that
Pso> this statement can be made normatively rather than just
Pso> informatively. Am I missing something?

Yes. XHTML 1.x (unadvisedly in my view) tried to pretend that browsers could move to XML gradually and that the same content could be served up to HTML and XHTML user agents, with the same mime type. This resulted in a bunch of new sniffing technologies being developed - different in each browser of course - and 60% of all XHTML documents not even being well formed. Hence the belt and braces approach of saying things twice.

Pso> Or is this being worked on further in the draft for version 2?

Version 2 is supposed to be really XML. Bugwards compatibility is supposed to be no longer an issue. It uses a different mime type, application/xhtml+xml, so a single browser can do both the buggy old stuff and the shiny new stuff without sniffing.

Thus, XHTML 2.0 should not need any meta charset stuff. Just the XML encoding declaration.

--
Chris
mailto:chris@w3.org
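[Illustration, not part of the original message.] The "belt and braces" approach means an XHTML 1.x document served as text/html states its encoding twice: once in the XML declaration for XML processors, and once in a meta http-equiv element for tag-soup HTML parsers. A rough Python sketch of reading both declarations follows. The function name `declared_encodings`, the regular expressions, and the sample documents are hypothetical simplifications, not any browser's actual sniffing logic:

```python
import re

# Simplified patterns, assumed for illustration only. Real parsers handle
# attribute order, whitespace, and quoting far more carefully.
XML_DECL = re.compile(rb'^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']')
META = re.compile(
    rb'<meta[^>]+http-equiv=["\']Content-Type["\'][^>]*'
    rb'content=["\'][^"\']*charset=([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)

def declared_encodings(doc: bytes):
    """Return (XML declaration encoding, meta http-equiv charset), None if absent."""
    xml = XML_DECL.search(doc)
    meta = META.search(doc)
    return (
        xml.group(1).decode("ascii") if xml else None,
        meta.group(1).decode("ascii") if meta else None,
    )

# A document that says it twice, as XHTML 1.0 recommends.
both = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head>\n'
    b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
    b'<title>t</title></head><body></body></html>'
)

# A document with only the XML declaration, like the one Peter describes.
xml_only = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head>\n'
    b'<title>t</title></head><body></body></html>'
)

print(declared_encodings(both))
print(declared_encodings(xml_only))
```

A parser that looks only at the meta element finds nothing in the second document, so it has to fall back on the HTTP charset or its own guess, which matches the IE 6 behaviour reported above.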
Received on Tuesday, 12 November 2002 12:35:31 UTC