- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Wed, 26 Mar 2008 18:04:59 +0100
- To: ietf-http-wg@w3.org
Martin Dürst wrote:

> [I was a co-author, but that was a long time ago.]

It's still interesting for reconstructions how precisely
the Web and the Internet at large ended up with Unicode,
later UTF-8, now net-utf8 (in essence NFC).

My private theory how this all happened is that Harald
considered ISO 2022 as a hopeless case after seriously
trying to make it work, and you consider anything where
it is not intuitively possible to use "ü" as broken by
design.

[old browsers hating charset parameters]
> My understanding is that this problem was corrected in
> version 3 or so of Netscape and IE, or anyway in a
> timeframe that makes in irrelevant for our current
> spec.

+1  IIRC IBM Webexplorer still had issues with it, but
HTTP/1.0 browsers not supporting Host: header fields are
irrelevant today.  And we are not updating gopher type h,
simple HTTP (0.9), or similar relics.

>| In the case where a document is accessed from a
>| hyperlink in an origin HTML document, a CHARSET
>| attribute is added to the attribute list of
>| elements with link semantics (A and LINK) [...]

> [not sure how much this is implemented or in use; it's
> not directly a HTTP issue]

Yes, no HTTP issue.  I use charset="PC-Multilingual-850+euro"
in some links, but I'm not aware of a spider or browser
doing anything with this info.  Which does not mean that
it is necessarily a waste of time - I'm also not aware of
UAs looking at say hreflang=, but I could add some CSS
magic for it later.

>| <META HTTP-EQUIV="Content-Type"
>|       CONTENT="text/html; charset=ISO-2022-JP">
>|
>| This is not foolproof, but will work if the encoding
>| scheme is such that ASCII-valued octets stand for
>| ASCII characters only at least until the META element
>| is parsed.

> [This is very, very widely used. As far as it's HTML,
> it's nothing HTTP should be concerned, but it is highly
> relevant for HTTP because it is dead straight against
> any default on the charset parameter in HTTP.]

Wait a moment, it is dead straight against any default
that is *NOT* ASCII, or rather against a default not
containing ASCII as proper subset.

For the [i20] question it only tells us that we cannot
pick say BOCU-1 as new default, even if that's MIME
compatible.  Arguably it also tells us that the "default"
does not mean much for HTTP.  It is interesting for HTTP
header fields.

For the text/* [i20] issue we might be free to pick ASCII
instead of Latin-1 if that's better for MIME compatibility,
especially for text/plain, naturally for text/xml, and no
problem for text/html.

>| see [NICOL2] for some details and a proposal.

What was NICOL2, was that your heuristic to "sniff" UTF-8 ?

The main problem I have with the "Latin-1 default" is that
it blocks a future "UTF-8 default" (talking about HTTP/1.1)

 Frank
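
To illustrate the point about META and ASCII supersets, here is a
minimal Python sketch, not any browser's actual algorithm.  The
helper name sniff_meta_charset, the regular expression, and the
1024 octet limit are all assumptions made for this example.  It
scans the raw octets as if they were ASCII, which finds the META
declaration when the real encoding keeps ASCII octets as ASCII
(ISO-2022-JP, UTF-8, Latin-1), and misses it otherwise.  UTF-16
stands in for a non-ASCII-compatible encoding, because Python
ships no BOCU-1 codec.

```python
import re

# Hypothetical pattern: a META tag followed by a charset=... value,
# matched on raw octets under the assumption that they are ASCII.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._:+-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = 1024):
    """Return the charset label declared in a META element, or None.

    Only the first `limit` octets are examined (an arbitrary choice
    for this sketch).
    """
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii") if match else None


# The declaration quoted in the message, as a pure-ASCII document.
HTML = (
    '<HTML><HEAD><META HTTP-EQUIV="Content-Type" '
    'CONTENT="text/html; charset=ISO-2022-JP"></HEAD></HTML>'
)

if __name__ == "__main__":
    # Python codec names; the first three keep ASCII octets unchanged.
    for codec in ("iso2022_jp", "utf-8", "latin-1", "utf-16-le"):
        print(codec, sniff_meta_charset(HTML.encode(codec)))
```

With the three ASCII-compatible codecs the scan returns the
declared label; with utf-16-le it returns None, because every
ASCII character is followed by a zero octet and the pattern
never matches.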
Received on Wednesday, 26 March 2008 17:03:27 UTC