- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 12 Feb 2008 23:19:16 +0100
- To: ietf-http-wg@w3.org
Roy T. Fielding wrote:

> Servers, OTOH, send text/* content with the assumption that it will be
> treated as iso-8859-1 (or at least some safe superset of US-ASCII).

That could be US-ASCII itself, windows-1252, UTF-8, or one of a bunch of similar windows-xxxx, iso-8859-x, or other "unknown-ascii" supersets (not counting UTF-1, UTF-7, or weirder charsets, for obvious reasons).

> None of these implementations assume that a missing charset means
> US-ASCII. We cannot "pass the buck" to MIME because we are still
> not MIME-compliant and never will be (see Content-Encoding).

All sound ASCII supersets have one thing in common: the 128 US-ASCII octets, corresponding to U+0000 up to U+007F.

> iso-8859-1 is still the most interoperable default *with* the
> addition of safe sniffing only when the charset is left unlabeled
> or when charset="iso-8859-1".

Any of these US-ASCII supersets could do as the default. The problems of an explicit iso-8859-1 label where that is not true cannot get worse, and unfortunately also not better, with another default.

> In other words, it is safe to sniff for charsets in the first ten
> or so characters, and also to switch to other US-ASCII supersets
> after reading something like the <meta http-equiv="content-type"

A US-ASCII, windows-1252, or UTF-8 default would not change that. And US-ASCII is the best approximation of "unknown-ascii" we have at the moment. If you decide that it's not good enough, recall that the "charset" list already discussed registering "unknown-ascii", in addition to the existing "unknown-8bit", some months ago. And we could make sure that this default pseudo-charset by definition won't cover UTF-1, UTF-7, or similar abominations.

But I think a US-ASCII default does precisely what you want, and I fail to see how it could break existing HTTP implementations.

Frank
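[Editor's note: the sniffing described above can be sketched in a few lines. This is a hypothetical illustration, not any implementation discussed on the list; the function name `sniff_charset`, the 1024-octet window, and the fallback default are all assumptions. It relies on exactly the property Frank cites: every sound ASCII superset agrees on the octets 0x00-0x7F, so the prefix can safely be read as ASCII while looking for a <meta http-equiv="content-type"> declaration.]

```python
import re

def sniff_charset(body: bytes, default: str = "us-ascii") -> str:
    """Guess the charset of an unlabeled text/* body (illustrative sketch).

    Because all sound ASCII supersets share octets 0x00-0x7F, the prefix
    can be decoded as ASCII (non-ASCII octets replaced) and scanned for a
    <meta http-equiv="content-type" ... charset=...> declaration.
    """
    prefix = body[:1024]  # hypothetical sniffing window
    # The meta declaration itself is pure ASCII, so replacement
    # characters from stray high octets cannot corrupt the match.
    text = prefix.decode("ascii", errors="replace")
    m = re.search(
        r'<meta\s+http-equiv=["\']?content-type["\']?\s+'
        r'content=["\']?[^"\'>]*charset=([-\w]+)',
        text,
        re.IGNORECASE,
    )
    # No declaration found: keep the (assumed) US-ASCII default.
    return m.group(1).lower() if m else default
```

For example, a body beginning with <meta http-equiv="content-type" content="text/html; charset=utf-8"> would be reported as utf-8, while a body with no declaration falls back to the default, which is the "unknown-ascii" approximation argued for above.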
Received on Tuesday, 12 February 2008 22:18:22 UTC