- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Fri, 28 Dec 2007 03:26:46 +0100
- To: ietf-types@alvestrand.no
- Cc: ietf-http-wg@w3.org
Martin Duerst wrote:

> The new version of the HTTP spec, 2616bis, should definitely
> drop the iso-8859-1 default, but NOT in favor of "unknown
> text is ASCII". It should just say that there is no default.

A MIME entity with "default ASCII" using any 1xxx xxxx octets is erroneous. With "default ASCII" 2616bis would be consistent with MIME, that's good.

We have no "unknown-7bit" charset for unidentified "ASCII compatible" encodings (for octets 0..127), and the "default ASCII" is an emulation for such dubious cases, same idea as in mail.

Years later (after 2616bis) it might be possible to upgrade "default ASCII" to UTF-8, Latin-1 was a dead end. As soon as we're back to "default ASCII" just let RFC 2277 finish it off.

> There is a big difference between these two, especially for
> document formats that contain internal 'charset' information.
> A default of US-ASCII makes document-internal 'charset'
> information useless (because the external information wins).

Right, that must not happen, IMO a "default" is an assumption if no better info is available. For HTTP it also limits what can be used in *headers* (no message/rfc822 vs. message/global abstractions necessary, HTTP isn't UTF8SMTP)

The *body* contains octets, only 0..127 can be interpreted as ASCII, anything else needs an explicit declaration somewhere - "internal" would be fine for many users who can't change the "external" declaration.

That's actually the same issue as it is today with an external "default Latin-1", the internal UTF-8 / KOI8-R / windows-1252 (etc.) declaration wins if there is no explicit statement from the server. Otherwise my non-ASCII Web pages won't validate, but they do.

> One reason for the problems with text/xml was that the
> original MIME default of US-ASCII was enforced. This made
> it impossible to serve XML documents with internal 'charset'
> information only as text/xml.

The odd text/xml case is different, there's a MUST somewhere in the text/xml spec.
But nobody treats text/html as "default Latin-1" ignoring the internal declaration. The W3C validator even enforces its very own UTF-8 default for HTML 2, where it really should be Latin-1, maybe we could report this as bug :-)

Frank
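
For illustration only, the precedence argued for in this message could be sketched roughly as below: an explicit external declaration from the server wins, otherwise the document-internal declaration is used, and "default ASCII" is only the fallback assumption when nothing better is known. The function names `effective_charset` and `decode_body` are hypothetical and come from no spec or library; the sketch also assumes the internal declaration has already been extracted from the document by some other means.

```python
from typing import Optional


def effective_charset(external: Optional[str], internal: Optional[str]) -> str:
    """Pick a charset: explicit external > internal > assumed us-ascii.

    Hypothetical helper; the precedence follows the message above,
    not any normative text.
    """
    return (external or internal or "us-ascii").lower()


def decode_body(body: bytes,
                external: Optional[str] = None,
                internal: Optional[str] = None) -> str:
    """Decode the body octets strictly with the chosen charset.

    Under the us-ascii fallback any octet with the high bit set
    (1xxx xxxx) raises UnicodeDecodeError, i.e. it is treated as
    erroneous rather than silently read as Latin-1.
    """
    return body.decode(effective_charset(external, internal), errors="strict")
```

With no declaration at all, `decode_body(b"abc")` succeeds, while `decode_body(b"\xe9")` raises `UnicodeDecodeError`; the same octets decode once a declaration such as `internal="windows-1252"` is supplied.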
Received on Saturday, 12 January 2008 18:01:29 UTC