- From: David Brownell <david-b@pacbell.net>
- Date: Wed, 04 Dec 2002 15:34:46 -0800
- To: Terje Bless <link@pobox.com>
- Cc: W3C Validator <www-validator@w3.org>, Karl Dubost <karl@w3.org>
Terje Bless wrote:
>
> I suspect the problem here was that HTML 4.01 was trying to fix something
> that it was not in their purview to fix; namely the poor suitability of
> ISO-8859-1 as a default for many web documents.

As a default it's pretty good, but a lot of broken systems (browsers and servers both) got shipped that didn't work that way: for example, by letting servers override the default, and by having browsers use charset=.

I agree that changing HTTP or MIME was not within that purview. Fixing broken browsers was, though, and requiring charset= would have been too:

> It is highly unfortunate IMO that they chose to do this by overriding HTTP
> instead of adding an additional requirement that HTML 4.01 served over HTTP
> must explicitly set a character encoding; or by simply punting the issue
> back to where it belongs, namely the HTTP specification.

Considering that the "HTTP plus HTML" crew was responsible for most of this braindamage in the first place, this is pitiful! MIME has always said the default charset for "text" types is US-ASCII, but when HTTP was first written up it changed that to "iso-8859-1", mostly to benefit HTML (and, as a side effect, to prevent re-use of existing MIME libraries). Because of that, HTTP doesn't really do MIME, and now it seems HTML won't really do HTTP any more either!

> Of course, there is the strong implication that a document that does not
> explicitly specify its encoding is invalid and unparseable, but this is
> wholly intentional given the state of character encoding issues.

So far the *ONLY* user agent I've ever seen that has any problem parsing such documents is the current W3C validator. And that's rather new behavior, with only one weak standards leg to stand on (HTML 4). Also ...
the validator says that it tried the XML Appendix F rules. So either it should NOT do that (the effect is to detect encodings like UTF-16, which all the standards agree "must" be explicitly labeled), OR it should also try the standard HTTP rule, as it used to and as other user agents do. It's not even a useful "pedantic mode" default.

- Dave
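The defaulting rules argued over in this message can be sketched roughly as follows. This is a hypothetical Python illustration, not the validator's actual code: `resolve_charset`, `sniff_bom` and the `rules`/`sniff` parameters are invented names. The MIME default (US-ASCII for text types) and the HTTP default (ISO-8859-1 for text types) are as described above; the byte-order-mark check is only a simplified subset of XML 1.0 Appendix F, which also covers unlabeled UTF-32 and `<?xml` prefixes.

```python
import codecs
from email.message import Message
from email.utils import collapse_rfc2231_value

# Simplified subset of XML 1.0 Appendix F: byte-order marks only (assumption).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(body: bytes):
    """Return an encoding name if the body starts with a known BOM, else None."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    return None


def resolve_charset(content_type: str, body: bytes, rules: str = "http", sniff: bool = False):
    """Hypothetical helper choosing a character encoding for a fetched document.

    rules="mime":   unlabeled text defaults to us-ascii.
    rules="http":   unlabeled text defaults to iso-8859-1.
    rules="strict": no default; an unlabeled document gets None.
    sniff=True:     try BOM detection before falling back to a default.
    """
    if rules not in ("mime", "http", "strict"):
        raise ValueError(f"unknown rules: {rules!r}")

    msg = Message()
    msg["Content-Type"] = content_type

    # 1. An explicit charset= parameter always wins.
    charset = msg.get_param("charset")
    if charset:
        return collapse_rfc2231_value(charset).lower()

    # 2. Optional Appendix F-style sniffing (detects e.g. unlabeled UTF-16).
    if sniff:
        detected = sniff_bom(body)
        if detected:
            return detected

    # 3. Protocol defaults apply only to text types.
    if msg.get_content_maintype() != "text":
        return None
    if rules == "mime":
        return "us-ascii"
    if rules == "http":
        return "iso-8859-1"
    return None  # "strict": encoding is undefined
```

With `rules="http"` an unlabeled `text/html` response resolves to `iso-8859-1`, the behavior attributed above to other user agents, while `rules="strict"` returns `None`, roughly the "invalid and unparseable" reading quoted from HTML 4.01.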
Received on Wednesday, 4 December 2002 18:30:16 UTC