Re: default charset broken

From: Terje Bless <link@pobox.com>
Date: Sat, 7 Jun 2003 18:00:51 +0200
To: W3C Validator <www-validator@w3.org>
Message-ID: <f02000001-1026-364C400E990111D7B1DF0030657B83E8@[193.157.66.23]>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Karl Ove Hufthammer <karl@huftis.org> wrote:

>There is one other interpretation, though. RFC 2616 talks about HTTP
>clients and HTML 4.01 talks about user agents. When a HTML document send
>by HTTP with no explicit 'charset' parameter is received by a user
>agent, it's already been through a HTTP client, and has been given a
>'charset' value of 'ISO-8859-1'. Therefore the paragraph describing what
>user agents should do when receiving HTML documents without any
>'charset' parameter never applies.

Yes, well, in the interest of full disclosure, let me add that another
significant factor in the Validator's current behaviour is that the HTTP
defaulting behaviour is considered harmful to i18n and to all those users for
whom iso-8859-1 is insufficient.

I'm not saying this is an entirely uncontroversial position (and I think Björn,
among others, has pointed this out on several occasions), but it is one factor
that affects which way we've elected to go in the absence of persuasive
specification language.


In particular, if we allow for your interpretation above, we would in effect
default to ISO-8859-1 not only for pages such as Kjetil's (which are most
certainly correct, and whose author is very aware of what he is doing), but
also for Joe Web-duh-signer and his clueless little hosting company, where
there is _no_ conscious decision involved and ISO-8859-1 is the _wrong_ value
more often than not.


There is also the open question of HTTP's relationship with MIME, whose rules
would indicate that text/html without a charset parameter ought to be
interpreted as US-ASCII.
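
To make the ambiguity concrete, here is a minimal sketch (an illustration
only, not anything the Validator runs) of what the competing defaults do to
the same unlabelled bytes. The sample word and the variable names are
assumptions made up for the example.

```python
# Bytes as a server might send them for text/html with no charset parameter.
# In this example the author actually saved the page as UTF-8.
raw = "blåbær".encode("utf-8")

# HTTP-style default (RFC 2616): treat unlabelled text as ISO-8859-1.
# Decoding never fails, but non-ASCII characters turn into mojibake.
as_latin1 = raw.decode("iso-8859-1")

# MIME-style default: treat unlabelled text as US-ASCII.
# Any byte above 0x7F is simply not valid.
try:
    as_ascii = raw.decode("us-ascii")
except UnicodeDecodeError:
    as_ascii = None

# What the author meant, which nothing on the wire tells us.
as_utf8 = raw.decode("utf-8")

print(repr(as_latin1), repr(as_ascii), repr(as_utf8))
```

Neither default recovers the intended text here; only the explicit label
would.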

Taken together with the above and the previously raised issues, this all leads
to the single conclusion that «Charset Defaulting Considered Harmful» (to
invoke a modern Godwin-equivalent ;D), and that the only reliable way to deal
with character encoding issues -- and therefore the behaviour the Validator
should be aiming to encourage and enforce -- is to label them explicitly.
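
As a rough sketch of what "label them explicitly" means for a checker, the
snippet below looks for a charset parameter in a Content-Type value and
refuses to guess when there is none. This is not the Validator's actual code;
the function names `explicit_charset` and `check_label` and the message
wording are hypothetical, and only Python's standard `email` package is used
for the header parsing.

```python
from email.message import Message
from typing import Optional


def explicit_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter (lowercased) or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter exists;
    # it applies no default of its own.
    return msg.get_content_charset()


def check_label(content_type: str) -> str:
    """Hypothetical policy: accept only an explicit label, never a default."""
    charset = explicit_charset(content_type)
    if charset is None:
        return "no explicit charset: refusing to assume ISO-8859-1 or US-ASCII"
    return "explicit charset: " + charset


print(check_label("text/html; charset=UTF-8"))
print(check_label("text/html"))
```

The point of the design is that the absence of a label is reported as such,
instead of being silently replaced by whichever default one specification or
another happens to suggest.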

- -- 
I have lobbied for the update and improvement of SGML. I've done it for years.
I consider it the jewel for which XML is a setting.  It does deserve a bit of
polishing now and then.                                        -- Len Bullard

-----BEGIN PGP SIGNATURE-----
Version: PGP SDK 3.0.2

iQA/AwUBPuIMMqPyPrIkdfXsEQIhWgCg3hy2O5gifcpVNI08OzqT5KeB/jMAnRXj
lo1ZO96Vg+MafBRIi25x+bqj
=ij/A
-----END PGP SIGNATURE-----
Received on Saturday, 7 June 2003 12:00:54 GMT
