Re: default charset broken

From: Karl Ove Hufthammer <karl@huftis.org>
Date: Sat, 07 Jun 2003 17:15:26 +0200
Message-Id: <n2m-g.Xns9393AF8C0AF2Fhuftis@ID-99504.news.dfncis.de>
To: www-validator@w3.org

Terje Bless <link@pobox.com> wrote in
news:f02000001-1026-D90ABEF498F711D7B1DF0030657B83E8@[193.157.66.23]:

>   Therefore, user agents [MUST NOT] assume any default value
>   for the "charset" parameter.
> ]]] - W3C HTML 4.01 Recommendation 5.2.2
>
> Which puts us in a right pretty pickle.
>
> We've been over this discussion ad nauseam on this list
> several times before. The bottom line is that RFC 2616 and the
> HTML 4.01 Recommendation (and, by extension, XHTML as well[0])
> are incompatible on this point[1]

As I read them, they're not *really* incompatible: a document can
conform to *both* RFC 2616 and the HTML 4.01 Rec., but only if it
is *always* sent with an explicit 'charset' parameter.

There is one other interpretation, though. RFC 2616 talks about
HTTP clients and HTML 4.01 talks about user agents. When an HTML
document sent over HTTP with no explicit 'charset' parameter is
received by a user agent, it has already been through an HTTP client
and has been given a 'charset' value of 'ISO-8859-1'. Therefore
the paragraph describing what user agents should do when receiving
HTML documents without any 'charset' parameter never applies.
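
To make the two readings concrete, here is a minimal sketch in Python.
The function names are hypothetical and only illustrate the rules as
described above; the parsing ignores RFC 2231 encoded parameters.

```python
from email.message import Message


def explicit_charset(content_type):
    """Return the 'charset' parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("charset")  # None when the parameter is absent
    return value.lower() if isinstance(value, str) else None


def charset_per_rfc2616(content_type):
    """HTTP client view: text/* with no charset is taken as ISO-8859-1."""
    charset = explicit_charset(content_type)
    if charset is not None:
        return charset
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return None


def charset_per_html401(content_type):
    """HTML 4.01 user agent view: no default may be assumed."""
    return explicit_charset(content_type)


# The two views differ only when the parameter is missing.
print(charset_per_rfc2616("text/html"))                 # iso-8859-1
print(charset_per_html401("text/html"))                 # None
print(charset_per_rfc2616("text/html; charset=UTF-8"))  # utf-8
```

With an explicit parameter both functions agree, which is the sense in
which always sending 'charset' satisfies both documents.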

> and the _only_ safe way to
> achieve the correct character encoding for your documents is
> to explicitly specify it in the HTTP «Content-Type» header.

Yes.

> [0] - With the added complication that XHTML superficially is
>       meant to obey XML defaulting rules for character encoding
>       (e.g. unlabelled usually means UTF-8).

Except when sent as 'text/xml', where it means 'US-ASCII'. :/
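
As a rough illustration of how the default depends on the media type,
here is a small Python lookup. The table and the function name are
hypothetical. The 'utf-8' entries are only the usual outcome, since a
byte order mark or an XML encoding declaration can override them.

```python
# Assumed defaults when no 'charset' parameter is sent (a sketch only).
UNLABELLED_DEFAULTS = {
    "text/html": "iso-8859-1",         # RFC 2616 default for text/*
    "text/xml": "us-ascii",            # RFC 3023 default for text/xml
    "application/xml": "utf-8",        # XML rules; usually UTF-8
    "application/xhtml+xml": "utf-8",  # XML rules; usually UTF-8
}


def unlabelled_default(media_type):
    """Return the assumed encoding for an unlabelled document, or None."""
    return UNLABELLED_DEFAULTS.get(media_type.strip().lower())


print(unlabelled_default("text/xml"))               # us-ascii
print(unlabelled_default("application/xhtml+xml"))  # utf-8
```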

-- 
Karl Ove Hufthammer
Received on Saturday, 7 June 2003 11:15:52 GMT
