Re: default charset broken from Kjetil Torgrim Homme on 2003-06-07 (www-validator@w3.org from June 2003)

From: Kjetil Torgrim Homme <kjetilho@ifi.uio.no>
Date: Sat, 07 Jun 2003 19:34:21 +0200
To: W3C Validator <www-validator@w3.org>
Message-ID: <1ry90dkd0i.fsf@vingodur.ifi.uio.no>

[Terje Bless]:
>
>   Kjetil Torgrim Homme <kjetilho@ifi.uio.no> wrote:
>   
>   > also note that this paragraph wasn't in the original HTTP/1.1
>   > RFC, and the text in 5.2.2 has not changed since HTML 4.0 of
>   > December 1997.
>   
>   I see no relevance to this other than to support the view that the
>   HTTP WG also meant for charset to be explicitly specified unless
>   there was some specific and overriding reason not to
>   (i.e. «SHOULD»).

the relevance was that the HTML spec ignored the text of RFC 2068,
which, lacking that added paragraph, is even stronger than RFC 2616.

>   > furthermore, configuring Apache to include charset=iso-8859-1
>   > for all files of type text/html will make it impossible for a
>   > document to use a different charset, since it overrides META
>   > HTTP-EQUIV.  (another poor choice in the HTML Recommendation,
>   > IMHO).
>   
>   Nonsense. In Apache you would use AddDefaultCharset for
>   iso-8859-1 and use content negotiation to select between
>   e.g. index.html.utf-8 and index.html.iso-8859-1 (or between
>   "index.html.utf-8" and "" ;D).

my point stands: a META declaration can no longer select the charset.
but this is not important.
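The precedence behind that point can be sketched in Python. Section 5.2.2 of the HTML 4 Recommendation ranks the charset parameter of the HTTP Content-Type header above a META http-equiv declaration, so once the server appends charset=iso-8859-1 the document's own META is never consulted. The function names below are hypothetical, the regex-based parsing is a simplification rather than a real HTML parser, and this is not the validator's code.

```python
import re

# Matches a charset parameter such as: charset=utf-8 or charset="ISO-8859-1"
_CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE)
# Simplified tag/attribute matching; a real implementation should use an HTML parser.
_META_RE = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
_ATTR_RE = re.compile(r"""([A-Za-z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")


def charset_from_content_type(value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def meta_charset(html_text):
    """Return the charset from <meta http-equiv="Content-Type" content="...">, or None."""
    for tag in _META_RE.findall(html_text):
        attrs = {}
        for name, dq, sq, bare in _ATTR_RE.findall(tag):
            attrs[name.lower()] = dq or sq or bare
        if attrs.get("http-equiv", "").lower() == "content-type":
            charset = charset_from_content_type(attrs.get("content"))
            if charset:
                return charset
    return None


def effective_charset(http_content_type, html_text):
    """Pick a charset following the HTML 4 section 5.2.2 ordering (simplified).

    1. charset parameter of the HTTP Content-Type header
    2. META http-equiv="Content-Type" declaration in the document
    (The third source, a charset attribute on a linking element, is omitted.)
    Returns (charset, source), or (None, None) if neither is present.
    """
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset, "http"  # the server header wins; META is never consulted
    charset = meta_charset(html_text)
    if charset:
        return charset, "meta"
    return None, None


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
print(effective_charset("text/html; charset=iso-8859-1", page))  # ('iso-8859-1', 'http')
print(effective_charset("text/html", page))                       # ('utf-8', 'meta')
```

With a server-wide charset=iso-8859-1, the first call shows the META's utf-8 being ignored; only when the header carries no charset does the META declaration take effect.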

>   Defaulting to UTF-8 is intended to be the least-wrong error
>   recovery procedure (given its inclusiveness and wide applicability
>   in non-European/North American contexts), but the result can never
>   say authoritatively that the page is valid or invalid since we
>   didn't have enough information to reliably validate it (i.e. the
>   result is guesswork).

thank you for the explanation, I don't object to that behaviour.
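The fallback described above can be sketched as follows: decode with the declared charset when one is known, otherwise try UTF-8 and flag the result as tentative, so that any valid/invalid verdict is reported as a guess. This is a hypothetical Python sketch, not the validator's implementation; the names are illustrative, and replacing undecodable bytes with U+FFFD is an assumed policy.

```python
from dataclasses import dataclass


@dataclass
class DecodeResult:
    text: str
    charset: str
    tentative: bool  # True when the charset was guessed rather than declared


def decode_for_validation(body, declared_charset=None, fallback="utf-8"):
    """Decode a document body, marking the result tentative if the charset was guessed."""
    if declared_charset:
        # Authoritative case: the charset came from HTTP or META.
        return DecodeResult(body.decode(declared_charset), declared_charset, False)
    try:
        text = body.decode(fallback)
    except UnicodeDecodeError:
        # Assumed policy: keep going with U+FFFD replacement characters
        # rather than aborting; the verdict is flagged as tentative either way.
        text = body.decode(fallback, errors="replace")
    return DecodeResult(text, fallback, True)


guessed = decode_for_validation("café".encode("utf-8"))
print(guessed.tentative)  # True: any valid/invalid verdict is only a guess
```

Keeping the tentative flag separate from the decoded text lets the report say "valid, assuming UTF-8" instead of presenting a guessed result as authoritative.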

-- 
Kjetil T.

Received on Saturday, 7 June 2003 13:34:25 UTC