Re: default charset broken

From: Kjetil Torgrim Homme <kjetilho@ifi.uio.no>
Date: Sat, 07 Jun 2003 18:02:03 +0200
To: W3C Validator <www-validator@w3.org>
Message-ID: <1rbrx9lvus.fsf@vingodur.ifi.uio.no>

[Terje Bless]:
>   Uhm, what a wonderfully confrontationally phrased bug report;
>   you're usually much more careful with your formulation in no.*
>   Kjetil! :-)

sorry, I guess I was a bit terse.

>   The relevant parts of the cited sections of RFC 2616 read [...]
>   Which appears to support your claim. Unfortunately, the HTML 4.01
>   Recommendation, Section 5.2.2, reads: [...]

>   Which puts us in a right pretty pickle.

a Standards Track RFC can't be overridden by a Recommendation from
the W3C.

>   This new behaviour goes some way towards addressing your concern,
>   but you will still find your documents labelled Invalid unless you
>   specify a character encoding.

my document is valid, so this is incorrect behaviour.

>   I would strongly encourage you to explicitly specify the character
>   encoding.  In particular, I direct your attention to the part of
>   RFC2616 3.4.1 which reads: «Senders wishing to defeat this
>   behavior MAY include a charset parameter even when the charset is
>   ISO-8859-1 ***and SHOULD do so when it is known that it will not
>   confuse the recipient.***» [emphasis added].
>   In this particular case, not only is it known that specifying the
>   encoding will not confuse the recipient; explicitly specifying it
>   is the only way to _avoid_ confusing «the recipient» (IOW, the
>   «SHOULD» certainly kicks in).

I don't subscribe to cargo cult coding, and I don't care about
catering to broken software.  also note that this paragraph wasn't in
the original HTTP/1.1 RFC, and the text in 5.2.2 has not changed since
HTML 4.0 of December 1997.

furthermore, configuring Apache to include charset=iso-8859-1 for
all files of type text/html will make it impossible for a document to
use a different charset since it overrides META HTTP-EQUIV.  (another
poor choice in the HTML Recommendation, IMHO).
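
to illustrate the precedence problem, here is a minimal Python sketch of
how a user agent following HTML 4.01 section 5.2.2 might pick the
encoding.  the function names and the regular expression are my own
assumptions for illustration, not the validator's actual code, and the
META scan is deliberately crude.

```python
import re
from email.message import Message

# Rough pattern for <META HTTP-EQUIV="Content-Type" CONTENT="...; charset=...">.
# Hypothetical and simplified: a real parser would tokenize the markup.
META_RE = re.compile(
    rb'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?[^>]*'
    rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def effective_charset(content_type, body):
    """Return (charset, source) using HTTP-first precedence."""
    from_header = header_charset(content_type)
    if from_header:
        # The HTTP parameter wins; any META declaration is ignored.
        return from_header, "http"
    match = META_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    return None, "none"
```

with a server-wide charset=iso-8859-1, a document that declares UTF-8 in
META is still reported as ("iso-8859-1", "http"), which is the
override I am complaining about.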

>   [3] - <http://validator.w3.org:8001/>. Feedback encouraged!

well, it didn't process http://www.usenet.no.  in fact it assumed
UTF-8, which there is no basis for doing at all.  IMO, that's a
further regression.
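
for comparison, a small sketch of the fallback I would expect when
nothing is declared, based on RFC 2616 section 3.7.1 (text/* received
via HTTP defaults to ISO-8859-1).  fallback_charset is a hypothetical
helper, and the sample bytes are made up for the example.

```python
def fallback_charset(content_type):
    """Default charset for an HTTP response with no charset parameter.

    Follows RFC 2616 section 3.7.1: subtypes of "text" default to
    ISO-8859-1.  Returns None for other media types.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("text/"):
        return "iso-8859-1"
    return None


# Made-up sample: Norwegian text encoded as ISO-8859-1.
sample = b"bl\xe5b\xe6r"

decoded = sample.decode(fallback_charset("text/html"))

# Guessing UTF-8 instead fails outright on these bytes.
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

decoding with the RFC default gives readable text here, while the UTF-8
guess cannot decode the same bytes at all.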

Kjetil T.
