Fallback to UTF-8 from Andreas Prilop on 2007-11-28 (www-validator@w3.org from November 2007)

From: Andreas Prilop <aprilop2007@trashmail.net>
Date: Wed, 28 Nov 2007 18:15:08 +0100 (MET)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.63.0711281803240.1757@s5b004.rrzn.uni-hannover.de>

I still believe that the following behaviour is illogical and
not really helpful. (It has been discussed before.)


Consider a webpage that does not specify any encoding (charset).
Unfortunately, this still happens, and such pages are mostly
Windows-1251 or Windows-1252 encoded.

Then validator.w3.org reports:

(1) No Character Encoding Found! Falling back to UTF-8.

(2) Sorry, I am unable to validate this document because on line ...
    it contained one or more bytes that I cannot interpret as utf-8
    (in other words, the bytes found are not valid values in
    the specified Character Encoding).
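Why message (2) follows almost automatically can be seen in a minimal Python sketch. This is not the validator's code. It assumes a hypothetical page containing "café" saved as Windows-1252 (Python codec name "cp1252"), and the helper name first_invalid_utf8_offset is made up for illustration.

```python
from typing import Optional

def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict error handling is Python's default
    except UnicodeDecodeError as err:
        return err.start
    return None

# Hypothetical page body saved as Windows-1252: "é" becomes the single byte 0xE9.
page = "<p>café</p>".encode("cp1252")
print(page)                             # b'<p>caf\xe9</p>'
# In UTF-8, 0xE9 starts a three-byte sequence, but the next byte "<" (0x3C)
# is not a continuation byte, so decoding fails at the "é".
print(first_invalid_utf8_offset(page))  # 6
```

The same text encoded as UTF-8 decodes without error, so the helper returns None for it.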


This makes no sense, and it doesn't help the user.
The logical procedure would be:

(1) On line ... the document contained one or more bytes
    that I cannot interpret as UTF-8 (in other words, the bytes
    found are not valid values in UTF-8).

(2) Therefore I don't fall back to UTF-8.
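The proposed order (check the bytes first, and only then decide on UTF-8) could be sketched like this. Again this is only an illustration, not validator.w3.org's actual logic. The function choose_fallback and its message wording are hypothetical. The check is a plain strict UTF-8 decode, and when it fails the sketch returns None instead of picking some other encoding.

```python
from typing import Optional, Tuple

def choose_fallback(data: bytes) -> Tuple[Optional[str], str]:
    """Decide whether falling back to UTF-8 is justified for an undeclared page.

    Returns (encoding, message). encoding is "utf-8" only when every byte
    decodes as UTF-8; otherwise it is None and no fallback is made.
    """
    try:
        data.decode("utf-8")  # strict mode raises on any invalid byte sequence
    except UnicodeDecodeError as err:
        # 1-based line number of the first offending byte
        line = data.count(b"\n", 0, err.start) + 1
        return None, (
            f"On line {line} the document contained one or more bytes "
            "that I cannot interpret as UTF-8. "
            "Therefore I don't fall back to UTF-8."
        )
    return "utf-8", "No character encoding found; the bytes are valid UTF-8."

# Hypothetical undeclared page saved as Windows-1252
encoding, message = choose_fallback("<html>\n<p>café</p>\n".encode("cp1252"))
print(encoding)  # None
print(message)
```

For example, the Windows-1251 bytes of "привет" also yield None, while the same text encoded as UTF-8 yields "utf-8".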


N.B.
I do not suggest a specific other fallback encoding or fallback
behaviour. I am just saying that it is illogical to first assume UTF-8
and then immediately claim that UTF-8 is impossible.

Received on Wednesday, 28 November 2007 17:15:22 UTC