Fallback to UTF-8

I still believe that the following behaviour is illogical and
not really helpful. (It has been discussed before.)


Take a webpage that does not specify any encoding (charset).
Unfortunately, this still happens, and such pages are mostly
Windows-1251 or Windows-1252 encoded.

Then validator.w3.org reports:

(1) No Character Encoding Found! Falling back to UTF-8.

(2) Sorry, I am unable to validate this document because on line ...
    it contained one or more bytes that I cannot interpret as utf-8
    (in other words, the bytes found are not valid values in
    the specified Character Encoding).


This makes no sense, and it doesn't help the user.
The logical procedure would be:

(1) On line ... the document contained one or more bytes
    that I cannot interpret as UTF-8 (in other words, the bytes
    found are not valid values in UTF-8).

(2) Therefore I don't fall back to UTF-8.
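
The two steps above could be sketched roughly as follows. This is only
an illustration in Python, not what validator.w3.org actually does; the
function name check_utf8_fallback and its return shape are hypothetical.
It tests the bytes against UTF-8 first and accepts the fallback only if
that test passes. If it fails, it reports the line and deliberately
picks no other encoding.

```python
from typing import Optional, Tuple


def check_utf8_fallback(raw: bytes) -> Tuple[Optional[str], Optional[int]]:
    """Return (encoding, bad_line) for a document with no declared charset.

    encoding is "utf-8" only if every byte decodes as UTF-8. Otherwise it
    is None (no fallback is chosen here) and bad_line is the 1-based line
    number of the first byte that is not valid UTF-8.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first undecodable byte;
        # counting the newlines before it gives the line number.
        bad_line = raw.count(b"\n", 0, err.start) + 1
        return None, bad_line
    return "utf-8", None


# A Windows-1252 page: the byte 0xE9 on line 2 is not valid UTF-8.
page = b"<p>ok</p>\n<p>caf\xe9</p>\n"
encoding, bad_line = check_utf8_fallback(page)
if encoding is None:
    print(f"Line {bad_line}: bytes are not valid UTF-8; not falling back to UTF-8.")
```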


N.B.
I am not suggesting any specific other fallback encoding or fallback
behaviour. I am only saying that it is illogical to first assume UTF-8
and then immediately claim that UTF-8 is impossible.

Received on Wednesday, 28 November 2007 17:15:22 UTC