Re: Fallback to UTF-8

From: Andreas Prilop <aprilop2007@trashmail.net>
Date: Thu, 29 Nov 2007 16:25:32 +0100 (MET)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.63.0711291614110.3541@s5b004.rrzn.uni-hannover.de>

On Thu, 29 Nov 2007, olivier Thereaux wrote:

>> Given a webpage that does not specify any encoding (charset).
>> Then validator.w3.org reports:
>> (1) No Character Encoding Found! Falling back to UTF-8.
>> (2) Sorry, I am unable to validate this document because on line ...
>>     it contained one or more bytes that I cannot interpret as utf-8
>> This makes no sense; and it doesn't help the user.
> You're not suggesting a better procedure, either.

OK, here are my suggestions:

(a) Immediately report "This document cannot be checked" without any
    reference to UTF-8. Since the document cannot be interpreted as
    UTF-8, "charset=utf-8" was most probably not the author's
    intention.

(b) Take ISO-8859-1 as fallback encoding (the default of RFC 2616).
    This will "work" if no bytes from 0x80 to 0x9F are present -
    hence with many of the traditional 8-bit character sets.
    Otherwise (if some bytes from 0x80 to 0x9F are found),
    give the usual errors about "non SGML character number ..."
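
The two suggestions above could be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions: the
function names are hypothetical and are not taken from the validator's
code, and real error reporting would of course look different.

```python
from typing import List, Tuple


def is_valid_utf8(data: bytes) -> bool:
    """Suggestion (a): check whether the bytes can be read as UTF-8 at all."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_with_latin1_fallback(data: bytes) -> Tuple[str, List[int]]:
    """Suggestion (b): decode as ISO-8859-1 and list suspicious bytes.

    Decoding as ISO-8859-1 never fails, because every byte maps to a
    code point. Bytes 0x80..0x9F map to C1 control characters, which
    are the ones that would be flagged as non-SGML characters.
    """
    text = data.decode("iso-8859-1")
    c1_offsets = [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]
    return text, c1_offsets


# Hypothetical sample input: "caf" followed by byte 0xE9 (e-acute in ISO-8859-1).
sample = b"caf\xe9"
if not is_valid_utf8(sample):
    text, offsets = decode_with_latin1_fallback(sample)
    if offsets:
        print("bytes in 0x80..0x9F at offsets:", offsets)
    else:
        print("checked as ISO-8859-1:", text)
```

A document that is really Windows-1252 (for example, one containing
the byte 0x80 for the euro sign) would end up in the first branch and
get the per-character errors instead of a blanket refusal.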
Received on Thursday, 29 November 2007 15:33:01 UTC
