Re: Fallback to UTF-8 from olivier Thereaux on 2007-11-29 (www-validator@w3.org from November 2007)

From: olivier Thereaux <ot@w3.org>
Date: Thu, 29 Nov 2007 10:52:47 +0900
To: Andreas Prilop <aprilop2007@trashmail.net>
Cc: www-validator@w3.org
Message-Id: <734E38AE-E66E-4B26-A583-2F81C333C3DC@w3.org>

On 29 nov. 07, at 02:15, Andreas Prilop wrote:
> I still believe that the following behaviour is illogical and
> not really helpful. (It has been discussed before.)
>
>
> Given a webpage that does not specify any encoding (charset).
>
> Then validator.w3.org reports:
>
> (1) No Character Encoding Found! Falling back to UTF-8.
>
> (2) Sorry, I am unable to validate this document because on line ...
>    it contained one or more bytes that I cannot interpret as utf-8
>
> This makes no sense; and it doesn't help the user.

You're not suggesting a better procedure, either. As far as I can  
tell, the alternative (as done by other tools) is to simply throw a  
fatal error whenever no charset is given. Trying to fall back to utf-8  
at least helps in some cases. Better than nothing IMHO.

Maybe what you would like is a different error message? Instead of  
"sorry I am not able to validate because it is not utf-8", in the case  
of a charset fallback, say something like "sorry, I am not able to  
read this document because it does not declare any encoding and an  
attempt to fall back failed. Please do this and that..."
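
To make the idea concrete, here is a rough sketch of the fallback with a separate message for the "no charset declared" case. It is not the validator's actual code: the function name decode_document, the constant FALLBACK_CHARSET and the exact message wording are all made up for illustration.

```python
from typing import Optional, Tuple

# Assumption: utf-8 is the fallback, as in the message quoted above.
FALLBACK_CHARSET = "utf-8"


def decode_document(raw: bytes, declared: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (text, message); text is None on failure."""
    fell_back = declared is None
    charset = FALLBACK_CHARSET if fell_back else declared
    try:
        text = raw.decode(charset)
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first undecodable byte;
        # counting newlines before it gives a 1-based line number.
        line = raw.count(b"\n", 0, err.start) + 1
        if fell_back:
            message = (
                "Sorry, I am not able to read this document because it does "
                "not declare any encoding and an attempt to fall back to "
                f"{FALLBACK_CHARSET} failed (line {line})."
            )
        else:
            message = (
                f"Sorry, I am unable to validate this document because on line "
                f"{line} it contained one or more bytes that I cannot "
                f"interpret as {charset}."
            )
        return None, message
    if fell_back:
        return text, f"No Character Encoding Found! Falling back to {FALLBACK_CHARSET}."
    return text, ""
```

With this, a page without a declared charset that happens to be valid UTF-8 still gets through with a warning, while a Latin-1 page without a declaration gets the fallback-specific message instead of the generic "not utf-8" one.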

-- 
olivier

Received on Thursday, 29 November 2007 01:52:55 UTC