Re: Document without charset from olivier Thereaux on 2005-12-08 (www-validator@w3.org from December 2005)

From: olivier Thereaux <ot@w3.org>
Date: Fri, 9 Dec 2005 07:02:08 +0900
To: Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de>
Cc: www-validator@w3.org
Message-Id: <52A5643B-E4CA-40F9-BFD6-605FF65FEA9A@w3.org>

On 9 Dec 2005, at 00:53, Andreas Prilop wrote:

>
> Reference:
> http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm
>
> The validator says:
>
> | Encoding:   utf-8
> | Sorry, I am unable to validate this document because [...]
> | it contained one or more bytes that I cannot interpret as utf-8
>
> This is not helpful!
> Why does the validator assume UTF-8 in the first place?

This is a bug.

The validator has a routine that tries to detect the character  
encoding by every method the specs describe. If it finds nothing, it  
is supposed to output a warning that no character encoding was found  
and point to this documentation:
http://validator.w3.org/docs/help.html#faq-charset
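
To illustrate the intended behaviour, here is a rough sketch in Python.
It is not the validator's actual code: the function name, the regular
expressions and the order of the checks (HTTP header, then byte order
mark, then XML declaration, then meta element) are assumptions made
for the example, and UTF-32 byte order marks are left out. The point is
only the last line: when nothing is declared, report "no encoding" with
a warning rather than guessing utf-8.

```python
import re

# Hypothetical sketch; names and check order are assumptions, not validator code.
HELP_URL = "http://validator.w3.org/docs/help.html#faq-charset"

# Byte order marks and the encodings they indicate (UTF-32 omitted).
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

CHARSET_PARAM = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.I)
XML_DECL = re.compile(
    rb"^<\?xml[^>]*encoding\s*=\s*[\"']([A-Za-z0-9._:-]+)[\"']", re.I
)
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.I
)


def detect_charset(content_type, body):
    """Return (charset, warnings); charset is None if nothing is declared."""
    # 1. charset parameter of the HTTP Content-Type header.
    if content_type:
        m = CHARSET_PARAM.search(content_type)
        if m:
            return m.group(1).lower(), []

    # 2. Byte order mark at the very start of the document.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, []

    # 3. encoding pseudo-attribute of an XML declaration.
    m = XML_DECL.search(body)
    if m:
        return m.group(1).decode("ascii").lower(), []

    # 4. meta element near the top of the document.
    m = META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower(), []

    # Nothing found: do not fall back to utf-8, warn instead.
    return None, ["No character encoding found; see " + HELP_URL]
```

With a document like the one reported (no charset anywhere and a byte
that is not valid utf-8), `detect_charset("text/html", b"caf\xe9")`
returns `None` plus the warning instead of a utf-8 decoding error.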

Instead, it defaults to utf-8 and does not output the warning. That  
is indeed a bug, apparently introduced a few months ago. We'll look  
into fixing it as soon as possible.

Thanks,
olivier
-- 
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status

Received on Thursday, 8 December 2005 22:02:18 UTC