Re: Fallback to UTF-8 from David Dorward on 2008-04-24 (www-validator@w3.org from April 2008)

From: David Dorward <david@dorward.me.uk>
Date: Thu, 24 Apr 2008 17:47:24 +0100
To: www-validator@w3.org
Message-Id: <1BA9F37D-56D1-4A62-BF79-35509A576218@dorward.me.uk>

On 24 Apr 2008, at 17:11, Andreas Prilop wrote:
> On Thu, 24 Apr 2008, David Dorward wrote:
>
>> If it assumes ISO-8859-1 and the document is UTF-8,
>> how is that any improvement?

> - First it assumes "charset=utf-8".
> - Then it immediately states that this is impossible.
>
> Where is the logic in this behaviour?

I agree that it is not ideal, but assuming ISO-8859-1 for UTF-8  
documents is no better than assuming UTF-8 for ISO-8859-1 documents.  
(Or either for Shift_JIS documents, etc.)

Your argument appears to be "I use ISO-8859-1 therefore the validator  
should default to ISO-8859-1", which isn't, IMO, a very convincing  
one. Am I interpreting you incorrectly?

Looking at the HTML spec, it says 'user agents must not assume any
default value for the "charset" parameter'
(http://www.w3.org/TR/html4/charset.html). So, following that guidance,
the validator shouldn't guess at all
and should just state that no encoding was found and that it can't  
continue until one is specified.
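
A minimal sketch of that stricter behaviour, assuming only the HTTP
Content-Type header is consulted (a real checker would also have to look
at in-document declarations). The names require_charset and
NoEncodingDeclared are hypothetical and not part of the validator:

```python
from email.message import Message


class NoEncodingDeclared(Exception):
    """Raised when no charset parameter was supplied (hypothetical name)."""


def require_charset(content_type: str) -> str:
    """Return the declared charset, or refuse to continue if there is none."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset at all.
    charset = msg.get_content_charset()
    if charset is None:
        raise NoEncodingDeclared(
            "No character encoding was found; specify one to continue."
        )
    return charset
```

With this, "text/html; charset=UTF-8" yields "utf-8", while a bare
"text/html" stops with an error instead of a guess.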

My preference would be to try to validate the document by assuming a
number of different encodings in turn until one was successfully
parsed, but this would be significantly more work than just changing
the default. In that event, I might also be tempted to recommend
making the warning about guessing even more prominent than it is at
present (a fat red border perhaps?).
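
The "try several encodings in turn" idea could be sketched roughly as
follows. This only covers the decoding step, not validation itself; the
candidate list and the name guess_encoding are illustrative assumptions,
not anything the validator actually uses:

```python
# Illustrative candidate order (an assumption). ISO-8859-1 goes last
# because it accepts every possible byte and so can never fail.
CANDIDATES = ["utf-8", "shift_jis", "iso-8859-1"]


def guess_encoding(data: bytes, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes
    the bytes without error, or None if every candidate fails."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    return None
```

Note that a clean decode does not prove the guess is right: single-byte
encodings such as ISO-8859-1 accept any input, which is one reason the
warning about guessing would still need to be prominent.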

-- 
David Dorward
http://dorward.me.uk/
http://blog.dorward.me.uk/

Received on Thursday, 24 April 2008 16:48:10 UTC