Re: Fallback to UTF-8 from David Dorward on 2008-04-28 (www-validator@w3.org from April 2008)

From: David Dorward <david@dorward.me.uk>
Date: Mon, 28 Apr 2008 15:28:05 +0100
To: www-validator@w3.org
Message-Id: <D963FD08-2057-4C49-9D45-3B4FFC342EE6@dorward.me.uk>

On 28 Apr 2008, at 15:08, Andreas Prilop wrote:
> (1) The validator establishes that Humpty Dumpty
>    is *not* the President of the United States.
>
> (2) The validator assumes:
>    President of the United States = Humpty Dumpty.
>
> (1) and (2) happen together.  Both.  At the same time.   
> Simultaneously.
>

That isn't what happens. This is:

(1) The validator finds no encoding information in the HTTP header or  
document.

(2) The validator tries to carry on anyway, as if the encoding had  
been specified as UTF-8.

(3) The validator fails to parse as UTF-8, so gives up.

In that order. One after the other. Not simultaneously.
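
To make that sequence concrete, here is a rough Python sketch. It is an
illustration only, not the validator's actual code: the function name
and the "declared" parameter are made up for the example.

```python
def decode_with_fallback(body, declared=None):
    """Hypothetical helper: decode a document body, assuming UTF-8
    when no encoding was declared. Not the validator's real code."""
    # (1) "declared" is None when neither the HTTP header nor the
    #     document supplied any encoding information.
    # (2) Carry on as if UTF-8 had been specified.
    encoding = declared or "utf-8"
    # (3) Strict decoding raises UnicodeDecodeError on the first byte
    #     sequence that is not valid in that encoding, so the whole
    #     attempt is abandoned at that point.
    return body.decode(encoding, errors="strict"), encoding
```

Under these assumptions, a body that really is UTF-8 decodes fine, while
a body containing a lone ISO-8859-1 byte such as 0xE9 makes step (3)
fail, and only after steps (1) and (2) have already happened.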

The behaviour when it is overridden and told to use ISO-8859-1 is  
slightly different:

(1) The validator is told to use ISO-8859-1.

(2) The validator parses as ISO-8859-1 but finds errors.

(3) The validator throws error messages for characters which don't fit.

I don't know why it behaves differently under those conditions, but I  
think having consistency would be desirable.
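
For comparison, the override case might look something like the Python
sketch below. Again this is only an illustration with a made-up function
name. It also assumes that "characters which don't fit" means bytes in
the 0x80-0x9F range, which ISO-8859-1 maps to control characters that
HTML 4 does not allow in a document; I have not checked that this is
exactly what the validator tests for.

```python
def check_latin1(body):
    """Hypothetical helper: decode as ISO-8859-1 and report, rather
    than abort on, characters assumed not to fit."""
    # Decoding as ISO-8859-1 cannot fail: every byte value maps to
    # exactly one code point.
    text = body.decode("iso-8859-1")
    # Assumption: the offending characters are the C1 controls
    # (0x80-0x9F), typically Windows-1252 punctuation in disguise.
    errors = [
        (offset, hex(ord(ch)))
        for offset, ch in enumerate(text)
        if 0x80 <= ord(ch) <= 0x9F
    ]
    return text, errors
```

The difference is that parsing continues here and one message is
produced per offending character, instead of the whole attempt being
abandoned at the first bad byte.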

-- 
David Dorward
http://dorward.me.uk/
http://blog.dorward.me.uk/

Received on Monday, 28 April 2008 14:29:02 UTC