
Re: Fallback to UTF-8

From: David Dorward <david@dorward.me.uk>
Date: Mon, 28 Apr 2008 15:28:05 +0100
Message-Id: <D963FD08-2057-4C49-9D45-3B4FFC342EE6@dorward.me.uk>
To: www-validator@w3.org


On 28 Apr 2008, at 15:08, Andreas Prilop wrote:
> (1) The validator establishes that Humpty Dumpty
>    is *not* the President of the United States.
>
> (2) The validator assumes:
>    President of the United States = Humpty Dumpty.
>
> (1) and (2) happen together.  Both.  At the same time.   
> Simultaneously.
>

That isn't what happens. This is:

(1) The validator finds no encoding information in the HTTP header or  
document.

(2) The validator tries to carry on anyway, as if the encoding had been
specified as UTF-8.

(3) The validator fails to parse the document as UTF-8, so it gives up.

In that order. One after the other. Not simultaneously.
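
To make the ordering concrete, here is a rough sketch of those three steps in Python. It is not the validator's actual code: the function name, the `declared` parameter and the use of `ValueError` are all made up for illustration.

```python
from typing import Optional, Tuple

# Assumption: the fallback encoding described in step (2).
FALLBACK_ENCODING = "utf-8"


def decode_with_fallback(raw: bytes, declared: Optional[str]) -> Tuple[str, str]:
    """Return (text, encoding_used), or raise ValueError if decoding fails.

    `declared` is whatever encoding was found in the HTTP header or the
    document, or None if nothing was found (hypothetical interface).
    """
    # Step 1: no encoding information was found, so `declared` is None.
    # Step 2: carry on as if the fallback had been specified.
    encoding = declared or FALLBACK_ENCODING
    try:
        text = raw.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        # Step 3: the bytes are not valid in that encoding, so give up.
        raise ValueError(
            f"cannot decode as {encoding}: {exc.reason} at byte {exc.start}"
        ) from exc
    return text, encoding
```

Each step runs only after the previous one has finished, so the "no encoding found" result and the "assume UTF-8" fallback are never asserted at the same moment.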

The behaviour when it is overridden and told to use ISO-8859-1 is  
slightly different:

(1) The validator is told to use ISO-8859-1

(2) The validator parses as ISO-8859-1 but finds errors

(3) The validator reports an error for each character that doesn't fit

I don't know why it behaves differently under those conditions, but I  
think having consistency would be desirable.
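
One way to picture consistent behaviour is a decoder that always records each offending position and carries on, whichever encoding it was given. The sketch below is only an illustration under that assumption; `collect_decode_errors` is a hypothetical name, not anything in the validator. Note that Python's own ISO-8859-1 codec accepts every byte value, so this will not reproduce the validator's ISO-8859-1 messages; it only shows the "report each problem" style.

```python
from typing import List, Tuple


def collect_decode_errors(raw: bytes, encoding: str) -> Tuple[str, List[Tuple[int, bytes]]]:
    """Decode `raw`, recording every undecodable byte run instead of stopping.

    Returns the decoded text (with U+FFFD in place of each bad run) and a
    list of (byte_offset, offending_bytes) pairs.
    """
    errors: List[Tuple[int, bytes]] = []
    pieces: List[str] = []
    pos = 0
    while pos < len(raw):
        try:
            # Try to decode everything that is left in one go.
            pieces.append(raw[pos:].decode(encoding, errors="strict"))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice raw[pos:].
            bad_start = pos + exc.start
            bad_end = pos + exc.end
            # Keep the valid part before the error, then note the bad bytes.
            pieces.append(raw[pos:bad_start].decode(encoding))
            errors.append((bad_start, raw[bad_start:bad_end]))
            pieces.append("\ufffd")
            pos = bad_end
    return "".join(pieces), errors
```

With this shape, the fallback path and the override path would differ only in which `encoding` is passed in, and both would list the problem bytes rather than one of them giving up outright.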


-- 
David Dorward
http://dorward.me.uk/
http://blog.dorward.me.uk/
Received on Monday, 28 April 2008 14:29:02 GMT
