Re: Fallback to UTF-8

From: David Dorward <david@dorward.me.uk>
Date: Mon, 28 Apr 2008 15:28:05 +0100
Message-Id: <D963FD08-2057-4C49-9D45-3B4FFC342EE6@dorward.me.uk>
To: www-validator@w3.org

On 28 Apr 2008, at 15:08, Andreas Prilop wrote:
> (1) The validator establishes that Humpty Dumpty
>    is *not* the President of the United States.
> (2) The validator assumes:
>    President of the United States = Humpty Dumpty.
> (1) and (2) happen together.  Both.  At the same time.   
> Simultaneously.

That isn't what happens. This is:

(1) The validator finds no encoding information in the HTTP header or  

(2) The validator tries to carry on anyway as if the encoding was  
specified as UTF-8

(3) The validator fails to parse as UTF-8, so gives up.

In that order. One after the other. Not simultaneously.
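
A minimal sketch of that sequence in Python, for illustration only. This is not the validator's actual code; the function name and the `declared` parameter are hypothetical, and "gives up" is modelled as letting the decode error propagate.

```python
from typing import Optional


def decode_with_fallback(body: bytes, declared: Optional[str] = None) -> str:
    """Decode a document, assuming UTF-8 when no encoding was declared."""
    # Step (1): `declared` is None when no encoding information was found.
    # Step (2): carry on anyway as if UTF-8 had been specified.
    encoding = declared if declared is not None else "utf-8"
    # Step (3): strict decoding raises UnicodeDecodeError on the first
    # invalid byte sequence, so processing stops there.
    return body.decode(encoding, errors="strict")


if __name__ == "__main__":
    print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8
    try:
        decode_with_fallback(b"caf\xe9")  # an ISO-8859-1 byte, invalid as UTF-8
    except UnicodeDecodeError as exc:
        print("gave up:", exc.reason)
```

The three steps run strictly one after another: the fallback is chosen first, and the failure only shows up afterwards when the bytes are actually decoded.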

The behaviour when it is overridden and told to use ISO-8859-1 is  
slightly different:

(1) The validator is told to use ISO-8859-1

(2) The validator parses as ISO-8859-1 but finds errors

(3) The validator throws error messages for characters which don't fit
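
The override case can be sketched the same way. This is again a hypothetical illustration, not the validator's code. It assumes that "characters which don't fit" means bytes in the range 0x80 to 0x9F, which ISO-8859-1 maps to control characters; that is only one plausible reading, and the message format is made up.

```python
from typing import List


def check_latin1(body: bytes) -> List[str]:
    """Decode as ISO-8859-1 and report suspicious characters instead of stopping."""
    # Steps (1) and (2): decoding as ISO-8859-1 never raises in Python,
    # because every byte value maps to exactly one code point.
    text = body.decode("iso-8859-1")
    messages = []
    for offset, ch in enumerate(text):
        # Step (3), under the assumption above: flag C1 control characters
        # (0x80-0x9F) and keep going rather than giving up.
        if 0x80 <= ord(ch) <= 0x9F:
            messages.append(f"offset {offset}: character {ord(ch)} does not fit")
    return messages


if __name__ == "__main__":
    # 0x92 is a right single quote in Windows-1252, a control in ISO-8859-1.
    for message in check_latin1(b"it\x92s"):
        print(message)
```

The contrast with the fallback case is that the whole document is processed and each problem is reported, instead of stopping at the first byte that fails.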

I don't know why it behaves differently under those conditions, but I  
think having consistency would be desirable.

David Dorward
Received on Monday, 28 April 2008 14:29:02 UTC