Re: Fallback to UTF-8

From: Andreas Prilop <prilop2008@trashmail.net>
Date: Fri, 2 May 2008 16:09:31 +0200 (MEST)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.63.0805021557140.23800@s5b004.rrzn.uni-hannover.de>

On Thu, 1 May 2008, Jukka K. Korpela wrote:

> This is a good reason not to assume ISO-8859-1 in a validator,
> because it leads to pointless error messages about data characters.

In theory - yes.

But not in practice for the W3C validator!
That's the reason I have started this thread.
Is this still unclear?

With UTF-8 or Windows-1252 assumed, the W3C validator simply gives up
and reports only

   "Sorry! This document can not be checked."

when it finds some byte (or byte sequence) that it cannot
interpret as Windows-1252 or UTF-8.

With ISO-8859-1 assumed, it does check the document and it does give
a helpful error report:

   "This page is not Valid HTML 4.01 Strict!"
   "Result:  Failed validation, 2 Errors"

In that case the W3C validator just reports "non SGML character number ...",
which is still better than sitting there and doing nothing.
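
To make the difference concrete, here is a minimal Python sketch of the
fallback behaviour described above. It is not the validator's actual code:
the names decode_with_fallback and non_sgml_characters are hypothetical,
and the set of non-SGML code points is my reading of the HTML 4.01 SGML
declaration. It tries the strict encodings first, then falls back to
ISO-8859-1, which accepts every byte, so a report can always be produced.

```python
from typing import List, Tuple

# Assumption: code points marked UNUSED in the HTML 4.01 SGML declaration
# (C0 controls except tab, LF and CR, plus DEL and the C1 range 128-159).
NON_SGML = set(range(0, 9)) | {11, 12} | set(range(14, 32)) | set(range(127, 160))


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Try UTF-8, then Windows-1252; fall back to ISO-8859-1.

    ISO-8859-1 maps every byte to a character, so the last step never fails.
    Returns the decoded text and the name of the encoding that was used.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return data.decode("iso-8859-1"), "iso-8859-1"


def non_sgml_characters(text: str) -> List[Tuple[int, int]]:
    """Return (offset, code point) for every character in NON_SGML."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in NON_SGML]


if __name__ == "__main__":
    # Byte 0x81 is invalid as UTF-8 here and undefined in Windows-1252.
    text, used = decode_with_fallback(b"caf\x81")
    print(used)
    for offset, code in non_sgml_characters(text):
        print(f"offset {offset}: non SGML character number {code}")
```

The point of the design is that the last decoding step cannot fail, so
the bad byte ends up in an error report instead of aborting the check.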

Received on Friday, 2 May 2008 14:10:18 UTC