
Re: Fallback to UTF-8

From: Andreas Prilop <prilop2008@trashmail.net>
Date: Fri, 2 May 2008 16:09:31 +0200 (MEST)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.63.0805021557140.23800@s5b004.rrzn.uni-hannover.de>

On Thu, 1 May 2008, Jukka K. Korpela wrote:

> This is a good reason not to assume ISO-8859-1 in a validator,
> because it leads to pointless error messages about data characters.

In theory, yes.

But not in practice for the W3C validator!
That's the reason I started this thread.
Is this still unclear?

With UTF-8 or Windows-1252 assumed, the W3C validator simply gives up
as soon as it finds a byte (or byte sequence) that it cannot interpret
as UTF-8 or Windows-1252, and does nothing beyond reporting

   "Sorry! This document can not be checked."
http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm
http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm;charset=windows-1252
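
Just as an illustration (this is not the validator's own code, and the
byte 0x81 is only a hypothetical example, not necessarily what is in
test.htm), a few lines of Python show why a strict decoder has to give
up on such input:

  data = b"\x81"   # illustrative byte only, not taken from test.htm
  for encoding in ("utf-8", "windows-1252"):
      try:
          data.decode(encoding)
          print(encoding, "decodes fine")
      except UnicodeDecodeError as err:
          # 0x81 is an invalid start byte in UTF-8 and an undefined
          # code point in Windows-1252, so both raise an error.
          print(encoding, "cannot decode:", err.reason)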

With ISO-8859-1 assumed, it does check and it does give
a helpful error report.
http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm;charset=iso-8859-1

   "This page is not Valid HTML 4.01 Strict!"
   "Result:  Failed validation, 2 Errors"

The W3C validator then reports only "non SGML character number ...",
which is still better than sitting there and doing nothing.
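
ISO-8859-1, by contrast, assigns a character to every byte from 0x00 to
0xFF, so decoding never fails; bytes in the 0x80-0x9F range simply come
out as C1 control characters, which HTML 4.01 does not allow. Again only
a sketch, using the same hypothetical byte as above:

  # ISO-8859-1 maps every byte to a code point, so the validator can
  # always proceed to the parsing stage.
  ch = b"\x81".decode("iso-8859-1")   # never raises
  print(hex(ord(ch)))                 # 0x81, a C1 control character
  # Code points 128-159 are declared UNUSED in the HTML 4.01 SGML
  # declaration, hence "non SGML character number ..." in the report.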

http://www.unics.uni-hannover.de/nhtcapri/test.htm