Re: Fallback to UTF-8

On Thu, 1 May 2008, Jukka K. Korpela wrote:

> This is a good reason not to assume ISO-8859-1 in a validator,
> because it leads to pointless error messages about data characters.

In theory - yes.

But not in practice for the W3C validator!
That's the reason I have started this thread.
Is this still unclear?

With UTF-8 or Windows-1252 assumed, the W3C validator simply gives up
and checks nothing at all as soon as it finds a byte (or byte
sequence) that it cannot interpret in the assumed encoding:

   "Sorry! This document can not be checked."

http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm
http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm;charset=windows-1252

With ISO-8859-1 assumed, it does check and it does give
a helpful error report.
http://validator.w3.org/check?uri=www.unics.uni-hannover.de/nhtcapri/test.htm;charset=iso-8859-1

   "This page is not Valid HTML 4.01 Strict!"
   "Result:  Failed validation, 2 Errors"

In that case the W3C validator just reports "non SGML character
number ..." for the offending characters, which is still better than
sitting there and doing nothing.
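
The difference between the three assumptions can be reproduced outside
the validator. The following is only a sketch in Python, not the
validator's own code. The sample bytes and the helper names
(SAMPLE, can_decode, c1_code_points) are made up for illustration and
are not taken from test.htm. It assumes strict decoding, as Python's
codecs do by default: UTF-8 and Windows-1252 both have byte values
with no meaning, while ISO-8859-1 assigns a character to every byte,
so decoding always succeeds and the bytes 128..159 survive as control
characters that HTML 4.01 treats as non-SGML characters.

```python
# Hypothetical sample, not the content of test.htm: byte 0x81 is
# undefined in Windows-1252 and cannot start a UTF-8 sequence, but it
# is a defined (control) character in ISO-8859-1.
SAMPLE = b"<p>caf\x81</p>"


def can_decode(data, encoding):
    """Return True if every byte of data decodes under encoding."""
    try:
        data.decode(encoding)  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


def c1_code_points(data):
    """Decode as ISO-8859-1 and list the code points in 128..159."""
    text = data.decode("iso-8859-1")  # never fails: all 256 bytes map
    return [ord(ch) for ch in text if 128 <= ord(ch) <= 159]


if __name__ == "__main__":
    for enc in ("utf-8", "windows-1252", "iso-8859-1"):
        print(enc, can_decode(SAMPLE, enc))
    print(c1_code_points(SAMPLE))
```

Only the ISO-8859-1 attempt decodes this sample, and the one reported
code point, 129, is the kind of character a checker can then flag
individually instead of rejecting the whole document.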

The test page itself:
http://www.unics.uni-hannover.de/nhtcapri/test.htm

Received on Friday, 2 May 2008 14:10:18 UTC