Re: Fallback to UTF-8

From: olivier Thereaux <ot@w3.org>
Date: Mon, 5 May 2008 10:21:12 +0900
Cc: www-validator@w3.org
Message-Id: <0B923BB8-F23E-4BE6-AC6B-7EAA9100D7D5@w3.org>
To: Andreas Prilop <prilop2008@trashmail.net>


On 2-May-08, at 11:09 PM, Andreas Prilop wrote:
> With UTF-8 or Windows-1252 assumed, the W3C validator simply gives up
> and does nothing
>
>   "Sorry! This document can not be checked."
>
> when it finds some byte (or byte sequence) that it cannot
> interpret as Windows-1252 or UTF-8.

Which is why the validator has been patched to try Latin-1 (ISO-8859-1)
as a last resort, after UTF-8 and Windows-1252. Could you take a look?

http://qa-dev.w3.org/wmvs/HEAD/
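
For illustration only, here is a rough sketch of that fallback order in
Python. It is not the validator's actual code: the function name, the
encoding list, and the error text are assumptions made for the example,
and it assumes each step decodes strictly and gives up on the first
undecodable byte.

```python
# Hypothetical fallback chain: UTF-8, then Windows-1252, then Latin-1.
# Names and error text are illustrative, not taken from the validator.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(data: bytes, encodings=FALLBACK_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes `data`.

    Each attempt is strict, so an invalid byte sequence makes that
    encoding fail and the next one in the list is tried.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Only reachable when the list has no catch-all encoding such as Latin-1.
    raise ValueError("could not determine the encoding of the document")


if __name__ == "__main__":
    for sample in (b"caf\xc3\xa9", b"caf\xe9", b"\x81"):
        text, used = decode_with_fallback(sample)
        print(repr(sample), "decoded as", used)
```

Since Latin-1 assigns a character to every byte value, the last step
cannot fail, so with this order the "cannot be checked" case would only
arise if Latin-1 were left out of the list.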

> The W3C validator just reports "non SGML character number ...",
> which is still better than to sit there and to do nothing.

Arguably. For experts in SGML and markup languages, yes, "non SGML  
character" is an obvious sign of an encoding issue. For most people,  
however, "non SGML character number" is gibberish, whereas "sorry,  
there is a problem because I could not determine the encoding of your  
document" is somewhat understandable.
Received on Monday, 5 May 2008 01:21:45 GMT
