- From: Andreas Prilop <aprilop2007@trashmail.net>
- Date: Thu, 29 Nov 2007 16:25:32 +0100 (MET)
- To: www-validator@w3.org
On Thu, 29 Nov 2007, olivier Thereaux wrote:

>> Given a webpage that does not specify any encoding (charset).
>> Then validator.w3.org reports:
>>
>> (1) No Character Encoding Found! Falling back to UTF-8.
>>
>> (2) Sorry, I am unable to validate this document because on line ...
>> it contained one or more bytes that I cannot interpret as utf-8
>>
>> This makes no sense; and it doesn't help the user.
>
> You're not suggesting a better procedure, either.

OK, here are my suggestions:

(a) Immediately report "This document cannot be checked" without any
reference to UTF-8. Since the document cannot be read as UTF-8-encoded,
"charset=utf-8" was most probably not the author's intention.

OR

(b) Take ISO-8859-1 as the fallback encoding (the default of RFC 2616).
This will "work" if no bytes from 0x80 to 0x9F are present, and hence
with many of the traditional 8-bit character sets. Otherwise (if some
bytes from 0x80 to 0x9F are found), give the usual errors about
"non SGML character number ..."
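For illustration, the two proposed procedures might be sketched as below. This is a minimal sketch, not the validator's actual code: the function names `check_utf8` and `latin1_fallback` are hypothetical, and the sketch assumes the document arrives as raw bytes with no declared charset. Decoding as ISO-8859-1 never fails, because every byte maps to a code point U+0000..U+00FF; the bytes 0x80 to 0x9F are C1 control codes in ISO-8859-1, whereas code pages such as windows-1252 put printable characters there.

```python
from typing import List, Tuple


def check_utf8(data: bytes) -> bool:
    """Option (a), hypothetical helper: can the bytes be read as UTF-8 at all?

    If not, the checker would simply say the document cannot be checked,
    without claiming to have "fallen back" to UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_fallback(data: bytes) -> Tuple[str, List[Tuple[int, int]]]:
    """Option (b), hypothetical helper: decode as ISO-8859-1 and flag C1 bytes.

    Returns the decoded text plus a list of (line_number, byte_value) pairs,
    one per byte in the range 0x80..0x9F; each would become a
    "non SGML character number ..." error.
    """
    # ISO-8859-1 assigns a character to every byte, so this cannot raise.
    text = data.decode("iso-8859-1")
    problems: List[Tuple[int, int]] = []
    line = 1
    for b in data:
        if b == 0x0A:            # newline: advance the line counter
            line += 1
        elif 0x80 <= b <= 0x9F:  # C1 control range in ISO-8859-1
            problems.append((line, b))
    return text, problems


if __name__ == "__main__":
    # windows-1252 text: e-acute (0xE9) plus "smart quotes" (0x93, 0x94).
    sample = b"caf\xe9\n\x93quoted\x94\n"
    print(check_utf8(sample))            # False: 0xE9 is not valid UTF-8 here
    print(latin1_fallback(sample)[1])    # C1 bytes found on line 2
```

With a document that contains only bytes outside 0x80 to 0x9F (e.g. `b"caf\xe9\n"`), the fallback decodes cleanly to `"café\n"` and reports nothing, which is the case where option (b) "works".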
Received on Thursday, 29 November 2007 15:33:01 UTC