On Thu, 29 Nov 2007, olivier Thereaux wrote:

>> Given a webpage that does not specify any encoding (charset).
>> Then validator.w3.org reports:
>>
>> (1) No Character Encoding Found! Falling back to UTF-8.
>>
>> (2) Sorry, I am unable to validate this document because on line ...
>> it contained one or more bytes that I cannot interpret as utf-8
>>
>> This makes no sense; and it doesn't help the user.
>
> You're not suggesting a better procedure, either.

OK, here are my suggestions:

(a) Immediately tell "This document cannot be checked" without any
reference to UTF-8. Since the document cannot be taken as UTF-8-encoded,
"charset=utf-8" was most probably not the author's intention.

OR

(b) Take ISO-8859-1 as fallback encoding (the default of RFC 2616).
This will "work" if no bytes from 0x80 to 0x9F are present - hence
with many of the traditional 8-bit character sets.
Otherwise (if some bytes from 0x80 to 0x9F are found), give the usual
errors about "non SGML character number ..."

Received on Thursday, 29 November 2007 15:33:01 GMT
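For illustration only, suggestion (b) could be sketched roughly as follows. This is not the validator's actual code; the function name `check_undeclared_encoding` and its return shape are assumptions made up for this example. It decodes a document with no declared charset as ISO-8859-1 and lists the bytes in the range 0x80 to 0x9F that would trigger the "non SGML character number" errors.

```python
from typing import List, Tuple


def check_undeclared_encoding(data: bytes) -> Tuple[str, List[Tuple[int, int]]]:
    """Hypothetical helper: apply the ISO-8859-1 fallback from suggestion (b).

    Returns the decoded text and a list of (offset, byte value) pairs for
    every byte in the range 0x80..0x9F.
    """
    # ISO-8859-1 maps each byte to the code point of the same value,
    # so this decode step cannot fail on any input.
    text = data.decode("iso-8859-1")

    # Bytes 0x80..0x9F become C1 control characters, which are the ones
    # suggestion (b) says should be reported as errors.
    problems = [
        (offset, byte)
        for offset, byte in enumerate(data)
        if 0x80 <= byte <= 0x9F
    ]
    return text, problems


if __name__ == "__main__":
    # 0xE9 is outside 0x80..0x9F, so this input is accepted as-is.
    print(check_undeclared_encoding(b"caf\xe9"))
    # 0x93 and 0x94 fall inside 0x80..0x9F and are reported with offsets.
    print(check_undeclared_encoding(b"\x93hi\x94"))
```

An empty problem list corresponds to the case where the fallback "works"; a non-empty one gives the offsets at which the usual errors would be raised.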