Re: Fallback to UTF-8

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Thu, 29 Nov 2007 17:05:30 +0100
To: www-validator@w3.org
Message-ID: <fimns9$uf4$1@ger.gmane.org>

Andreas Prilop wrote:

> (b) Take ISO-8859-1 as fallback encoding (the default of RFC 2616).
>     This will "work" if no bytes from 0x80 to 0x9F are present -
>     hence with many of the traditional 8-bit character sets.
>     Otherwise (if some bytes from 0x80 to 0x9F are found),
>     give the usual errors about "non SGML character number ..."

That's a variation of the current UTF-8 default; it could result
in a flood of errors for, say, windows-1252 pages with lots of Euro
signs (each one is the single octet 0x80).
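
To illustrate the flood, here is a rough Python sketch that counts the
octets an ISO-8859-1 fallback would complain about.  The helper name
c1_octets and the sample text are made up for this example; the only
facts relied on are that windows-1252 encodes the Euro sign as 0x80
and that 0x80 to 0x9F is the range in question.

```python
def c1_octets(data: bytes) -> list[int]:
    """Return the offsets of all octets in the range 0x80-0x9F."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


# Hypothetical page fragment, encoded as windows-1252.
page = "Price: 5 €, 10 €, 15 €".encode("windows-1252")

# Every Euro sign becomes one offending octet, hence one error each.
offsets = c1_octets(page)
print(len(offsets), "octets in 0x80-0x9F at offsets", offsets)
```

Each reported offset would correspond to one "non SGML character
number" style error under fallback (b).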

I'd prefer a completely unlikely single-byte charset ("SBCS") that
contains ASCII as a proper subset and permits all octets from 0x80 up
to 0xFF.  At the end, after all other errors based on this assumption
have been reported, add one final "you lose - unknown charset"
(optionally, as a gimmick: "whatever it is, it's certainly not UTF-8",
if that is known in your scenario).
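
A minimal Python sketch of that idea, not how the validator actually
works: all function names are hypothetical, check_markup stands for
whatever markup checks would run on the decoded text, and mapping the
high octets to private-use code points is just one assumed way to get
an "unlikely" charset that rejects nothing.

```python
from typing import Callable, Iterable


def decode_unknown_sbcs(data: bytes) -> str:
    """Decode with a made-up single-byte charset: ASCII maps to itself,
    octets 0x80-0xFF map to private-use placeholders U+E080-U+E0FF."""
    return "".join(chr(b) if b < 0x80 else chr(0xE000 + b) for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the octets form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def validate_without_charset(
    data: bytes, check_markup: Callable[[str], Iterable[str]]
) -> list[str]:
    """Run the (hypothetical) markup checks first, then append the
    final charset error after everything else has been reported."""
    errors = list(check_markup(decode_unknown_sbcs(data)))
    final = "you lose - unknown charset"
    if not is_valid_utf8(data):
        final += " (whatever it is, it's certainly not UTF-8)"
    errors.append(final)
    return errors
```

No octet can fail to decode here, so the charset problem shows up
exactly once, as the last error, instead of once per high octet.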

 Frank
Received on Thursday, 29 November 2007 16:04:09 GMT
