Re: Fallback to UTF-8 from Frank Ellermann on 2007-11-29 (www-validator@w3.org from November 2007)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Thu, 29 Nov 2007 17:05:30 +0100
To: www-validator@w3.org
Message-ID: <fimns9$uf4$1@ger.gmane.org>

Andreas Prilop wrote:

> (b) Take ISO-8859-1 as fallback encoding (the default of RFC 2616).
>     This will "work" if no bytes from 0x80 to 0x9F are present -
>     hence with many of the traditional 8-bit character sets.
>     Otherwise (if some bytes from 0x80 to 0x9F are found),
>     give the usual errors about "non SGML character number ..."

That's a variation of the current UTF-8 default; it could result
in a flood of errors for, say, windows-1252 pages with lots of Euros
(the Euro sign is octet 0x80 in windows-1252).

I'd prefer a completely unlikely "SBCS" (single-byte character set)
that has ASCII as a proper subset and permits all octets from 0x80
up to 0xFF.  And at the end, after all other errors based on this
assumption are reported, one final "you lose - unknown charset"
error (optional as gimmick:  "whatever it is, it's certainly not
UTF-8", if that is known in your scenario).
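
A minimal sketch of that idea in Python, not the validator's actual
code.  The function names and the use of private-use code points as
stand-ins for the unknown high octets are assumptions made only for
this illustration:

```python
# Assumption: map unknown octets 0x80..0xFF to private-use code points
# U+E080..U+E0FF so that no octet is ever rejected.
PUA_BASE = 0xE000


def decode_unknown_sbcs(data: bytes) -> str:
    """Decode as a hypothetical SBCS: ASCII stays ASCII, high octets pass."""
    return "".join(chr(b) if b < 0x80 else chr(PUA_BASE + b) for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def final_charset_report(data: bytes) -> list:
    """Messages to emit once, after all other errors have been reported."""
    messages = ["you lose - unknown charset"]
    if not is_valid_utf8(data):
        messages.append("whatever it is, it's certainly not UTF-8")
    return messages
```

With this, a windows-1252 page full of 0x80 octets decodes without a
single character error, and only the final report complains.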

 Frank

Received on Thursday, 29 November 2007 16:04:09 UTC