- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 30 Nov 2007 00:21:34 +0200
- To: <www-validator@w3.org>
Scripsit Frank Ellermann:

> I'd prefer a completely unlikely "SBCS" with proper subset ASCII
> permitting all octets from 0x80 up to 0xFF.

That sounds like the best option, and it's a simple one, except for the explanations. It's not any particular encoding but rather an open class of encodings. But it will do fine, since it's a correct guess in a vast majority of cases (including all the pages that are really windows-1252 or just Ascii plus pages in different national encodings), and it's really irrelevant what the octets 0x80 to 0xFF mean in such encodings. Some of them might be undefined, for a particular encoding, but we don't know the real encoding.

> And at the end, after
> all other errors based on this assumption are reported, one final
> "you lose - unknown charset" (optional as gimmick: "whatever it
> is, it's certainly not UTF-8", if that is known in your scenario).

Well, hopefully nothing like that. I think the report should be _preceded_ by a clear note, and might end with a note too (since people may miss the initial note). It could, directly or indirectly (via a link) say something like the following:

  The document cannot be validated, since the character encoding has not been specified. However, tentative validation was carried out based on the assumption that the encoding is some 8-bit encoding where the first 128 code positions are as in US-ASCII. You should specify the encoding as described in HTML specifications and resubmit the document for validation.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
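
Editorial illustration (not part of the original message): the fallback discussed above, treating the input as some unknown 8-bit encoding whose first 128 positions match US-ASCII, might be sketched in Python roughly as follows. The names `tentative_decode`, `count_opaque` and `NOTICE` are hypothetical, and using the `surrogateescape` error handler to carry octets 0x80 to 0xFF through as opaque placeholders is just one possible approach; nothing here describes how the validator actually works.

```python
# Note shown before (and possibly after) the report, as proposed in the message.
NOTICE = (
    "The document cannot be validated, since the character encoding has "
    "not been specified. However, tentative validation was carried out "
    "based on the assumption that the encoding is some 8-bit encoding "
    "where the first 128 code positions are as in US-ASCII."
)


def tentative_decode(data: bytes) -> str:
    """Decode under the 'unknown 8-bit superset of ASCII' assumption.

    Octets 0x00-0x7F are read as US-ASCII. Octets 0x80-0xFF are accepted
    but given no meaning: surrogateescape maps each one to a lone
    surrogate in U+DC80..U+DCFF, so the original octet stays recoverable.
    """
    return data.decode("ascii", errors="surrogateescape")


def count_opaque(text: str) -> int:
    """Count characters that stand in for uninterpreted high octets."""
    return sum(1 for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF)


if __name__ == "__main__":
    sample = b"<p>caf\xe9</p>"  # 0xE9 could be anything in the real encoding
    text = tentative_decode(sample)
    print(NOTICE)
    print("uninterpreted octets:", count_opaque(text))
```

Because the high octets are never interpreted, markup made of ASCII characters can still be checked, and the original bytes can be restored with `text.encode("ascii", errors="surrogateescape")`.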
Received on Thursday, 29 November 2007 22:22:31 UTC