- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Fri, 25 Apr 2008 18:17:43 +0200
- To: www-validator@w3.org
Henri Sivonen wrote:

> The proposal was: Assume Windows-1252 but treat the upper
> half as errors.

Then you could end up reporting tons of errors, instead of only
one error, as in Jukka's proposal. validator.w3.org is near to
the ideal "one error" with its UTF-8 approach, but unfortunately
it is a fatal error, suppressing anything else it would find with
the other proposals (Andreas, Jukka, you).

[output]

> Would mere U+FFFD be better?

For Unicode output IMO good enough: There was at least one error,
the missing charset, and the user has to come back anyway.

Of course these strategies fail miserably when the markup itself
is non-ASCII or worse (UTF-1, UTF-7, UTF-16, BOCU-1, SCSU, etc.),
but to cover such oddities we could declare victory with the
UTF-8 fallback as is - obviously not what we want (for HTML).

>> Jukka's proposal avoids most surprises - all octets
>> 0x80..0xFF are accepted as "unknown garbage".

> I think a quality assurance tool should not *accept* unknown
> garbage but emit an error on non-declared non-ASCII.

I meant "accept" limited to parsing the input, in the sense of
"not giving up with a fatal error", as validator.w3.org does
when its UTF-8 fallback turns out to be wrong.

Of course any "unknown garbage" is an error. But with Jukka's
proposal this is *one* error, neither fatal nor "thousands of
errors".

 Frank
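[For readers following the thread, here is a rough Python sketch of the
three fallback strategies being compared for input with no declared
charset. The function names and the sample input are hypothetical; this
is not the validator's actual code, only an illustration of why the
strategies differ in the number of errors they report.]

    # Hypothetical sketch of the three fallback strategies discussed above.

    def per_byte_errors(octets: bytes) -> list[str]:
        # "Assume Windows-1252 but treat the upper half as errors":
        # every octet in 0x80..0xFF becomes its own error message.
        return [f"undeclared non-ASCII octet 0x{b:02X} at offset {i}"
                for i, b in enumerate(octets) if b >= 0x80]

    def fatal_utf8(octets: bytes) -> list[str]:
        # validator.w3.org today: fall back to UTF-8 and give up entirely
        # (one fatal error) if that guess turns out to be wrong.
        try:
            octets.decode("utf-8", errors="strict")
            return []
        except UnicodeDecodeError as exc:
            return [f"fatal: {exc}"]   # nothing else gets reported

    def one_error_garbage(octets: bytes) -> tuple[str, list[str]]:
        # Jukka's proposal as read here: keep parsing, map undecodable
        # bytes to U+FFFD, and report the missing charset as one error.
        text = octets.decode("utf-8", errors="replace")
        errors = []
        if any(b >= 0x80 for b in octets):
            errors.append("no charset declared; "
                          "non-ASCII treated as unknown garbage")
        return text, errors

    sample = b"<p>caf\xe9</p>"              # Latin-1 bytes, no charset
    print(per_byte_errors(sample))          # one error per stray octet
    print(fatal_utf8(sample))               # one fatal error, parsing stops
    print(one_error_garbage(sample))        # parsing continues, one error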