- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 25 Apr 2008 12:25:21 +0300
- To: W3C Validator Community <www-validator@w3.org>
On Apr 25, 2008, at 11:00 , Jukka K. Korpela wrote:

> Henri Sivonen wrote:
>
>> Validator.nu, for example, checks for bad byte sequences in the
>> encoding (subject to decoder bugs), looks for the last two non-
>> character code points on each plane and looks for PUA characters.
>
> That's a different issue. The question was about handling data for
> which no encoding has been specified. Hence there is formally no
> criterion for "bad byte sequences", still less for anything related
> to code points.

Depends on which spec you read. My point is that while HTML 4.01
doesn't specify this properly, this is a solved problem (by HTML 5) in
text/html, so it isn't particularly productive to extrapolate from
legacy specs in other ways.

>> - - if non-declared non-ASCII is an error, the pass/fail
>> outcome will be right even if for the wrong reason.
>
> Anything non-declared (even if it consists just of octets in the ASCII
> range) is an error, but at a category level other than validation
> errors. Formally, there is no document to be validated, just some lump
> of octets. Hence, the correct response says this and _could_ refuse to
> do anything else. Even "This document can not be checked" is a bit
> questionable. Which _document_? Better: The submitted data cannot be
> interpreted as a marked-up document.

How about "The character encoding of the document was not explicit
(assumed windows-1252) but the document contains non-ASCII."

http://html5.validator.nu/?doc=http%3A%2F%2Fwww.unics.uni-hannover.de%2Fnhtcapri%2Ftest.htm

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
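[Editorial note: the two checks discussed in this message can be illustrated with a minimal sketch. This is not Validator.nu's actual code; the function names and everything beyond the quoted message string are illustrative assumptions. The sketch shows (1) a byte-level check for non-ASCII content in input whose encoding was not declared, and (2) code-point checks for noncharacters (U+FDD0..U+FDEF plus the last two code points of each plane) and private-use characters.]

```python
def undeclared_encoding_check(data: bytes) -> list[str]:
    """Report non-ASCII bytes in input whose encoding was not declared.
    HTML 5 falls back to an assumed encoding (windows-1252 here), but the
    presence of non-ASCII bytes is still worth flagging because the
    fallback is only a guess."""
    messages = []
    if any(b > 0x7F for b in data):
        messages.append(
            "The character encoding of the document was not explicit "
            "(assumed windows-1252) but the document contains non-ASCII."
        )
    return messages


def code_point_warnings(text: str) -> list[str]:
    """Flag noncharacters (U+FDD0..U+FDEF and the last two code points of
    each plane) and private-use characters, once the bytes have been
    decoded under some encoding."""
    messages = []
    for ch in text:
        cp = ord(ch)
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            messages.append(f"Noncharacter code point U+{cp:04X}.")
        elif (0xE000 <= cp <= 0xF8FF
              or 0xF0000 <= cp <= 0xFFFFD
              or 0x100000 <= cp <= 0x10FFFD):
            messages.append(f"Private-use character U+{cp:04X}.")
    return messages
```

[In this sketch the byte-level check runs before any decoding, since without a declaration windows-1252 is only an assumption; the code-point checks apply only after the bytes have been decoded under some encoding, declared or guessed.]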
Received on Friday, 25 April 2008 09:26:06 UTC