- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 25 Apr 2008 11:00:21 +0300
- To: "W3C Validator Community" <www-validator@w3.org>
Henri Sivonen wrote:

> Validator.nu, for example, checks for bad byte sequences in the
> encoding (subject to decoder bugs), looks for the last two
> non-character code points on each plane and looks for PUA characters.

That's a different issue. The question was about handling data for which no encoding has been specified. Hence there is formally no criterion for "bad byte sequences", still less for anything related to code points.

> - - if non-declared non-ASCII is an error, the pass/fail
> outcome will be right even if for the wrong reason.

Anything non-declared (even if it consists just of octets in the ASCII range) is an error, but at a category level other than validation errors. Formally, there is no document to be validated, just some lump of octets. Hence, the correct response says this and _could_ refuse to do anything else.

Even "This document can not be checked" is a bit questionable. Which _document_? Better: The submitted data cannot be interpreted as a marked-up document.

If you wish to do something additional to help the user - and this is probably a good idea if implemented properly - then the report should clearly say what has been done ("Falling back" sounds like an odd expression) and it should use a guess that is the least likely to spawn wrong or misleading error messages.

If the additional thing tends to confuse users rather than help them, then, well, maybe the validator should just say "I can't process your data" in some polite and informative terms.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
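
As an illustration of the reporting policy described above, here is a minimal sketch in Python. It is not how any actual validator is implemented. The names `Report`, `interpret` and `GUESSED_ENCODING` are hypothetical, and the choice of ISO-8859-1 as the guess is an assumption made only because every octet decodes under it; the message itself does not name a particular guess.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: ISO-8859-1 is picked as the guess only because every octet
# decodes under it, so the guess itself cannot raise decoding errors.
GUESSED_ENCODING = "iso-8859-1"


@dataclass
class Report:
    messages: List[str] = field(default_factory=list)
    text: Optional[str] = None   # decoded text, if any was produced
    validatable: bool = False    # True only when an encoding was declared


def interpret(data: bytes, declared: Optional[str], guess: bool = False) -> Report:
    """Turn submitted octets into text, or explain why that was not possible."""
    report = Report()
    if declared is not None:
        # A declared encoding gives a real criterion for bad byte sequences;
        # decoding errors (UnicodeDecodeError, LookupError) propagate to the caller.
        report.text = data.decode(declared)
        report.validatable = True
        return report

    # No declared encoding: this is an error at a level above validation.
    report.messages.append(
        "No character encoding was specified. The submitted data cannot be "
        "interpreted as a marked-up document."
    )
    if guess:
        # Optional extra help: say exactly what was done, and do not call it validation.
        report.text = data.decode(GUESSED_ENCODING)
        report.messages.append(
            f"To help you, the data was read as if it were {GUESSED_ENCODING}. "
            "This is a guess, and any messages based on it are not validation results."
        )
    return report
```

Calling `interpret(data, None)` yields only the category-level message and no text, which corresponds to refusing to do anything else; passing `guess=True` adds the decoded text together with an explicit statement of what was guessed.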
Received on Friday, 25 April 2008 08:00:51 UTC