- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 25 Apr 2008 10:16:20 +0300
- To: W3C Validator Community <www-validator@w3.org>
On Apr 24, 2008, at 23:04 , Jukka K. Korpela wrote:

> Henri Sivonen wrote:
>
>> Considering the real Web content, it is better to pick Windows-1252
>> than a hypothetical generic encoding.
>
> No, it's not because _in validation_ you don't need to make any guess on
> the meanings of octets > 127 decimal.

Validator.nu, for example, checks for bad byte sequences in the encoding (subject to decoder bugs), looks for the last two non-character code points on each plane and looks for PUA characters.

> You're not supposed to render them
> (apart from echoing them along with error messages, but they're not
> markup-significant) or to process them in any way but treating them as
> data characters.

Rendering source extracts is a significant part of validator UI.

> If you assume windows-1252, then many possible octets will be unassigned
> and you may well have the problem of having guessed something and then
> detected the guess must be wrong. The document could be in some other
> 8-bit encoding, or in UTF-8, or something else, and if you hadn't bet on
> windows-1252, you would have analyzed the markup properly.

Right, but if non-declared non-ASCII is an error, the pass/fail outcome will be right even if for the wrong reason.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
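[Editor's note: the checks Henri describes — strict decoding to catch bad byte sequences, then flagging the last two noncharacter code points on each plane and Private Use Area characters — can be sketched as follows. This is an illustrative Python sketch, not Validator.nu's actual (Java) implementation; the function name and message strings are invented for the example.]

```python
def check_stream(data: bytes, encoding: str) -> list[str]:
    """Sketch of encoding-level checks: bad byte sequences,
    noncharacters, and PUA code points."""
    try:
        # Strict decoding: any byte sequence invalid in the declared
        # encoding raises (subject to the decoder's own bugs).
        text = data.decode(encoding)
    except UnicodeDecodeError as e:
        return [f"bad byte sequence at offset {e.start}"]

    problems = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        # Noncharacters: U+FDD0..U+FDEF, plus the last two code points
        # (U+xFFFE and U+xFFFF) on each of the 17 planes.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            problems.append(f"noncharacter U+{cp:04X} at index {i}")
        # Private Use Areas: the BMP PUA plus planes 15 and 16.
        elif (0xE000 <= cp <= 0xF8FF
              or 0xF0000 <= cp <= 0xFFFFD
              or 0x100000 <= cp <= 0x10FFFD):
            problems.append(f"PUA character U+{cp:04X} at index {i}")
    return problems
```

Note that this also illustrates Korpela's point about unassigned octets: with `encoding="cp1252"`, bytes such as 0x81 have no assigned character, so a strict decoder reports them as bad byte sequences rather than silently passing them through.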
Received on Friday, 25 April 2008 07:17:11 UTC