On Apr 24, 2008, at 23:04, Jukka K. Korpela wrote:

> Henri Sivonen wrote:
>
>> Considering the real Web content, it is better to pick Windows-1252
>> than a hypothetical generic encoding.
>
> No, it's not because _in validation_ you don't need to make any guess on
> the meanings of octets > 127 decimal.

Validator.nu, for example, checks for bad byte sequences in the encoding
(subject to decoder bugs), looks for the last two non-character code
points on each plane and looks for PUA characters.

> You're not supposed to render them
> (apart from echoing them along with error messages, but they're not
> markup-significant) or to process them in any way but treating them as
> data characters.

Rendering source extracts is a significant part of validator UI.

> If you assume windows-1252, then many possible octets will be unassigned
> and you may well have the problem of having guessed something and then
> detected the guess must be wrong. The document could be in some other
> 8-bit encoding, or in UTF-8, or something else, and if you hadn't bet on
> windows-1252, you would have analyzed the markup properly.

Right, but if non-declared non-ASCII is an error, the pass/fail outcome
will be right even if for the wrong reason.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Friday, 25 April 2008 07:17:11 UTC
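
For illustration only, the three checks mentioned in this message (bad byte sequences, the last two non-character code points on each plane, PUA characters) can be sketched roughly as follows in Python. This is not Validator.nu's code; the function names and the tuple-based result format are hypothetical. The sketch leans on Python's strict decoders, which, as far as I know, reject the five octets that windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D).

```python
def is_plane_final_noncharacter(cp):
    """True for U+nFFFE and U+nFFFF on every plane n (0 through 16)."""
    return (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp):
    """True for the BMP Private Use Area and the two supplementary PUA planes."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def check_octets(data, encoding="utf-8"):
    """Return a list of (kind, position) pairs for the given bytes.

    Hypothetical result format: for "bad-byte-sequence" the position is a
    byte offset; for the other kinds it is a character index in the
    decoded text.
    """
    try:
        # Strict decoding raises on sequences the decoder considers invalid.
        text = data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        return [("bad-byte-sequence", exc.start)]

    problems = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if is_plane_final_noncharacter(cp):
            problems.append(("noncharacter", index))
        elif is_private_use(cp):
            problems.append(("private-use", index))
    return problems
```

For example, check_octets(b"\x81", "cp1252") reports a bad byte sequence at offset 0, which is the "guessed windows-1252 and the guess turned out wrong" case from the quoted text. Note that the sketch deliberately covers only the plane-final non-characters named above, not U+FDD0..U+FDEF.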