- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Thu, 24 Apr 2008 23:04:45 +0300
- To: "W3C Validator Community" <www-validator@w3.org>
Henri Sivonen wrote:

> Considering the real Web content, it is better to pick Windows-1252
> than a hypothetical generic encoding.

No, it's not, because _in validation_ you don't need to make any guess about the meanings of octets > 127 decimal. You're not supposed to render them (apart from echoing them along with error messages, but they're not markup-significant) or to process them in any way other than treating them as data characters.

If you assume windows-1252, then several possible octets will be unassigned, and you may well have the problem of having guessed something and then detected that the guess must be wrong. The document could be in some other 8-bit encoding, or in UTF-8, or something else, and if you hadn't bet on windows-1252, you would have analyzed the markup properly.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
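
The argument above can be illustrated with a minimal sketch in Python. It assumes an ASCII-compatible encoding (any 8-bit encoding, or UTF-8), so that markup-significant characters are always octets below 128. The function names and sample documents are hypothetical, chosen only for illustration, and the regular expression is a toy tag scanner, not how any real validator parses markup.

```python
import re

def decodes_as_cp1252(data):
    """Return True if every octet in data is assigned in windows-1252."""
    try:
        data.decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False

# Toy tag scanner working directly on octets: only ASCII octets are
# markup-significant, so octets > 127 pass through as opaque data.
TAG = re.compile(rb"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

def tag_names(data):
    """List tag names found in data, with a leading '/' for end tags."""
    return [
        ("/" if m.group(1) else "") + m.group(2).decode("ascii").lower()
        for m in TAG.finditer(data)
    ]

# Hypothetical sample documents in three different situations.
sample_utf8 = "<p>Hyv\u00e4\u00e4 p\u00e4iv\u00e4\u00e4</p>".encode("utf-8")
sample_koi8 = "<p>\u041f\u0440\u0438\u0432\u0435\u0442</p>".encode("koi8_r")
sample_odd = b"<p>caf\x81</p>"  # 0x81 is unassigned in windows-1252

for doc in (sample_utf8, sample_koi8, sample_odd):
    print(tag_names(doc), decodes_as_cp1252(doc))
```

The markup analysis gives the same result for all three documents, whereas the windows-1252 guess fails outright on the third and silently assigns wrong meanings to the data characters in the first two.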
Received on Thursday, 24 April 2008 20:05:23 UTC