- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 25 Apr 2008 10:20:40 +0300
- To: W3C Validator Community <www-validator@w3.org>
On Apr 24, 2008, at 23:11, Frank Ellermann wrote:

> Henri Sivonen wrote:
>
>> Considering the real Web content, it is better to pick Windows-1252
>> than a hypothetical generic encoding.
>
> A good strategy for browsers, not necessarily for validators
> IFF it could accept wild mixtures of Latin-1 and UTF-8 as
> "valid" windows-1252.

[...]

> Your proposal "just assume windows-1252" is an idea for the
> validation step,

That wasn't the proposal. The proposal was: Assume Windows-1252 but
treat the upper half as errors.

> but it could have rather odd effects for the
> UTF-8 output of other errors, when the input contains any octet
> in the range 0x80..0x9F, or worse, if the input in fact was
> UTF-8, not windows-1252.

Would mere U+FFFD be better?

> Jukka's proposal avoids most surprises - all octets 0x80..0xFF
> are accepted as "unknown garbage".

I think a quality assurance tool should not *accept* unknown garbage
but emit an error on non-declared non-ASCII.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
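
For illustration, the policy "assume Windows-1252 but treat the upper half as errors" could be sketched roughly as follows. This is only a sketch under assumptions, not the code of any actual validator: the function name decode_undeclared and the wording of the error messages are hypothetical, and the sketch relies on Python's cp1252 codec, which leaves octets 0x81, 0x8D, 0x8F, 0x90 and 0x9D unmapped, so those come out as U+FFFD.

```python
from typing import List, Tuple


def decode_undeclared(data: bytes) -> Tuple[str, List[str]]:
    """Decode input that has no declared encoding (hypothetical helper).

    ASCII octets pass through silently. Every octet in 0x80..0xFF is
    reported as an error, but is still decoded as Windows-1252 so that
    other messages quoting the source remain readable.
    """
    chars: List[str] = []
    errors: List[str] = []
    for offset, octet in enumerate(data):
        if octet < 0x80:
            chars.append(chr(octet))
            continue
        # Upper half: always an error when no encoding was declared.
        errors.append(
            f"offset {offset}: non-declared non-ASCII octet 0x{octet:02X}"
        )
        # Octets that Python's cp1252 codec leaves unmapped become U+FFFD.
        chars.append(bytes([octet]).decode("cp1252", errors="replace"))
    return "".join(chars), errors


text, errors = decode_undeclared(b"caf\xe9 \x93ok\x94")
print(text)
for message in errors:
    print(message)
```

With input that was in fact UTF-8, each octet of a multi-byte sequence is reported separately and the decoded text shows the familiar mojibake, so the error is emitted either way; only the echoed source text differs from the "mere U+FFFD" alternative.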
Received on Friday, 25 April 2008 07:21:20 UTC