- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Thu, 24 Apr 2008 22:11:13 +0200
- To: www-validator@w3.org
Henri Sivonen wrote:

> Considering the real Web content, it is better to pick Windows-1252
> than a hypothetical generic encoding.

A good strategy for browsers, but not necessarily for validators if it means accepting wild mixtures of Latin-1 and UTF-8 as "valid" windows-1252.

Andreas' example shows that assuming UTF-8 does not work as it should for validator.w3: it ends up in a fatal error instead of reporting the non-UTF-8 octets. His Latin-1 example was better: it reported 0x80 as non-Latin-1.

Your proposal "just assume windows-1252" is an idea for the validation step, but it could have rather odd effects on the UTF-8 output of other errors when the input contains any octet in the range 0x80..0x9F, or worse, if the input was in fact UTF-8, not windows-1252.

Jukka's proposal avoids most surprises: all octets 0x80..0xFF are accepted as "unknown garbage". He didn't say how that can be displayed in the error output. Question marks? U+FFFD?

Frank
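
For illustration only, the three behaviours discussed above can be sketched in Python. This is a minimal sketch using Python's built-in codecs, not the validator's actual code; the function names are hypothetical, and U+FFFD is just one of the two display options the message leaves open.

```python
# Hypothetical helpers comparing three ways to handle undeclared encodings.
# Assumption: Python's "utf-8", "cp1252" and "ascii" codecs stand in for
# whatever decoder a validator would really use.

def decode_strict_utf8(data: bytes):
    """Assume UTF-8; return (text, None) or (None, offset of first bad octet)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # Report the offending offset instead of failing fatally.
        return None, exc.start

def decode_as_windows_1252(data: bytes) -> str:
    """Assume windows-1252; octets 0x80..0x9F map to printable characters."""
    return data.decode("cp1252")

def decode_unknown_garbage(data: bytes) -> str:
    """Keep ASCII; show every octet 0x80..0xFF as U+FFFD."""
    return data.decode("ascii", errors="replace")

latin1_input = b"caf\xe9 \x80"           # Latin-1 "café" plus a stray 0x80
utf8_input = "café".encode("utf-8")      # the same word, really UTF-8

print(decode_strict_utf8(latin1_input))             # (None, 3)
print(repr(decode_as_windows_1252(latin1_input)))   # 'café €'
print(repr(decode_as_windows_1252(utf8_input)))     # 'cafÃ©'
print(repr(decode_unknown_garbage(latin1_input)))   # 'caf\ufffd \ufffd'
```

The windows-1252 guess silently turns 0x80 into a euro sign and turns real UTF-8 into mojibake, which is the kind of odd effect on error output described above; the "unknown garbage" approach makes no claim about what the high octets mean.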
Received on Thursday, 24 April 2008 20:10:24 UTC