- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Sun, 22 Feb 2004 09:09:43 +0000 (GMT)
- To: www-style@w3.org
> dependent, but I wouldn't mind it trying. Trying to auto detect when
> the result is valid ISO-8859-1 (or whatever the default document
> character encoding is for that type of document strikes me as
> arrogant, especially since I can't imagine why anyone would want

It is necessary if autodetect is to be of use in the real world, although, as noted before, the lack of maintenance of this feature in IE suggests it is not widely used. In any case, general autodetect is normally an option that the user has to enable.

Both big5 and gb2312 use only the same byte values as iso-8859-1. Whilst you could reject Windows 1252 if it contains bytes which are not in the iso-8859-1 subset (and the distinction is irrelevant otherwise), you can't tell between gb2312 and iso-8859-1 without looking at the statistics of the data (probably simple frequencies and digram frequencies[1]). As far as I know, all auto-detect features use such statistics, and they will get things wrong for pathological cases. (Some do require extra hints.)

Again, as noted before, most users are only interested in one character set, plus possibly ISO 646/IRV (i.e. ASCII), which is a subset or unshifted variant of most others used in practice on the web, so the simpler algorithm of using a fixed, but selectable, character set, when none is specified, normally works for them.

[1] That is, statistics of the bytes, rather than of the characters.
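
To make the byte-level argument concrete, here is a minimal Python sketch. It is not how any browser actually implements autodetect; the function names, the 0.9 threshold and the "paired high bytes" statistic are all assumptions chosen for illustration, and it ignores Big5 and GBK lead bytes below 0xA1. It shows that a pure validity check can separate Windows 1252 from iso-8859-1 (by the 0x80-0x9F range), but that gb2312 versus iso-8859-1 needs a statistic over the bytes.

```python
def looks_like_windows_1252(data: bytes) -> bool:
    # 0x80-0x9F are C1 controls in iso-8859-1 but printable in Windows 1252,
    # so their presence in text suggests the latter.
    return any(0x80 <= b <= 0x9F for b in data)


def paired_high_byte_ratio(data: bytes) -> float:
    # Crude byte statistic (an assumption of this sketch): the fraction of
    # bytes >= 0xA1 that sit in runs of even length. gb2312 (EUC-CN) text
    # encodes each character as two such bytes, so its runs are always even;
    # Western iso-8859-1 text mostly has isolated accented letters.
    high_total = 0
    high_paired = 0
    run = 0
    for b in data + b"\x00":  # sentinel byte flushes the final run
        if b >= 0xA1:
            run += 1
            continue
        high_total += run
        if run % 2 == 0:
            high_paired += run
        run = 0
    return high_paired / high_total if high_total else 0.0


def guess_encoding(data: bytes, threshold: float = 0.9) -> str:
    # Hypothetical decision procedure; the threshold is arbitrary.
    if all(b < 0x80 for b in data):
        return "ascii"
    if looks_like_windows_1252(data):
        return "windows-1252"
    if paired_high_byte_ratio(data) >= threshold:
        return "gb2312-like"
    return "iso-8859-1"
```

Every gb2312 byte string also decodes without error as iso-8859-1, so validity alone decides nothing. The statistic is also easy to fool: a short iso-8859-1 string such as two adjacent accented letters has an even run of high bytes and would be reported as "gb2312-like", which is the kind of pathological case mentioned above.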
Received on Sunday, 22 February 2004 04:09:47 UTC