- From: Øistein E. Andersen <html5@xn--istein-9xa.com>
- Date: Wed, 30 Jul 2008 00:55:25 +0200
On 22 May 2008, at 12:40, Ian Hickson wrote:

> would you say that what the spec says now is what browsers
> implement? What should we change?

The current table seems to cover the mappings between different common
compatible 8-bit encodings as implemented in IE7, yes. The table at
<http://coq.no/character-tables/mime/en> gives a bit more detail, most
of which is better kept outside HTML5 itself. However, the following
observations can be made:

1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252. IE7,
   on the other hand, simply ignores the high bit (as it does for a
   few other 7-bit encodings, by the way). Perhaps this alias could be
   dropped from the other browsers.

2. Firefox and Opera seem to sniff for text/plain; charset=ISO-8859-1
   (as per HTML5), whereas Safari seems to do the same for text/plain;
   charset=ISO-8859-11 instead [Version 3.1.2 (5525.20.1)]. Bug?

3. For certain character sets, different browsers map to different,
   but visually similar Unicode characters. Sometimes, one mapping is
   old/outdated, but this is not always the case.

4. Delete (0x7F) and the C1 range (0x80--0x9F) are handled quite
   inconsistently; different browsers do different things for the same
   encoding, and the same browser gives analogous encodings different
   treatment. (For the early ISO-8859-* encodings, the IANA registry
   points to RFC 1345, which effectively maps 0x7F--0x9F to
   U+7F--U+9F, but does not really seem to regard this feature as an
   essential part of the character set:

      the charset is often coded with both graphical and control
      character sets. If the coded character set is a 96-character
      set, it is tabled with the relevant GL set (normally ISO-IR-6)
      and with ISO 6429 as C0 and C1.

   As for the Windows-* encodings, Microsoft documentation treats
   bytes in this range as unassigned unless they are mapped to
   graphical characters, whereas Microsoft products return the
   underlying byte value in this case.)

5. IE handles KOI8-U as KOI8-RU, whereas Safari does the opposite.
   The former is probably more reasonable (assuming that letters are
   more important than line-drawing characters), but neither is
   actually correct given that the encodings are, strictly speaking,
   incompatible. This issue will of course look a bit different if it
   can be shown that documents containing the letter Ў/ў (only in
   KOI8-RU) are frequently mislabelled as KOI8-U.

> Do you have input on the EUC-JP issue?

Not yet, but you can expect some input on CJK encodings at some point
in the future.

-- 
Øistein E. Andersen
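
Illustration for observations 1 and 4: a minimal Python sketch of how the same bytes come out under the behaviours described there. It is not taken from any browser. The helper names are hypothetical, "ignores the high bit" is modelled here as clearing bit 7 before reading the byte as ASCII, and the "return the underlying byte value" behaviour is modelled as a fallback to the code point equal to the byte. Python's own cp1252 codec happens to treat the unassigned bytes as errors, which is closer to the documented Microsoft table.

```python
def decode_ignoring_high_bit(data: bytes) -> str:
    """Assumed model of 'ignore the high bit': clear bit 7, read as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


def decode_cp1252_passthrough(data: bytes) -> str:
    """Windows-1252, except that bytes the codec leaves unassigned
    fall back to the code point with the same value as the byte."""
    out = []
    for b in data:
        try:
            out.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Unassigned in Python's cp1252 table (e.g. 0x81).
            out.append(chr(b))
    return "".join(out)


sample = b"caf\xe9 \x80 \x81"

# US-ASCII treated as Windows-1252; unassigned bytes become U+FFFD here.
print(repr(sample.decode("cp1252", errors="replace")))

# US-ASCII with the high bit ignored: 0xE9 becomes 0x69 ('i').
print(repr(decode_ignoring_high_bit(sample)))

# ISO-8859-1 with C1 controls: 0x80 and 0x81 map to U+0080 and U+0081.
print(repr(sample.decode("latin-1")))

# Windows-1252 with byte-value fallback for unassigned positions.
print(repr(decode_cp1252_passthrough(sample)))
```

The byte 0x80 is the interesting case: it is a euro sign under Windows-1252, a C1 control under the RFC 1345 reading of ISO-8859-1, and NUL once the high bit is dropped.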
Received on Tuesday, 29 July 2008 15:55:25 UTC