- From: Øistein E. Andersen <liszt@coq.no>
- Date: Thu, 22 Oct 2009 21:23:43 +0100
On 22 Oct 2009, at 17:15, NARUSE, Yui wrote:

> First, JIS-X-0208 and JIS-X-0212 are not in IANA Charsets,

I am not sure what you mean; they are both listed at
<http://www.iana.org/assignments/character-sets>:

    Name: JIS_C6226-1983 [RFC1345,KXS2]
    MIBenum: 63
    Source: ECMA registry
    Alias: iso-ir-87
    Alias: x0208
    Alias: JIS_X0208-1983
    Alias: csISO87JISX0208

    Name: JIS_X0212-1990 [RFC1345,KXS2]
    MIBenum: 98
    Source: ECMA registry
    Alias: x0212
    Alias: iso-ir-159
    Alias: csISO159JISX02121990

> moreover those correct names as spec are JIS X 0208 and JIS X 0212.

(The IANA registry is internally inconsistent and often disagrees with official standards when it comes to capitalisation, dashes/hyphens, underscores and spaces, so it is difficult to get this right. Please excuse me for not always paying due attention to such details in e-mails. Of course, the specifications should follow either IANA or the official standard as appropriate, depending on what it is referring to.)

> Second, JIS_C6226-1983, JIS_X0212-1990, and EBCDICs are not
> ASCII compatible. So they are out of discouraged; mustn't use.

EBCDIC is clearly not ASCII-compatible and may be unique amongst the character sets in the IANA registry in providing the full ASCII repertoire in a different arrangement.

JIS_C6226-1983 and JIS_X0212-1990 as defined in RFC1345 (i.e., on their own) do not contain basic ASCII characters at all, so it makes little sense to use them for HTML documents without adding ASCII or the ASCII-based JIS C 6220-1969, which would give something like EUC-JP or ISO-2022-JP. JIS_C6226-1983 contains wide versions of ASCII characters, but those are not interpreted as HTML mark-up (unless I am mistaken). JIS_X0212-1990 does not contain ASCII, kana or basic kanji, so it is of extremely limited usefulness on its own even in a plain-text setting. Warning against completely useless encodings seems pointless.
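
To make the distinction above concrete, here is a rough sketch of two checks one could run. It is not taken from the HTML5 draft or the IANA registry. It assumes Python's built-in codecs `cp037` (one EBCDIC variant), `euc_jp` and `iso2022_jp`, and the helper names are made up for illustration.

```python
# Hypothetical helpers for illustration only; not part of any specification.

def ascii_bytes_preserved(codec: str, sample: str = '<p class="x">') -> bool:
    """True if an ASCII-only sample encodes to the same bytes as in ASCII."""
    return sample.encode(codec) == sample.encode("ascii")


def stays_in_ascii_range(codec: str, text: str) -> bool:
    """True if every byte of `text` encoded with `codec` is below 0x80."""
    return all(byte < 0x80 for byte in text.encode(codec))


if __name__ == "__main__":
    kanji = "\u6f22\u5b57"  # two basic kanji, both in JIS X 0208

    # Does HTML mark-up keep its ASCII byte values in each encoding?
    for codec in ("cp037", "euc_jp", "iso2022_jp"):
        print(codec, "preserves ASCII mark-up bytes:", ascii_bytes_preserved(codec))

    # Are non-ASCII characters represented with bytes in the ASCII range?
    for codec in ("euc_jp", "iso2022_jp"):
        print(codec, "uses only bytes < 0x80 for kanji:", stays_in_ascii_range(codec, kanji))
```

The first check separates EBCDIC, which rearranges the ASCII repertoire, from EUC-JP and ISO-2022-JP, which leave ASCII text unchanged. The second shows one way in which ISO-2022-JP still differs from EUC-JP: it represents kanji with escape sequences and bytes in the ASCII range.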
Many other encodings in the IANA registry are ASCII-incompatible in different ways; what I do not understand is what makes the ones currently mentioned in the HTML5 draft particularly harmful.

> Finally, Why ISO 2022 series is discouraged is not clear.

We agree on this point.

> Anyway, most of charsets defined RFC 1345 are not clear.
> Conversion table between [those charsets and] Unicode is needed.

Quite. Anne van Kesteren, I and several others are currently trying to document how browsers handle different encodings at <http://wiki.whatwg.org/wiki/Web_Encodings>, and defining mappings to Unicode is one of the goals. Your contribution would be much appreciated.

-- 
Øistein E. Andersen
Received on Thursday, 22 October 2009 13:23:43 UTC