- From: Kenneth Whistler <kenw@sybase.com>
- Date: Fri, 17 Dec 1999 17:52:18 -0800 (PST)
- To: ietf-charsets@iana.org
Resend. To: Harald@Alvestrand.no Subject: RE: Fwd: Last Call: UTF-16, an encoding of ISO 10646 to Proposed Cc: ietf_charsets@iana.org, kenw@birdie.sybase.com, mark.davis@us.ibm.com Harald, > > >Plane 2: SIP ~47000 Han characters (Vertical Extension B) under ballot > > ~18500 still assignable code points > > [haring off on an unrelated topic] > one thing I'm not connected enough to have found out: > Is this large set *new* characters, or are parts of the set used for > rolling back the unifications that created so much tension in and around > the IRG, for people who want that? > Well, it is a large set of *old* characters. ;-) These do not roll back the unifications, but instead constitute the third tier of really, really, obscure Chinese characters, after Vertical Extension A (the 6582 that just got added to the BMP for Unicode 3.0). The IRG is still observing the unification rules for this batch of 47,000. The largest part of these consist of all the other characters from the big source dictionaries (like the KangXi dictionary) that were not included in the first two sets. Most of them are truly obsolete characters with one or a few citations in old Chinese classics, or they constitute mistakes and variants that got into print (partly because of the long history of wood-block printing in China) and from there into the dictionary compendia. Many of the dictionary entries for these consist of one-liners: XX See YY. or XX Variant of YY. or XX Mistake for YY. etc. However, included in this batch are also a number of modern-use (and important) characters from other sources that didn't make it into the first two batches -- including a group of characters on the Hong Kong government's required list, additional corporate and national characters from Japan, China, Vietnam, etc. The total number of these is relatively small, but they are much more important for general implementation, since they have usage outside of the text entry of ancient Chinese (and other Han) classics. The issue of "compatibility characters" has raised its ugly head with the pending publication of JIS X 0213. A number of variants that are already unified in Unicode are separately encoded in JIS X 0213 as required Japanese national variants (to match some legal requirements for appearance of characters). This rather limited set was, in fact, originally proposed by the UTC to be included among the compatibility Han characters, but the issue was dropped during the development of the URO. But the set has come back, and UTC has accepted them as a group of compatibility characters (approximately 56 of them, if I remember correctly). Taiwan and China are likely to come in with an additional collection of compatibility characters to help in the crossmapping of CNS and GB standards. But all participants in this want to keep the repertoire of these small and manageable -- and all will be clearly marked as compatibility variants of specified unified characters. Nobody involved is, at this point, interested in starting a wholesale rollback of unification. Note that these compatibility variants are for the explicit variants and duplicates in existing Asian standards -- we are *not* talking here about coming up with a full set of encodings for "Chinese" Hanzi characters, and another full set of encodings for "Japanese" Kanji characters, and another full set of encodings for "Korean" Hanja characters. Everybody knows that that way lies madness. --Ken
Received on Friday, 17 December 1999 20:55:57 UTC