- From: Martin J. Dürst <mduerst@ifi.unizh.ch>
- Date: Sat, 30 Aug 1997 19:46:08 +0200 (MET DST)
- To: ietf-charsets@INNOSOFT.COM
- Cc: Harald.T.Alvestrand@uninett.no, Kohji SHIBANO <shibano@tiu.ac.jp>, Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>
Hello everybody,

In the charset policy BOF at the recent IETF meeting in Munich, chaired by Harald Alvestrand, he showed a slide with variants of Han characters (Kanji) that are unified in Unicode/ISO 10646, but which may be problematic. He also showed this list in his plenary talk presenting the planned IETF charset policy. This list has been published on page 885 (explanatory page 7), bottom, of JIS X 0221-1995, the Japanese translation of ISO 10646 (explanatory material not contained in the original), and probably elsewhere.

In the BOF, I commented on this. I said that these were indeed mostly character components that turn up in many characters, and that a high percentage of them were explicitly unified by the new version of the base Japanese Kanji standard, JIS X 0208:1997. I mentioned a figure of something like 90 or 95%, which turns out to be too high if one counts cases, but is probably correct if one counts the characters affected (see below).

To this, Masataka Ohta strongly protested, saying something to the effect that he had been on the committee developing that standard.

I have now had time to look at JIS X 0208:1997 again. On page 399 (explanatory page 25), it lists the members of the two committees involved. On the following page, it gives additional acknowledgements. Whatever that may mean, I have not been able to find the name Masataka Ohta on these pages. [My name turns up at the end of the text on page 400, as one of the contributors to the public review done by the committee, in the form Duerst, Martin J.]

In case I have missed Masataka Ohta's name somewhere in JIS X 0208:1997, I would like him to give us the exact page, and if necessary line number, so that this can be verified. In case he has indeed participated, but has for some reason been forgotten, I ask the chair of both committees listed on page 399, Prof. Shibano, to tell us how Masataka Ohta has been involved.

Now for the list that Harald has shown.
This list has 8 lines, each with four groups (cases) of 2 or 3 variants. For each case, I give the item number in Section 6.6.3.2 of JIS X 0208:1997 (p. 12,...), which gives examples of unification, and add comments where necessary. A "-" means that no item number applies. Note that JIS 208 also contains and lists exceptions, but that these are carried over to Unicode/ISO 10646 as separate characters by the source separation rule.

Line 1
  case 1 (3 variants)  128  (2 variants, third is handwriting and not covered by JIS 208)
  case 2 (3 variants)  161  (2 variants, third is the single-character shape which is not listed in JIS 208 section 6.6.3.2)
  case 3 (3 variants)  153  (JIS 208 lists one more variant)
  case 4 (3 variants)  155  (2 variants, middle is the single-character shape which is not listed in JIS 208 section 6.6.3.2)

Line 2
  case 1 (2 variants)  141
  case 2 (2 variants)  147
  case 3 (2 variants)  150
  case 4 (2 variants)   70  (JIS generalizes to the lower part)

Line 3
  case 1 (2 variants)  146
  case 2 (2 variants)   98
  case 3 (2 variants)   94
  case 4 (2 variants)  144  (JIS limits this to the case where this part appears on the right)

Line 4
  case 1 (3 variants)    -  (similar cases listed in 6.6.4)
  case 2 (2 variants)  167  (JIS generalizes to the upper part)
  case 3 (2 variants)  136  (JIS generalizes to the lower part)
  case 4 (2 variants)  125

Line 5
  case 1 (2 variants)  124  (JIS generalizes to the lower part)
  case 2 (2 variants)   97
  case 3 (3 variants)   96
  case 4 (2 variants)    -

Line 6
  case 1 (2 variants)    -
  case 2 (2 variants)    -
  case 3 (2 variants)    -
  case 4 (3 variants)   48  (two right variants only in JIS)

Line 7
  case 1 (3 variants)    -  (not a general case in JIS, but several cases where this is unified are listed in 6.6.4)
  case 2 (2 variants)    -
  case 3 (2 variants)    -
  case 4 (3 variants)  101

Line 8
  case 1 (2 variants)  113
  case 2 (2 variants)    -
  case 3 (2 variants)   80
  case 4 (2 variants)   82

With all the comments, it's difficult to say exactly what percentage this amounts to. But counting each case as one item, it's around 66%.
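For readers who want to check the per-case count, here is a minimal sketch that tallies the item numbers given in the list. The names ITEMS and tally are hypothetical, chosen only for this illustration. It treats every case that has an item number as fully covered, so it gives an upper bound; the "around 66%" figure presumably discounts some cases where JIS 208 covers only part of the variants, but that reading is an assumption, not something the list states.

```python
# Item numbers from the per-case list; None marks a case given as "-"
# (no item number in JIS X 0208:1997, Section 6.6.3.2).
ITEMS = {
    1: [128, 161, 153, 155],
    2: [141, 147, 150, 70],
    3: [146, 98, 94, 144],
    4: [None, 167, 136, 125],
    5: [124, 97, 96, None],
    6: [None, None, None, 48],
    7: [None, None, None, 101],
    8: [113, None, 80, 82],
}


def tally(items):
    """Return (cases with an item number, total cases)."""
    cases = [item for line in items.values() for item in line]
    covered = sum(item is not None for item in cases)
    return covered, len(cases)


covered, total = tally(ITEMS)
percent = 100.0 * covered / total
print(f"{covered} of {total} cases have an item number ({percent:.1f}%)")
```

Counting this way, 23 of the 32 cases carry an item number; how many of the partially covered ones to discount is a judgment call.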
If one counts the characters affected, and not the cases as such, however, the percentage is much higher, because the cases with the most characters (line 1: cases 1, 2, 4; line 8: case 4) are all included in JIS 208.

With kind regards,   Martin.

----
Dr.sc.  Martin J. Du"rst              ' , . p y f g c R l / =
Institut fu"r Informatik               a o e U i D h T n S -
der Universita"t Zu"rich                ; q j k x b m w v z
Winterthurerstrasse 190               (the Dvorak keyboard)
CH-8057 Zu"rich-Irchel   NEW TEL: +41 1 63 543 16
S w i t z e r l a n d    NEW FAX: +41 1 63 568 09
Email: mduerst@ifi.unizh.ch
----
Received on Saturday, 30 August 1997 10:47:52 UTC