- From: Martin J. Dürst <mduerst@ifi.unizh.ch>
- Date: Mon, 01 Sep 1997 14:43:43 +0200 (MET DST)
- To: Kohji SHIBANO <shibano@tiu.ac.jp>
- Cc: ietf-charsets@INNOSOFT.COM, Harald.T.Alvestrand@uninett.no, Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>, jcs@tiu.ac.jp
On Mon, 1 Sep 1997, Kohji SHIBANO wrote:

> Martin,
>
> As chairman of ISO SC2 and JIS X 0208 committee, I would like to give you
> some information.

Many thanks for your very valuable information.

> At 7:46 PM 97.8.30, Martin J. Dürst wrote:
> > In the BOF, I commented on this. I said that these were indeed
> > mostly character components that turned up in many characters,
> > and that a high percentage of them was explicitly unified by
> > the new version of the base Japanese Kanji standard,
> > JIS X 0208:1997. I mentioned a figure of something like 90
> > or 95%, which turns out to be too high if one counts cases,
> > but probably correct if one counts the characters affected
> > (see below).
>
> Since we do not have sufficient information to identify each Kanji from
> China, Taiwan, and Korea, it is very difficult to compare 10646 unification
> rules based with JIS X 0208:1997 unification rules and to evaluate
> compatibility between the two rules. At the time CJK-JRG did the
> unification, Japan also could not provides sufficient identification
> information on each Kanji.

JIS X 0208:1997 has indeed done extensive and excellent work in the
area of Kanji identification, and has set an example that will hopefully
be followed by other standards.

> I do not know the availability of some of GB standards. For example,
> Dr. Yasuoka of Kyoto University unveiled mystery behind GB standards.

I have heard a lot from Dr. Yasuoka and others about GB standards.
There is definitely room for improvement.

> As far as I understand, CJK-JRG work only used 24 dots fonts that is not
> sufficient for real unification consideration. Real consideration of
> unification rules requires identification information and very high
> quality Kanji shape information.

It depends on whether you use 24-dot fonts for initial identification
only, with the identifiers using their own knowledge and other sources
in case of doubt, or whether you work blindly on 24 dots.
The latter is definitely unacceptable; the former, which is more or less
what I think JRG has done, is an attempt to do their best with limited
means. In some cases, I think they had handwritten examples instead of
24-dot patterns, for example for the secondary Korean standard,
KS C 5657-1991. These have advantages and disadvantages when compared
to 24 dots, but probably more advantages than disadvantages. In some
sense, many character standard makers are in a bootstrap process,
with all the limitations this involves.

> > In the case that I have missed Masataka Ohta's name somewhere
> > in JIS X 0208:1997, I would like him to give us the exact page,
> > and if necessary line number, to verify. In the case he has
> > indeed participated, but has for some reason been forgotten,
> > I ask the chair of both committees listed on page 399, Prof.
> > Shibano, to tell us how Masataka Ohta has been involved.
>
> Masataka Ohta is really the member of JIS X 0208 committee and recorded as
> a member of ***WG2*** found in the middle of page 399 of JIS X 0208:1997,
> 6 lines below my name.
>
> However, he is not officially representing JIS committee and most of his
> opinions and interpretations contradict committee positions.

Many thanks for making this very important correction. I sincerely
apologize to Masataka Ohta for my mistake. I should have checked better!

> > Now for the list that Harald has shown. This list has 8 lines,
> > with four groups that each contain two or three variants.
> > For these, I give the item number of Section 6.6.3.2 of JIS
> > X 0208:1997 (p. 12,...) which gives examples of unification,
> > and comments if necessary.
>
> The list is not an example but normative rules of unification.

Yes, it is indeed normative. But it is important to understand
exactly in what sense it is normative. This can be found in Section 6.6.1.
There, it says that the unifications in 6.6.3 and 6.6.4 only define
the correspondence between the codepoints (bit combinations) of the
standard and Kanji shapes that are in general use, and that it does
not define any guideline for Kanji shapes in general. Also, it says
(comment 1) that this normative unification is based on the Mincho
font style, and that it is not intended to limit the use of other
fonts, or to give any kind of guideline on fonts.

The rules are therefore indeed normative, but not in the general
sense that just saying so might imply. That was my reason for using
the term "examples", but I agree that this is not very clear.

The way I understand it is that the committee looked only at Mincho
and only at existing shapes in widely available fonts. Making guidelines
for what *would* be unified if it existed was defined out of scope.
But if a font designer went and designed a new Mincho font, and decided
that in some details he/she wanted to deviate from the area of
unification as given in JIS 208:1997, for artistic or whatever reasons,
he/she would, as far as my interpretation goes, be allowed to do so.
It would just be his/her responsibility; unification wouldn't be
guaranteed by the standard, but it would also not be rejected outright,
provided of course that the other rules of the standard are respected
(no two codepoints have the same shape, characters can be identified
with the information given (starting on p. 63),...).

As an example, JIS X 0208:1997 forbids the (mechanical) combination
of two or more unification rules to deviate more and more from the
given example shapes and from what can be found in existing fonts.
However, JIS 208 wouldn't forbid the same shape if it appeared as the
result of the font design process, with all its aesthetic and other
considerations. It is of course difficult to tell the difference
between mechanical extensions and human aesthetic creativity.
But that can never be done mechanically anyway, and I think the
standard does a very good job of making this as clear as possible.

If this interpretation is not correct, then please tell me where
I made a mistake.

The above also explains some of the differences between JIS X 0208:1997
and ISO/IEC 10646 that I was listing in my mail. Some of them are
due to the fact that these shape variants just don't exist in Japanese
fonts, and are therefore not documented in JIS 208. However, they
would rather nicely fit the general guidelines for unification.
It's kind of as if the English had a special way to write A's, which
wouldn't be listed in a German standard because it's not popular
in Germany, but which no German would have difficulty reading as
an A.

> JIS Kanji Dictionary, which will be published in November,

Very interesting to hear about your JIS Kanji Dictionary. I look
forward to buying it and studying it. Will it be sold by JSA, or
by some other publisher?

Line 1:
> >   case 2 (3 variants)       161 (2 variants, third is
> >                             the single-character
> >                             shape which is not listed
> >                             in JIS 208 section 6.6.3.2)
>
> Basically, this is an error of the first edition of JIS X 0208. This rule
> is basically for compatibility purpose.

I am not sure about this. The two shapes are frequently used variants
of the same radical. JIS X 0208-1990 (the predecessor of the standard
we are discussing here) lists this case already, as "difference
resulting from simplification of drawing sequence". In today's Mincho
fonts, the difference is indeed on the edge of what one would unify
in general, but when one thinks about how these two variants are
drawn by hand, and that they are used interchangeably for the same
radical, unification seems to make a lot of sense. Maybe you can
give further explanations?

> >   case 3 (3 variants)       153 (JIS 208 lists one more variant)
>
> This rule come from well known Kanji shape design error of Kangxi
> dictionary.

Many of today's shapes could be attributed to design errors that
happened in the change from the seal script to newer scripts. After a
certain time, an error is no longer seen as an error. It is of course
always difficult to say when that happens. For example, one could
easily say that unification case 100 was an error of Japanese postwar
Kanji simplification, because the two shapes are historically separate,
and there are Kanji pairs (not very frequent, though) where this
difference is the only and crucial difference. But for present-day
Japanese, this is history, too.

> > With all the comments, it's difficult to exactly say what percentage
> > this would amount to. But counting each case as one item, it's around
> > 66%. If one counts characters affected, and not cases as such, however,
> > the percentage is much higher, because the cases with the most characters
> > (line 1: case 1, 2, 4; line 8: case 4) all are included in JIS 208.
>
> So far as I understand, CJK-JRG without sufficient information on each
> Kanji and its shape, they did a good job. Even though they based on
> explanatory pages of JIS X 0208:1990, ISO/IEC 10646-1 has better
> specification of Unification than JIS X 0208:1990.

Yes. I think they had to make it more explicit because they had
more characters, in more varieties of shapes, and in many cases
with less usage frequency and less documentation or common knowledge
available.

Many thanks again for your valuable comments,

With kind regards, Martin.
Received on Monday, 1 September 1997 05:47:36 UTC