Re: For the record from Martin J. Dürst on 1997-09-01 (ietf-charsets@w3.org from July to September 1997)

From: Martin J. Dürst <mduerst@ifi.unizh.ch>
Date: Mon, 01 Sep 1997 14:43:43 +0200 (MET DST)
To: Kohji SHIBANO <shibano@tiu.ac.jp>
Cc: ietf-charsets@INNOSOFT.COM, Harald.T.Alvestrand@uninett.no, Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>, jcs@tiu.ac.jp
Message-id: <Pine.SUN.3.96.970901123505.12354C-100000@enoshima>
On Mon, 1 Sep 1997, Kohji SHIBANO wrote:

> Martin,
> 
> As chairman of ISO SC2 and JIS X 0208 committee, I would like to give you
> some information.

Many thanks for your very valuable information.


> At 7:46 PM 97.8.30, Martin J. Dürst wrote:

> > In the BOF, I commented on this. I said that these were indeed
> > mostly character components that turned up in many characters,
> > and that a high percentage of them was explicitly unified by
> > the new version of the base Japanese Kanji standard,
> > JIS X 0208:1997. I mentioned a figure of something like 90
> > or 95%, which turns out to be too high if one counts cases,
> > but probably correct if one counts the characters affected
> > (see below).
> >
> Since we do not have sufficient information to identify each Kanji from
> China, Taiwan, and Korea, it is very difficult to compare the 10646
> unification rules with the JIS X 0208:1997 unification rules and to evaluate
> compatibility between the two rules. At the time CJK-JRG did the unification,
> Japan also could not provide sufficient identification information on each
> Kanji.

JIS X 0208:1997 has indeed done extensive and excellent work in
the area of Kanji identification, and has set an example that will
hopefully be followed by other standards.


> I do not know the availability of some of GB standards. For example, Dr.
> Yasuoka of Kyoto University unveiled the mystery behind GB standards.

I have heard a lot from Dr. Yasuoka and others about GB standards.
There is definitely room for improvement.


> As far as I understand, CJK-JRG work only used 24 dots fonts that is not
> sufficient for real unification consideration. Real consideration of
> unification rules requires identification information and very high quality
> Kanji shape information.

It depends on whether you use 24 dots fonts for initial identification
only, and the identifiers use their knowledge and other sources in case
of doubts, or whether you work blindly on 24 dots. The latter is definitely
unacceptable; the former, which is more or less what I think JRG has
done, is an attempt to do their best with limited means. In some cases,
I think they had handwritten examples instead of 24 dot patterns,
for example for the secondary Korean standard, KS C 5657-1991.
These have advantages and disadvantages when compared to 24 dots,
but probably more advantages than disadvantages.
In some sense, many character standard makers are in a bootstrap process,
with all the limitations this involves.


> > In the case that I have missed Masataka Ohta's name somewhere
> > in JIS X 208:1997, I would like him to give us the exact page,
> > and if necessary line number, to verify. In the case he has
> > indeed participated, but has for some reason been forgotten,
> > I ask the chair of both committees listed on page 399, Prof.
> > Shibano, to tell us how Masataka Ohta has been involved.
> >
> Masataka Ohta is really the member of JIS X 0208 committee and recorded as
> a member of ***WG2*** found in the middle of page 399 of JIS X 0208:1997,
> 6 lines below my name.
> 
> However, he is not officially representing JIS committee and most of his
> opinions and interpretations contradict committee positions.

Many thanks for making this very important correction. I sincerely
apologize to Masataka Ohta for my mistake. I should have checked
better!


> > Now for the list that Harald has shown. This list has 8 lines,
> > with four groups that each contain 2 or three variants.
> > For these, I give the item number of Section 6.6.3.2 of JIS
> > X 208:1997 (p. 12,...) which gives examples of unification,
> > and comments if necessary.
> >
> The list is not an example but normative rules of unification.

Yes, it is indeed normative. But it is important to understand
exactly in what sense it is normative. This can be found in
Section 6.6.1. There, it says that the unifications in 6.6.3
and 6.6.4 only define the correspondence between the codepoints
(bit combinations) of the standard and Kanji shapes that are
in general use, and that it does not define any guideline for
Kanji shapes in general.
Also, it says (comment 1) that this normative unification is
based on the Mincho font style, and that it is not intended
to limit the use of other fonts, or to give any kind of
guideline on fonts.

The rules are therefore indeed normative, but not in the general
sense that just saying so might imply. That was my reason
for using the term "examples", but I agree that this is not
very clear.

The way I understand it is that the committee looked only at
Mincho and only at existing shapes in widely available fonts.
Making guidelines for what *would* be unified if it existed
was defined out of scope.

But if a font designer went and designed a new Mincho
font, and decided that in some details, he/she wanted to deviate
from the area of unification as given in JIS 208:1997, for
artistic or whatever reasons, he/she would, as far as my
interpretation goes, be allowed to do so. It would just be
his/her responsibility; unification wouldn't be guaranteed
by the standard, but it would also not be rejected outright,
provided of course that the other rules of the standard are
respected (no two codepoints have the same shape, characters
can be identified with the information given (starting on
p. 63),...).

As an example, JIS X 0208:1997 forbids the (mechanical)
combination of two or more unification rules to deviate
more and more from the given example shapes and from
what can be found in existing fonts. However, JIS 208
wouldn't forbid the same shape if it appeared as the
result of the font design process, with all its aesthetic
and other considerations.

It is of course difficult to tell the difference between
mechanical extensions and human aesthetic creativity. But
that can never be done mechanically anyway, and I think
the standard does a very good job of making this as clear
as it can possibly be.

If this interpretation is not correct, then please tell
me where I made a mistake.


The above also explains some of the differences between JIS
X 0208:1997 and ISO/IEC 10646 that I was listing in my mail.
Some of them are due to the fact that these shape variants
just don't exist in Japanese fonts, and are therefore
not documented in JIS 208. However, they would rather
nicely fit the general guidelines for unification.
It's rather as if the English had a special way of
writing A's, which wouldn't be listed in a German standard
because it's not popular in Germany, but which no German
would have difficulty reading as an A.


> JIS Kanji Dictionary, which will be published in November,

Very interesting to hear about your JIS Kanji Dictionary. I look
forward to buying it and studying it. Will it be sold by JSA, or
by some other publisher?


Line 1:
> > 	case 2 (3 variants)	161 (2 variants, third is
> > 					the single-character
> > 					shape which is not listed
> > 					in JIS 208 section 6.6.3.2)
> 
> Basically, this is an error of the first edition of JIS X 0208. This rule is
> basically for compatibility purpose.

I am not sure about this. The two shapes are frequently used variants
of the same radical. JIS X 0208-1990 (the predecessor of the standard
we are discussing here) lists this case already, as "difference
resulting from simplification of drawing sequence". In today's
Mincho fonts, the difference is indeed on the edge of what one
would unify in general, but when one thinks about how these
two variants are drawn by hand, and that they are used
interchangeably for the same radical, unification seems to make a
lot of sense.
Maybe you can give further explanations?


> > 	case 3 (3 variants)	153 (JIS 208 lists one more variant)
> 
> This rule comes from well known Kanji shape design error of Kangxi dictionary.

Many of today's shapes could be attributed to design errors that
happened in the change from the seal script to newer scripts.
After a certain time, an error is no longer seen as an error.
It is of course always difficult to say when that happens.

For example, one could easily say that unification case 100 was
an error of Japanese postwar Kanji simplification, because the
two shapes are historically separate, and there are Kanji pairs
(not very frequent, though) where this difference is the only
and crucial difference. But for present-day Japanese, this is
history, too.


> > With all the comments, it's difficult to exactly say what percentage
> > this would amount to. But counting each case as one item, it's around
> > 66%. If one counts characters affected, and not cases as such, however,
> > the percentage is much higher, because the cases with the most characters
> > (line 1: case 1, 2, 4; line 8: case 4) all are included in JIS 208.
> >
> So far as I understand, CJK-JRG without sufficient information on each
> Kanji and its shape, they did a good job. Even though they based on
> explanatory pages of JIS X 0208:1990, ISO/IEC 10646-1 has better
> specification of Unification than JIS X 0208:1990.

Yes. I think they had to make it more explicit because they
had more characters, in more varieties of shapes, and in many
cases with less usage frequency and less documentation or common
knowledge available.


Many thanks again for your valuable comments,

With kind regards,	Martin.


Received on Monday, 1 September 1997 05:47:36 UTC