Re: 異体字の使用例 from Taro Yamamoto on 2024-06-02 (public-i18n-japanese@w3.org from April to June 2024)

From: Taro Yamamoto <tyamamot@adobe.com>
Date: Sun, 2 Jun 2024 00:27:53 +0000
To: Nat McCully <nmccully@adobe.com>, 木田泰夫 <kida@mac.com>, Kobayashi Toshi <binn@k.email.ne.jp>
CC: JLReq TF 日本語 <public-i18n-japanese@w3.org>
Message-ID: <DM8PR02MB8070B3D765E5EF6C5867C3EDCEFD2@DM8PR02MB8070.namprd02.prod.outlook.com>
Nat wrote:

  * I actually disagree here with limiting our documentation to what is possible only with plain text encoding. The reality is that today to support Japanese text composition fully and to respect the specific character/glyph choices of the author faithfully we must include the tangled and confused world of fonts. Fonts are, today anyway, a mess. Unicode is inadequate. These considerations are largely unknown to most, and thus bad decisions continue to be made by standards bodies and by implementers because of such obscurity.

Because there are two separately encoded "characters", 高 (U+9AD8) and 髙 (U+9AD9), for example, I have to agree that we need to discuss the 異体字 (glyph variant) issue to some extent. In addition, because IVSes can also be used in plain text, we can discuss why we need glyph variants and IVSes, and how we should use them. But how much further can we go beyond that level?
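
As a rough illustration of the two cases above (separately encoded variants, and IVSes written in plain text), here is a minimal Python sketch. The helper names are made up for this example, and the sequence <U+845B, U+E0100> is used only to show the mechanics of a base character followed by a variation selector; whether a given font or IVD collection maps it to a particular glyph is not checked here.

```python
import unicodedata

TAKA = "\u9ad8"          # 高
HASHIGO_TAKA = "\u9ad9"  # 髙


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}" for ch in text]


def is_ivs_selector(ch):
    """True for VS17..VS256 (U+E0100..U+E01EF), the selectors used in IVSes."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def strip_ivs(text):
    """Drop IVS selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_ivs_selector(ch))


# Illustrative sequence only: base character U+845B followed by VS17.
IVS_EXAMPLE = "\u845b\U000e0100"

if __name__ == "__main__":
    # Two independent code points: plain text alone distinguishes them.
    print(describe(TAKA + HASHIGO_TAKA))
    # Neither normalization form folds one into the other.
    print(unicodedata.normalize("NFKC", HASHIGO_TAKA) == HASHIGO_TAKA)
    # An IVS is still plain text: two code points, one base character.
    print(describe(IVS_EXAMPLE))
    print(strip_ivs(IVS_EXAMPLE) == "\u845b")
```

The point of the sketch is only that both mechanisms live entirely in the character stream; anything finer than that depends on the font.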


  * I also think that although a character is a character is a character to some . . .

I believe the phrase clearly explains the essential meaning of Han Unification.


  * I believe for implementers we must empower authors and users to freely set the exact character design if they wish, and when it is not encoded or the font they wish to use only partially implemented or made so that only by using gid can they get what they want, we support them.

The encoded kanji that were originally defined by Japanese domestic standards (such as JIS X 0208, JIS X 0212 and JIS X 0213), together with the glyphs used to print their prototypical forms (including the JIS90 and JIS2004 glyph shapes), include 異体字, most of which can be distinguished by IVSes. All of these are covered by the Adobe-Japan1 glyph set and its IVD collection (including 14,664 kanji glyphs). In addition, the character/glyph set of 文字情報基盤 (MJ) has 58,862 characters/glyphs. Fonts supporting some or all of these character/glyph sets have already been released and are widely available, so these sets can be said to be reasonably well standardized. It may be easy to explain all of these in relation to the 異体字 issue. (BTW, the government is now planning to add approximately 10,000 more kanji characters/glyphs to the MJ character set.)

However many characters and glyphs become available, they cannot satisfy everyone. I can agree that users should be given some method to directly specify and access unencoded glyphs in a font, so that any kind of digital document, including old or classical texts, can be created and faithfully displayed and reproduced.

But there can be kanji characters and glyphs (and symbol glyphs too) that are not standardized, and this is not limited to kanji ideographs: logotypes, pi glyphs, 外字 (gaiji), colored glyphs and so on. These need to be accessed by means of font selection or through a sophisticated access method provided by some system or application software.

Should we discuss all of these, since they are all relevant to the issue of character/font selection in creating digital documents?
If everyone thinks so, I can agree!

Regards,

--Taro

________________________________
From: Taro Yamamoto <tyamamot@adobe.com>
Sent: Saturday, June 1, 2024 8:13:37 AM
To: 木田泰夫 <kida@mac.com>; Kobayashi Toshi <binn@k.email.ne.jp>
Cc: JLReq TF 日本語 <public-i18n-japanese@w3.org>
Subject: Re: 異体字の使用例

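On 異体字 (variant characters):
Allow me to make a few comments.

I think it would be better to limit what we write to the creation of digital text that is encoded in Unicode.
Whatever categories of 異体字 there may be, a variant that has been encoded is a character. A variant that has not been given its own code point cannot exist as a character, and the only way to specify its glyph from character codes is to use an IVS. Where several variants have each been given their own code point, specifying them by base character alone specifies each one as an independent character. That being so, there is no need to subdivide 異体字 themselves and explain them in fine detail. I think it would be best to limit the description to something like "variants that cannot be distinguished by character code can in some cases be accessed using an IVS" and "differences in glyph shape that cannot be distinguished even by an IVS are normally regarded as design differences, and a font that has the design of that variant must be selected."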
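
The three levels described above (separate code points, IVS, font selection) can be sketched as a small Python classifier. This is only an illustration: the function names and return labels are invented for the example, and the U+845B sequences are used purely as sample input without checking any IVD registration.

```python
def is_ivs_selector(ch):
    """True for VS17..VS256 (U+E0100..U+E01EF), the selectors used in IVSes."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def base_of(text):
    """Return text with any IVS selectors removed."""
    return "".join(ch for ch in text if not is_ivs_selector(ch))


def how_distinguished(a, b):
    """Classify how two variant forms can be told apart in plain text.

    'code point' : the base characters differ, so each is its own character
    'ivs'        : same base characters, different variation selectors
    'font'       : identical in plain text; only font selection can differ
    """
    if a == b:
        return "font"
    if base_of(a) == base_of(b):
        return "ivs"
    return "code point"


if __name__ == "__main__":
    print(how_distinguished("\u9ad8", "\u9ad9"))                      # 高 vs 髙
    print(how_distinguished("\u845b\U000e0100", "\u845b\U000e0101"))  # same base, two selectors
    print(how_distinguished("\u845b", "\u845b"))                      # nothing left but the font
```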

Just my personal opinion.

Taro Yamamoto

________________________________
From: 木田泰夫 <kida@mac.com>
Sent: June 1, 2024 21:57
To: Kobayashi Toshi <binn@k.email.ne.jp>
CC: JLReq TF 日本語 <public-i18n-japanese@w3.org>
Subject: Re: 異体字の使用例

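Toshi-sensei,

I see. As you say, if we decide, for example, to align the notation in jlreq-d with JIS X 0213:2004 or with the 表外漢字字体表 (the table of glyph forms for kanji outside the Jōyō list), there are kanji that need care because their code points differ, such as 繫 vs 繋.

Most of the characters whose shapes changed in JIS2004 keep the same code point, and for those the shape changes with the platform or with the reader's font settings, so anything we do on the document side can only be a half measure. That said, nearly all platforms are presumably JIS2004-based by now, so perhaps aligning with JIS2004 is the way to go. This can be changed in one pass in post-processing, so we can decide what to do about jlreq-d in due course.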
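
For the variants that do have separate code points, a one-pass post-processing step could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the table covers only a few of the pairs quoted later in this thread, the choice of the former form as the target is just one possible house style, and pairs that share a code point cannot be handled this way at all.

```python
# Hypothetical house-style table. Each entry is "former form + latter form",
# in the same order as the example pairs quoted in this thread.
PAIRS = ["曾曽", "瘦痩", "麵麺", "頰頬", "塡填", "摑掴", "噓嘘", "繫繋"]

# Map latter form -> former form (the direction is an assumption).
PREFERRED_FORM = {latter: former for former, latter in PAIRS}
_TABLE = str.maketrans(PREFERRED_FORM)


def find_nonpreferred(text):
    """List (index, found, suggested) for every character in the table."""
    return [(i, ch, PREFERRED_FORM[ch])
            for i, ch in enumerate(text) if ch in PREFERRED_FORM]


def unify_forms(text):
    """Replace every latter-form character with its former-form counterpart."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u7e4b\u3052\u3066"      # 繋げて, written with U+7E4B
    print(find_nonpreferred(sample))   # reports index 0 with the suggested form
    print(unify_forms(sample))         # same phrase, now written with U+7E6B
```

Running `find_nonpreferred` first keeps the step reviewable, since an editor may prefer to rewrite a word in kana rather than swap the kanji.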

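Separately, we need to decide whether to explain this issue in jlreq-d. Should we include it, I wonder. I am fairly easygoing about this, so I tend to think there is no need to insist on that kind of consistency, and that it is nothing to worry about much even if glyph forms vary within a single document. Still, perhaps readers should at least have it as background knowledge. And if we are going to explain it, we will need to explain what 異体字 are in the first place.

Kida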

> On 2024/06/01 19:45, Kobayashi Toshi <binn@k.email.ne.jp> wrote:
>
> Kida-san,
> everyone,
>
> This is Kobayashi Toshi.
>
> Kobayashi Toshi wrote:
>
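>> And as far as the issues with Jōyō kanji and the handling of glyph forms (字体) for kanji outside the Jōyō list (表外漢字) are concerned, I do not think this is a problem limited to print.
>>
>> With examples like the following, where one has to choose between the former form and the latter, both are seen on the Web as well, so I think there are quite a few cases on the Web where more than one form exists.
>>   Jōyō kanji examples: 曾・曽　瘦・痩 麵・麺 頰・頬 塡・填 葛・葛
>>   Non-Jōyō kanji examples: 摑・掴　噓・嘘　繫・繋　藪・薮　鶯・鴬　壺・壷　攪・撹　賤・賎　諫・諌　頸・頚　嚙・噛　瀆・涜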
>
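> Put another way, it comes to the following.
>
> If one writes in accordance with the “常用漢字表” (Jōyō kanji table) with respect to character set, glyph forms and readings, then until 2010, leaving proper nouns aside, it is fair to say there was “no problem” of glyph form (that is, of having to choose a form). (The matter of design differences remains, but I do not think it ever caused trouble in practice.)
>
> Since the 2010 revision of the “常用漢字表”, however, even when the characters used fall within the table, the question of which form to choose arises, on paper and on the Web alike, whenever one uses kanji such as the examples above.
>
> All the more so for non-Jōyō kanji: even before 2010, using any of the affected kanji meant having to choose a form. In books, though, there was a history behind the issue and people were aware of it, which is presumably why it was treated as a particular problem (naturally, some publishers did not treat it as one). In the world of digital text, I think few people ever raised it.
>
> For example, an earlier version of Kida-san's “2. 日本語デジタルテキストの作り方” (How to create Japanese digital text) used the characters “繋げて”. At the point of final publication I might well have said that “つなげて” or “繫げて” would probably be better.
>
> In short, when using kanji that are in the “常用漢字表”, or non-Jōyō kanji, some of them will require a choice of glyph form.
>
> For my part, if I were writing, I would use the former form in each of the examples above, but I would not rule out using the latter. Circumstances vary, and I think that is simply how it is (the “常用漢字表” says much the same, though in different words).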
>
Received on Sunday, 2 June 2024 00:28:05 UTC