JLReq meeting notes 2020-10-20

Venue: Zoom
Attendees: Bin-sensei, Kobayashi-san, Murata-san, Shimono-san & Kida
Agenda: Context-dependent character classes
The top-level issue is to expand the JLReq character classes to Unicode. Further background is at the end of this mail.
https://github.com/w3c/jlreq/issues/240


Notes:
We went over a proposal from Bin-sensei to remove the context-dependent character classes from JLReq. The following classes are context-dependent:
Characters as reference marks (cl-20) 合印中の文字(cl-20)
Ornamented character complexes (cl-21) 親文字群中の文字(添え字付き)(cl-21) 
Simple-ruby character complexes (cl-22) 親文字群中の文字(熟語ルビ以外のルビ付き)(cl-22) 
Jukugo-ruby character complexes (cl-23) 親文字群中の文字(熟語ルビ付き)(cl-23) 
Grouped numerals (cl-24) 連数字中の文字(cl-24)
Unit symbols (cl-25) 単位記号中の文字(cl-25) 
Warichu opening brackets (cl-28) 割注始め括弧類(cl-28) 
Warichu closing brackets (cl-29) 割注終わり括弧類(cl-29) 
Characters in tate-chu-yoko (cl-30) 縦中横中の文字(cl-30)


Discussions:
cl-24 “Grouped numerals” can be safely removed: it covers legacy half-width Arabic numerals that were used in the early era of computerized layout systems and are no longer in use. The regular digits (U+0030…) are covered by cl-27 “Western characters”.
All other character classes listed above can be removed by defining their behaviour as part of the descriptions of the related features. Some rules are already written this way (e.g. ruby). All of these features require specific tagging, so the behaviour belongs to the tags rather than to the characters.
cl-25 “Unit symbols” and cl-21 “Ornamented character complexes” relate to features that are not native to Japanese. In general, such features should probably be covered by the layout rules for other languages or scripts, or for specific domains such as math. There may be other such cases in JLReq; this needs further investigation and discussion.
A tate-chu-yoko (cl-30) block can be treated just like a single kanji in terms of spacing and line-break opportunities. This can be inferred from a careful reading of JLReq, but stating it explicitly would make it clearer.
The description of the behaviour of kanji numerals is scattered across JLReq. It would make sense to add a subsection that describes it in one place.
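
As a rough illustration of the points above (not something discussed in the meeting), the following Python sketch shows how a layout engine might derive behaviour from the feature tag first and fall back to a per-character class only for untagged text. The `Run` type, the `"tcy"` tag name, and the two-way `char_class` classifier are all hypothetical simplifications; a real mapping of Unicode characters to JLReq classes would be far more detailed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Run:
    """A stretch of text with an optional feature tag (hypothetical model)."""
    text: str
    feature: Optional[str] = None  # e.g. "tcy" for tate-chu-yoko (assumed name)


def char_class(ch: str) -> str:
    """Toy classifier: ASCII letters and digits are Western characters (cl-27);
    everything else is lumped into ideographic characters (cl-19).
    This is a placeholder, not the JLReq mapping."""
    if ch.isascii() and ch.isalnum():
        return "cl-27"
    return "cl-19"


def layout_units(runs: List[Run]) -> List[Tuple[str, str]]:
    """Return (text, class) units used for spacing and line breaking."""
    units: List[Tuple[str, str]] = []
    for run in runs:
        if run.feature == "tcy":
            # The whole tate-chu-yoko block is one unit that behaves
            # like a single kanji; no context-dependent class is needed.
            units.append((run.text, "cl-19"))
        else:
            # Untagged text: classify each character on its own.
            units.extend((ch, char_class(ch)) for ch in run.text)
    return units


if __name__ == "__main__":
    print(layout_units([Run("平成"), Run("20", "tcy"), Run("年")]))
```

In this sketch the digits "20" never receive a class of their own inside the tagged run; the tag alone decides that the block is a single kanji-like unit, which is the kind of rule that would replace cl-30.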
ToDos:
Nat to write up a note describing the issues (or discrepancies) that would arise from implementing JLReq layout in a globalized layout engine.
Kida to translate the note from Eric Muller regarding expanding the JLReq character classes to Unicode.
I thought there was a to-do for Bin-sensei, but I forgot…

The next meeting will be in two weeks. Kida to set it up.

––––––––––––––
The background or the context:

One of the essential tasks in bringing JLReq into the future is globalization. What does that mean? Like any other layout rules in the world, the Japanese line layout rules evolved within their own market, and they are inherently built on assumptions that hold within that market. They form their own universe, and in that sense JLReq is similar to the JIS character set.

Some of these assumptions break when JLReq is brought into a multi-language environment. On one side, these gaps make the rules harder to implement or result in additional cost. On the flip side, they create gaps between what is implemented and what is written, which would eventually hinder the evolution of Japanese layout.
––––––––––––––––

 - kida

Received on Tuesday, 20 October 2020 16:51:14 UTC