JLReq TF meeting notes 2020·12·01

Notes from JLReq TF zoom meeting on 2020·12·01

Attendees
Kobayashi Tatsuo (Kobayashi-san)
Kobayashi Toshi (Bin-sensei)
Murata Makoto
Nat McCully
Shimono Atsushi
Tajima Jun
Kida Yasuo
(Anybody missing?)

The context
Internationalization of the JLReq character classes is part of the planned major update of JLReq. Many characters currently appear in more than one JLReq character class; in other words, these characters behave differently depending on the context in which they appear. Bin-sensei proposed converting "cl-20", "cl-21", "cl-22", "cl-23", "cl-24", "cl-25", "cl-28", "cl-29", and "cl-30" into virtual classes such as "reference marks", following Eric Muller’s model, and removing cl-24 Grouped numbers (cf. meeting notes 2020·10·20).

After removing these classes, there are still characters that belong to multiple classes. All of the remaining cases involve cl-27 Western characters, which probably means that these characters behave differently in Japanese text than in a Latin-script context. At the meeting we went through these characters to better understand the nature of the context that determines a character’s behavior.


Meeting notes
• Probably the most important outcome of the meeting is that we recognized cases where a single character exhibits two different behaviours not because of the context in which it is used but because the two uses are intrinsically different. This might, or might not, suggest that they are really two different characters. Examples are U+2019, U+201D, U+2010, U+2013 (which belong to both cl-02/03 and cl-27), and U+2014, U+2025, U+2026 (which belong to both cl-08 and cl-27). The team will investigate this point further.
• A similar issue is that U+301C WAVE DASH is increasingly used in Japanese text as an alternative to U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK. When it is used as a prolonged sound mark, its expected layout behaviour differs from the usual behaviour of U+301C.
• We have not yet reviewed the proposal from Eric, but the discussion touched on one point. The proposal differentiates the spacing behaviour depending on the text direction. Bin-sensei said that in most cases the behaviour depends on whether that part of the text is Japanese or Latin, not on whether the direction is vertical or horizontal.
• The team agreed to remove U+00AB and U+00BB from cl-01/02, as they are not used in Japanese text. They will belong only to cl-27.
• The team agreed to remove cl-12 and cl-13, because they have special behavior only when they are used with cl-24 Grouped numbers, which is to be removed. Characters in these classes will be assigned to either cl-17 Kanji or cl-27 Western characters depending on their UAX #50 behaviour (i.e. U/Tu vs R).
• We’ve gone through the classes up to cl-14. The next meeting will continue the review from cl-15.
• Should U+4EDD 仝 be added to cl-09 Iteration marks? No, it behaves the same way as Kanji; that is, it can come at the beginning of a line.
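
As an illustration only, the reassignment rule agreed for the former cl-12 and cl-13 characters (U/Tu goes to cl-17, R goes to cl-27) could be sketched as below. The function name, the table name and the handful of sample entries are assumptions made for this sketch and are not part of the notes; real Vertical_Orientation values should be taken from the UAX #50 data file rather than from this table. The notes do not say what happens for other values such as Tr, so the sketch rejects them.

```python
# Hypothetical sketch of the cl-12 / cl-13 reassignment rule from the notes.
# SAMPLE_VERTICAL_ORIENTATION holds a few sample entries for illustration;
# check them against the UAX #50 data file before relying on them.
SAMPLE_VERTICAL_ORIENTATION = {
    "\u0024": "R",  # DOLLAR SIGN
    "\u0025": "R",  # PERCENT SIGN
    "\u2103": "U",  # DEGREE CELSIUS
    "\u2116": "U",  # NUMERO SIGN
}


def reassign_class(char, orientation=SAMPLE_VERTICAL_ORIENTATION):
    """Return the new JLReq class for a character formerly in cl-12 or cl-13."""
    vo = orientation[char]
    if vo in ("U", "Tu"):
        return "cl-17"  # upright in vertical text: treated like Kanji
    if vo == "R":
        return "cl-27"  # rotated in vertical text: treated as a Western character
    # The notes only cover U/Tu and R, so anything else is left undecided here.
    raise ValueError(f"no rule for Vertical_Orientation value {vo!r}")


if __name__ == "__main__":
    for ch in SAMPLE_VERTICAL_ORIENTATION:
        print(f"U+{ord(ch):04X} -> {reassign_class(ch)}")
```

Passing the orientation table as a parameter keeps the rule itself separate from the data, so the same function can be run against a full table once one is loaded.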


Questions / comments welcome.

The next meeting will be in two weeks, on 12/15 at 10 am JST.

- kida

Received on Thursday, 3 December 2020 09:54:42 UTC