Re: [jlreq] [META] Reorganize character classes and its adoption of Unicode based definition (#240) from Nat McCully via GitHub on 2020-10-11 (public-i18n-archive@w3.org from October to December 2020)

From: Nat McCully via GitHub <sysbot+gh@w3.org>
Date: Sun, 11 Oct 2020 02:16:16 +0000
To: public-i18n-archive@w3.org
Message-ID: <issue_comment.created-706638875-1602382575-sysbot+gh@w3.org>

One issue I see with the idea of adopting Unicode Character Property as a descriptor for use in JLReq:

JLReq mojikumi classes (and JIS X 4051 mojikumi classes) group characters according to spacing convention, and they need to differentiate spacing rules among characters of the same semantic type, e.g. （ and 「. Both of those characters are Opening Punctuation, but their spacing rules differ, so they are distinct in the mojikumi classifications. I am not sure whether the intent of this proposal is to introduce such granularity into the Unicode Character Property just for the sake of supporting Japanese publishing spacing rules, but if not, then I think converting JLReq to use them will be a lossy conversion. Unicode's unification of punctuation and certain Latin, Cyrillic, and Greek characters to a single code point, whereas historically such characters were distinct in Japanese fonts (and their SJIS encoding was distinct from their ASCII encoding), has caused a similar lossy problem when composing text in Japanese fonts of different vintages. Some fonts have U+201C “ as a full-width SJIS-like glyph, while others treat that code point as proportional; the mojikumi spacing rules differ (the classes are different), yet the difference cannot be expressed in Unicode alone.
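
To illustrate the point, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `unicode_view` is my own and purely illustrative. It only reports what Unicode itself records for a character (General_Category and East_Asian_Width); it does not attempt to assign any JLReq or JIS X 4051 class, since that information is not in these properties.

```python
import unicodedata


def unicode_view(ch: str) -> dict:
    """Return the Unicode properties Python exposes for one character.

    Hypothetical helper for illustration only; it does not model
    mojikumi classes, which these properties do not encode.
    """
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "general_category": unicodedata.category(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


# U+FF08 FULLWIDTH LEFT PARENTHESIS, U+300C LEFT CORNER BRACKET,
# U+201C LEFT DOUBLE QUOTATION MARK
for ch in ("\uFF08", "\u300C", "\u201C"):
    print(ch, unicode_view(ch))
```

Both （ and 「 come back with the same General_Category (`Ps`, Open Punctuation), so that property alone cannot separate them. U+201C is reported with an East_Asian_Width of `A` (ambiguous), which is consistent with the observation that whether it is set full-width or proportional depends on the font rather than on the code point.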

-- 
GitHub Notification of comment by macnmm
Please view or discuss this issue at https://github.com/w3c/jlreq/issues/240#issuecomment-706638875 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Sunday, 11 October 2020 02:16:18 UTC