Re: Solutions to unify middle dot usage in Traditional Chinese

I don't know much about which code points existing Chinese fonts
support, but from a layout engine perspective, it'd be helpful to use
EAW=W/F code points for Chinese documents as much as possible. U+30FB
is W, while U+2022/2027 are A, so you may find that layout engines
handle U+2022/2027 as Latin characters in terms of their behavior
around line breaking, justification, etc.
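
For anyone who wants to check the East_Asian_Width (EAW) values of the
candidates mentioned in this thread, here is a minimal sketch using
Python's standard unicodedata module. The helper names and the
candidate list are only illustrative assumptions, and the values
reported depend on the Unicode version bundled with the Python build
in use.

```python
import unicodedata

# Middle-dot candidates mentioned in this thread (illustrative list).
CANDIDATES = ["\u00B7", "\u2022", "\u2027", "\u30FB", "\uFF0E"]


def is_east_asian_wide(ch: str) -> bool:
    """Return True if East_Asian_Width of ch is W (Wide) or F (Fullwidth)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def describe(ch: str) -> str:
    """Format the code point, character name, and EAW value on one line."""
    eaw = unicodedata.east_asian_width(ch)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: EAW={eaw}"


# Print one line per candidate so the W/F versus A split is visible.
for ch in CANDIDATES:
    print(describe(ch))
```

A layout engine that keys its behavior on EAW would presumably treat
only the candidates for which is_east_asian_wide() returns True as CJK
characters by default.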

/koji

On Fri, Dec 12, 2014 at 10:27 PM, Ken Lunde <lunde@adobe.com> wrote:
> Martin,
>
> My comment about U+30FB needs to be viewed from the perspective of Traditional Chinese standards and fonts, almost all of which do not include a glyph for U+30FB, thus recommending its use would most certainly result in font fallback kicking in, which is generally not a good idea if document fidelity is a concern (which it should be). A glyph for U+30FB can only be guaranteed to be present in a Japanese font. Once you go beyond Japanese, all bets are off. And yes, I am fully aware that the usage in Japanese goes beyond katakana, though it is grouped with the katakana characters in Unicode.
>
> Also, Traditional Chinese fonts generally have a Big Five (or CNS 11643) heritage, and what we're really discussing is how Big Five 0xA145 is being handled, in terms of what Unicode code point is emitted when this character is entered through typical Traditional Chinese IMEs. This seems to be yet another case of a platform difference for the mapping, and the two code points seem to be U+2022 and U+2027. Whether these are considered "correct" or not is somewhat irrelevant, because my experience working with Apple and Microsoft on such issues is that, for better or worse, they won't change these mappings, mainly due to legacy concerns.
>
> Regards...
>
> -- Ken
>
>> On Dec 11, 2014, at 11:00 PM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
>>
>> On 2014/12/12 00:09, Ken Lunde wrote:
>>> Bobby,
>>>
>>> Allow me to insert a few comments about this particular issue.
>>>
>>> First, the name of the mailing list suggests that a Chinese version of JLREQ and (the still-in-development) KLREQ is in the works. If so, that's great news.
>>>
>>> About the middle dot in Traditional Chinese, based on the exchange between Addison and me yesterday, both U+30FB and U+FF0E must be removed from the equation, because the former has strong ties to Japanese-only usage (and because Chinese fonts may not include a glyph for this character) and the latter is a full-stop (aka period) that happens to be centered within the em-box for Traditional Chinese use.
>>>
>>> That leaves U+00B7 and U+2027, but U+2022 should also be considered.
>>
>> Hello Ken,
>>
>> You say "U+30FB ... must be removed from the equation, because [it] has strong ties to Japanese-only usage (and because Chinese fonts may not include a glyph for this character)".
>>
>> The latter might make a reasonably good reason for the time being (but could be fixed). But I don't understand the former. Characters in Unicode are not tied to languages. U+30FB is named "KATAKANA MIDDLE DOT", but that's a misnomer; it's used in many places where no katakana are around. See https://ja.wikipedia.org/wiki/中黒.
>>
>> That also shows that the use cases, although not exactly the same, are pretty close to those mentioned by Bobby in his first mail (see also https://zh.wikipedia.org/wiki/间隔).
>>
>> Appearance also seems to be extremely close if not the same, definitely within the kinds of variance that have to be dealt with anyway to address Han unification. And the characters seem to be intrinsically full-width, which would favor U+30FB over U+00B7.
>>
>> On the other hand, U+2027 (HYPHENATION POINT) and U+2022 (BULLET) seem to be semantically totally different from what we are looking at.
>>
>> In summary, I'm not necessarily disagreeing with you, but it would be highly preferable to have something better than the inherently circular "this character is only used in Japanese so it cannot be used in Chinese".
>>
>> Regards,    Martin.
>

Received on Monday, 15 December 2014 01:38:58 UTC