Re: Solutions to unify middle dot usage in Traditional Chinese

Bobby,

Apologies for the delay in replying.

Does CNS 11643 (which is a superset of Big Five, but not broadly used) include a character named 間隔號 (the middle dot you want to unify)? It doesn't seem to. My guess is that the Taiwan MOE "間隔號" and the CNS 11643 "音界號" (hyphenation point; 1-2126, Big Five 0xA145) refer to the same thing.

There are many legacy and compatibility issues, along with a large number of existing Traditional Chinese fonts, that would prevent using a code point other than U+2027 for the "middle dot" in Traditional Chinese. (There is also a compatibility issue with other platforms, such as OS X.)

Keep in mind that these mappings (or usage conventions, which are closely tied to the code points that Traditional Chinese IMEs emit and the code points that Traditional Chinese fonts support) were established over 20 years ago, when Unicode properties were still being developed. Another issue is that some standards, including those from Taiwan (Big Five and CNS 11643), are published in a form that is effectively tables of representative glyphs with code point assignments, and the properties of the characters are not specified. This no doubt made it a challenge to figure out what the appropriate Unicode mappings were, and it clearly accounts for the platform differences that we're fighting today.

Anyway, while the official Unicode code point for Big Five 0xA145 and CNS 11643 1-2126 appears to be U+2027, it seems that OS X (aka Macintosh) uses U+2022 or U+00B7. From what I dug up, U+2022 is the UTC mapping and U+00B7 is the Apple mapping. I would suggest that U+2027 be full-width in Traditional Chinese fonts, at a minimum. How U+00B7 and U+2022 should be handled is less clear. U+00B7 corresponds to Big Five 0xA150 (CNS 11643 1-2131), which leaves U+2022 as a candidate for OS compatibility with U+2027.
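To make the competing mappings concrete, here is a small sketch that lists each candidate code point for Big Five 0xA145 with its Unicode name and East Asian Width (UAX #11) value. The table is transcribed from the claims in this thread, not from an authoritative mapping file, and the names `BIG5_MAPPINGS` and `describe` are hypothetical.

```python
import unicodedata

# Candidate Unicode mappings for two Big Five codes, as described in this
# thread. This is NOT an authoritative mapping table; it only records the
# claims made above so they can be inspected side by side.
BIG5_MAPPINGS = {
    0xA145: {  # CNS 11643 1-2126, 音界號
        "official (per this thread)": 0x2027,
        "UTC mapping (per this thread)": 0x2022,
        "Apple mapping (per this thread)": 0x00B7,
    },
    0xA150: {  # CNS 11643 1-2131
        "official (per this thread)": 0x00B7,
    },
}


def describe(big5_code: int) -> list:
    """Return one 'source: U+XXXX NAME (EAW=x)' line per candidate mapping."""
    lines = []
    for source, cp in BIG5_MAPPINGS[big5_code].items():
        ch = chr(cp)
        lines.append(
            f"{source}: U+{cp:04X} {unicodedata.name(ch)} "
            f"(EAW={unicodedata.east_asian_width(ch)})"
        )
    return lines


if __name__ == "__main__":
    for line in describe(0xA145):
        print(line)
```

Running it shows that U+2027 (HYPHENATION POINT), U+2022 (BULLET), and U+00B7 (MIDDLE DOT) are all EAW=A (Ambiguous). Because of that, the property value alone cannot settle whether the glyph should be full-width.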

BTW, one thing we implemented in Source Han Sans for some characters was using the 'locl' GSUB feature to specify different glyphs for particular code points, with the glyph chosen according to the specified language. U+2026 is a good example. When the language is non-CJK, the glyph should be an ellipsis, which is three periods on the Latin baseline. When the language is CJK, the glyph should be a three-dot leader whose dots are vertically centered in the em-box. The "middle dot" case can be handled similarly.
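As a rough illustration of the idea, the following sketch simulates language-dependent glyph selection in plain Python. It is not how a font is actually built. In a real font, the substitutions live in the font's 'locl' GSUB lookups and the shaping engine applies them. All glyph names here (`ellipsis.cjk`, `hyphenationpoint.fw`, and so on) are hypothetical. The language keys are OpenType language system tags (JAN, KOR, ZHS, ZHT).

```python
# Default glyph for each supported code point (the non-CJK behavior).
# Glyph names are hypothetical, not taken from Source Han Sans.
DEFAULT_CMAP = {
    0x2026: "ellipsis",          # three periods on the Latin baseline
    0x2027: "hyphenationpoint",  # proportional middle dot
}

# Per-language single substitutions, modeled on an OpenType 'locl' feature.
# Keys are OpenType language system tags; glyph names are hypothetical.
LOCL = {
    "JAN": {"ellipsis": "ellipsis.cjk"},
    "KOR": {"ellipsis": "ellipsis.cjk"},
    "ZHS": {"ellipsis": "ellipsis.cjk"},
    "ZHT": {
        "ellipsis": "ellipsis.cjk",                # dots centered in the em-box
        "hyphenationpoint": "hyphenationpoint.fw",  # full-width middle dot
    },
}


def shape(text: str, lang_tag: str = "dflt") -> list:
    """Map characters to default glyphs, then apply 'locl' for lang_tag."""
    subs = LOCL.get(lang_tag, {})  # unknown or default language: no substitution
    glyphs = []
    for ch in text:
        glyph = DEFAULT_CMAP.get(ord(ch), ".notdef")
        glyphs.append(subs.get(glyph, glyph))
    return glyphs
```

For example, `shape("\u2026\u2027", "ZHT")` yields `["ellipsis.cjk", "hyphenationpoint.fw"]`, while the same text with no language yields `["ellipsis", "hyphenationpoint"]`. The code point stays the same, and only the glyph depends on the language.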

Regards...

-- Ken

> On Dec 14, 2014, at 9:25 PM, Bobby Tung <bobbytung@wanderer.tw> wrote:
> 
> Additional information.
> 
> On the CNS 11643 page [1], there are two dots.
> 
> One is the full stop [2] at U+FF0E. That's fine, since its meaning and code point match.
> 
> The other is 音界號 (hyphenation point) [3] at U+2027.
> 
> But the middle dot I want to unify is called 間隔號 in Chinese [4]. Its usage differs from that of the hyphenation point.
> 
> I'd like to ask from a Unicode perspective: should we encourage authors to use the semantically correct code point?
> 
> [1] http://www.cns11643.gov.tw/AIDB/query_symbol_results.do
> 
> [2] http://www.cns11643.gov.tw/AIDB/query_symbol_view.do?page=1&code=2125
> 
> [3] http://www.cns11643.gov.tw/AIDB/query_symbol_view.do?page=1&code=2126
> 
> [4] http://www.edu.tw/files/site_content/M0001/hau/h14.htm
> 
> WANDERER Bobby Tung
> Sent from my iPhone.
> 
> On Dec 15, 2014, at 12:23 PM, Koji Ishii <kojiishi@gmail.com> wrote:
> 
>> On Mon, Dec 15, 2014 at 12:31 PM, Ken Lunde <lunde@adobe.com> wrote:
>>> Koji,
>>> 
>>> For this issue, and for similar characters, what Traditional Chinese IMEs emit, in terms of Unicode values, and how Traditional Chinese fonts encode the corresponding glyphs, are much more important factors than UAX #11 (East Asian Width) property values.
>>> 
>>> For Traditional Chinese, the target character is clearly Big Five 0xA145, and this seems to correspond to U+2022 or U+2027, depending on the OS.
>> 
>> Understood; that actually matches what I guessed (and feared ;).
>> The challenge would be on the layout-engine side: handling EAW=A
>> correctly. This isn't limited to this code point, so we may need a good
>> general solution for EAW=A someday, but I just wanted to give a heads-up
>> that it's likely to cause some layout problems on most platforms today.
>> 
>> /koji
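Koji's concern is that EAW=A (Ambiguous) characters need context to resolve their width. UAX #11 says ambiguous characters behave as wide or narrow depending on context, such as a language tag, and should default to narrow when the context cannot be established. The following is a minimal sketch of that rule, assuming the only context available is a BCP 47-style language tag. The set `EAST_ASIAN_LANGS` and the function `resolved_width` are hypothetical, and a real layout engine would also weigh script, font, and markup.

```python
import unicodedata
from typing import Optional

# Primary language subtags treated as an East Asian context. This is an
# illustrative assumption, not a rule taken from UAX #11 itself.
EAST_ASIAN_LANGS = {"zh", "ja", "ko"}


def resolved_width(ch: str, lang: Optional[str] = None) -> int:
    """Return 2 (wide) or 1 (narrow) display cells for a single character.

    Simplified: ignores zero-width and combining characters.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):  # Wide or Fullwidth: always wide
        return 2
    if eaw == "A":  # Ambiguous: depends on context
        if lang is not None and lang.lower().split("-")[0] in EAST_ASIAN_LANGS:
            return 2
        return 1  # context unknown or non-East Asian: default to narrow
    return 1  # N, Na, H: narrow
```

With this rule, U+2027 resolves to wide for `zh-Hant` and to narrow when no language is given. That matches the behavior Ken suggests for Traditional Chinese fonts, but it depends entirely on the layout engine knowing the language.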

Received on Wednesday, 17 December 2014 20:27:35 UTC