Re: Solutions to unify middle dot usage in Traditional Chinese

小林さん

> First, the name of the mailing list suggests that a Chinese version of JLREQ and (the still-in-development) KLREQ is in the works. If so, that's great news.

Yes, the Chinese Text Layout Task Force is going to publish several "REQs" for Asian languages: Mongolian, Tibetan, and of course Traditional/Simplified Chinese (Hanzi). [1]

I'm working on the first working draft for TC/SC Hanzi. We also want to list the code points of punctuation marks as a reference, to help sort out the mess. The middle dot is one of them.

> About the middle dot in Traditional Chinese, based on the exchange between Addison and me yesterday, both U+30FB and U+FF0E must be removed from the equation, because the former has strong ties to Japanese-only usage (and because Chinese fonts may not include a glyph for this character) and the latter is a full-stop (aka period) that happens to be centered within the em-box for Traditional Chinese use.


Agreed.

> That leaves U+00B7 and U+2027, but U+2022 should also be considered.

I think U+2022 is a bullet and is usually larger than the middle dot. It may also be used as an emphasis dot (the filled kind).

If we make its glyph smaller, authors may need extra work to fix their documents.

> The U+2022 versus U+2027 mapping difference is likely yet another platform (Windows versus OS X) difference in the treatment of some punctuation and other symbols. I suggest that someone use the standard Traditional Chinese IME on both platforms to input the character for 0xA145, then inspect which Unicode character was emitted into the document.

I will try to sort them out. Yosemite's default IME lists these candidates: 4 is U+30FB, 5 is U+00B7, and 6 is U+FF0E [2]. Quite terrible.
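
As a first rough check that does not need either platform, the two Big5 values can be decoded with the Big5-family codecs bundled with Python. This is only a sketch: I am assuming that these codec tables (`big5`, `cp950`, `big5hkscs`) are a useful stand-in for vendor mapping tables, and they say nothing about what an IME actually emits, which still has to be inspected by hand. The helper names are mine.

```python
# Decode the two Big5 "middle dot" candidates with several Big5-family codecs
# and report which Unicode character(s) each codec produces.
import unicodedata

# Byte sequences for the Big5 values discussed in this thread.
BIG5_BYTES = {"0xA145": b"\xa1\x45", "0xA150": b"\xa1\x50"}

# Assumption: these bundled codecs approximate different platform tables.
CODECS = ["big5", "cp950", "big5hkscs"]


def describe(text):
    """Return 'U+XXXX NAME' for every character in the decoded text."""
    return " + ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text
    )


def decode_all():
    """Map (big5 value, codec) to a description, or None if unmapped."""
    results = {}
    for label, raw in BIG5_BYTES.items():
        for codec in CODECS:
            try:
                results[(label, codec)] = describe(raw.decode(codec))
            except UnicodeDecodeError:
                results[(label, codec)] = None
    return results


if __name__ == "__main__":
    for (label, codec), value in decode_all().items():
        print(f"{label} via {codec:<10} {value or 'not mapped'}")
```

If the codecs disagree for 0xA145, that would be consistent with the platform difference suggested above, but the IME test is still the real answer.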

> Source Han Sans (and Noto Sans CJK), the Pan-CJK typeface family that we released earlier this year, do not conform to the above recommendation, but I can implement these recommendations for Traditional Chinese fonts and font instances in the Version 1.002 update that I am planning for the early part of 2015. This would affect only U+00B7 (it is currently proportional) and U+2022 (ditto). U+2027 is a full-width middle dot.

Thank you. I may list them in the ZHREQ under middle dot. I'm thinking of something like:

[U+00B7] en, zh-Hant: rendered half-width.
[U+2027] en, zh-Hant: will probably fall back to a Chinese font and be rendered full-width.
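
For reference, the width behaviour of the candidates can be looked up from the East_Asian_Width property (UAX #11) with Python's `unicodedata` module. This is a small sketch for illustration; the list of candidates and the function name are my own choice. A value of "A" (ambiguous) means the width depends on context and font, which is the situation for U+00B7, U+2022, and U+2027, while "W" and "F" mean wide and full-width.

```python
# Report name and East_Asian_Width for each middle-dot candidate.
import unicodedata

# Candidates discussed in this thread (selection is illustrative).
CANDIDATES = ["\u00b7", "\u2022", "\u2027", "\u30fb", "\uff0e"]


def width_report(chars=CANDIDATES):
    """Return (code point, character name, East_Asian_Width) tuples."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.east_asian_width(c))
        for c in chars
    ]


if __name__ == "__main__":
    for code_point, name, width in width_report():
        print(f"{code_point}  {width:<2} {name}")
```

Because the property only says "ambiguous" for the three characters we care about, the actual width has to come from the font, which is why the font recommendation matters.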

[U+00B7] zh-Hant <-> zh-Hans: same code point, no mapping needed.
[U+2027] zh-Hant <-> zh-Hans: needs mapping.
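
The mapping for U+2027 could look like the sketch below. It is only an illustration under my own assumptions: the table and function names are hypothetical, it handles the middle dot and nothing else, and it only covers the zh-Hant to zh-Hans direction, since going the other way depends on which character we finally recommend for Traditional Chinese.

```python
# Hypothetical middle-dot normalization for zh-Hant -> zh-Hans conversion.
# Only U+2027 is remapped; U+00B7 is already the Simplified Chinese middle dot.
HANT_TO_HANS_DOTS = str.maketrans({"\u2027": "\u00b7"})


def hant_to_hans_dots(text):
    """Replace U+2027 HYPHENATION POINT with U+00B7 MIDDLE DOT."""
    return text.translate(HANT_TO_HANS_DOTS)


if __name__ == "__main__":
    sample = "\u7406\u67e5\u2027\u77f3\u7530"  # 理查‧石田, written with U+2027
    converted = hant_to_hans_dots(sample)
    print([f"U+{ord(c):04X}" for c in converted])
```

Text that already uses U+00B7 passes through unchanged, which matches the "same" case above.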

There are still many issues to be solved, such as the ellipsis in TC. I hope the ZHREQ will help. Better late than never.

Regards.


[1] http://www.w3.org/International/groups/chinese-layout/charter
[2] https://www.dropbox.com/s/1muxxy6rucb5zyz/yosemiteimedot.png

Bobby


> On Dec 11, 2014, at 11:09 PM, Ken Lunde <lunde@adobe.com> wrote:
> 
> Bobby,
> 
> Allow me to insert a few comments about this particular issue.
> 
> 
> About the middle dot in Traditional Chinese, based on the exchange between Addison and me yesterday, both U+30FB and U+FF0E must be removed from the equation, because the former has strong ties to Japanese-only usage (and because Chinese fonts may not include a glyph for this character) and the latter is a full-stop (aka period) that happens to be centered within the em-box for Traditional Chinese use.
> 
> 
> If you examine Big Five and the near identical CNS 11643 Planes 1 and 2, 0xA150 is grouped together with the so-called "small punctuation" whose practical use still escapes me and most others. These "small" characters are in Unicode starting from U+FE50:
> 
>  http://www.unicode.org/charts/PDF/UFE50.pdf
> 
> In fact, the gap at U+FE53 would correspond to 0xA150 because the characters are in Big Five order (and are in Unicode only because they were in Big Five), so when Unicode Version 1.1 was compiled, U+FE53 was likely occupied in a draft version, then removed, possibly to unify with U+00B7.
> 
> Because the usage of these "small" characters is unclear, I would put less emphasis on it and the use of U+00B7. Instead, the more common middle dot would be 0xA145, which corresponds to U+2027 (according to your notes below), but I think that U+2022 is the better mapping.
> 
> The U+2022 versus U+2027 mapping difference is likely yet another platform (Windows versus OS X) difference in the treatment of some punctuation and other symbols. I suggest that someone use the standard Traditional Chinese IME on both platforms to input the character for 0xA145, then inspect which Unicode character was emitted into the document.
> 
> My recommendation would thus be for Traditional Chinese fonts to include full-width versions of U+00B7 (mainly for compatibility reasons), U+2022, and U+2027. The latter two are to compensate for platform mapping differences, and I would consider them to be much more important than U+00B7 in a Traditional Chinese context.
> 
> Source Han Sans (and Noto Sans CJK), the Pan-CJK typeface family that we released earlier this year, do not conform to the above recommendation, but I can implement these recommendations for Traditional Chinese fonts and font instances in the Version 1.002 update that I am planning for the early part of 2015. This would affect only U+00B7 (it is currently proportional) and U+2022 (ditto). U+2027 is a full-width middle dot.
> 
> Regards...
> 
> -- Ken
> 
>> On Dec 10, 2014, at 6:23 AM, Bobby Tung <bobbytung@wanderer.tw> wrote:
>> 
>> Hello,
>> 
>> I have found a problem with middle dot usage in Traditional Chinese.
>> 
>> --Usage
>> 
>> The middle dot has three uses in Traditional Chinese, listed below:
>> 
>> 1. Separating the parts of a translated Latin name written in Hanzi, e.g. 理查・石田
>> 
>> 2. As the decimal point in numbers written in Hanzi, e.g. 三・一四
>> 
>> 3. Separating book, chapter, and title, e.g. 詩經・魏風・碩鼠
>> 
>> In Traditional Chinese, the middle dot should be full-width: a filled round dot centered in the em-box.
>> 
>> --Codepoint
>> 
>> Several code points are in general use for the middle dot in Traditional Chinese:
>> 
>> · U+00B7  MIDDLE DOT
>> ‧ U+2027  HYPHENATION POINT
>> ・ U+30FB  KATAKANA MIDDLE DOT
>> ． U+FF0E  FULLWIDTH FULL STOP
>> 
>> In Simplified Chinese usage, the middle dot is U+00B7.
>> 
>> U+00B7 comes from 0xA150 and U+2027 from 0xA145 in the Big5 code table [1].
>> 
>> However, I think the definition of U+00B7 is more suitable for the middle dot than that of U+2027 or U+FF0E.
>> 
>> --Solutions
>> 
>> Considering interoperability and the code point definitions, I have two proposals.
>> 
>> 1. Use U+00B7 as the general middle dot; if authors want it full-width, use U+30FB. However, most Chinese fonts do not have this glyph, so it will almost certainly fall back to a Japanese font. [2]
>> 
>> 2. Use U+00B7 as the general middle dot, and make its glyph full-width in Traditional Chinese fonts.
>> 
>> 
>> =====
>> 
>> 
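>> Everyone, I have found that middle dot usage in Traditional Chinese is quite chaotic. I would like to use the writing of the Chinese layout requirements as an opportunity to settle on a standard, so I am proposing two solutions.
>> 
>> First, here is how the "connector mark" (連接號, formerly called the "syllable mark", 音節號) is used in Traditional Chinese:
>> 
>> 1. Separating the family name and given name of a name translated into Chinese, e.g. 理查・石田
>> 
>> 2. As the decimal point in numbers written in Hanzi, e.g. 三・一四
>> 
>> 3. Separating book, chapter, and work titles, e.g. 詩經・魏風・碩鼠
>> 
>> In Traditional Chinese usage, the mark should be full-width: a filled dot placed in the center.
>> 
>> Next, looking at actual documents, four code points turn out to be the most commonly used:
>> 
>> · U+00B7  MIDDLE DOT
>> ‧ U+2027  HYPHENATION POINT
>> ・ U+30FB  KATAKANA MIDDLE DOT
>> ． U+FF0E  FULLWIDTH FULL STOP
>> 
>> Simplified Chinese uniformly uses U+00B7, and U+00B7 comes from A150 in Big5. I think the definition of U+00B7 fits the actual usage better, so I am not considering U+2027 or U+FF0E.
>> 
>> The proposed solutions are therefore:
>> 
>> 1. Use U+00B7 as the standard middle dot; if authors want it full-width, use U+30FB. However, many Chinese fonts have no glyph for this code point, so it will almost certainly fall back to a Japanese font.
>> 
>> 2. Use U+00B7 as the standard middle dot, but draw it full-width in Traditional Chinese fonts.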
>> 
>> 
>> [1]: http://www.khngai.com/chinese/charmap/tblbig.php?page=0
>> [2]: http://www.unicode.org/reports/tr11/
>> 
>> 
>> 
>> WANDERER Digital Publishing Inc.
>> Bobby Tung @bobtung
>> Mobile:+886-975068558
>> bobbytung@wanderer.tw
>> http://wanderer.tw
>> 
> 

Received on Thursday, 11 December 2014 16:41:12 UTC