Re: Chinese typography and U+FF5E ~ FULLWIDTH TILDE

I just ran into this issue; it was reported by a Japanese reviewer.

U+FF5E is mapped from BIG-5, so most Traditional Chinese input methods map the key below Esc ("~") to U+FF5E. Japanese input methods, however, map it to U+301C WAVE DASH, and many Traditional Chinese fonts do not contain a U+301C glyph.

We do not have any document that specifies the correct code points for Chinese punctuation. I am currently talking with the Ministry of Education in Taiwan about republishing the Handbook of Punctuation with additional information on recommended code points.
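To make the line-breaking difference in the quoted discussion concrete, here is a minimal Python sketch. It hard-codes only the two lb values quoted below (ID for U+FF5E, NS for U+301C), treats every other character as ID (true for the CJK ideographs used in the example, not for text in general), and applies only the UAX #14 rule that forbids a break before an NS character. The ZH_TAILORING table is a hypothetical illustration of the second option, not an existing CLREQ or CLDR data file.

```python
# lb values as quoted in this thread; everything else is assumed to be ID,
# which only holds for the CJK ideographs used in the example below.
DEFAULT_LB = {
    "\uFF5E": "ID",  # FULLWIDTH TILDE
    "\u301C": "NS",  # WAVE DASH
}

# Hypothetical Chinese tailoring (option 2): give U+FF5E lb = NS.
ZH_TAILORING = {"\uFF5E": "NS"}


def lb_class(ch, tailoring=None):
    """Return the (simplified) line-break class of ch, honoring a tailoring."""
    if tailoring and ch in tailoring:
        return tailoring[ch]
    return DEFAULT_LB.get(ch, "ID")


def can_start_line(ch, tailoring=None):
    # UAX #14 rule LB21 (x NS): no break before NS, so NS cannot start a line.
    return lb_class(ch, tailoring) != "NS"


def break_opportunities(text, tailoring=None):
    """Indexes i where a break before text[i] is allowed (ID/NS-only text)."""
    return [i for i in range(1, len(text)) if can_start_line(text[i], tailoring)]


if __name__ == "__main__":
    print(break_opportunities("一\uFF5E二"))                # default: tilde may start a line
    print(break_opportunities("一\uFF5E二", ZH_TAILORING))  # tailored: it may not
    print(break_opportunities("一\u301C二"))                # wave dash: never at line start
```

The third option (U+FF5E with lb = NS for everyone) amounts to changing the U+FF5E entry in DEFAULT_LB itself instead of passing a tailoring.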


> Ambrose LI <ambrose.li@gmail.com> wrote on 1 March 2017 at 6:39 AM:
> 
> 2017-02-28 14:07 GMT-05:00 Eric Muller <eric.muller@efele.net>:
>> CLREQ currently says that U+FF5E ～ FULLWIDTH TILDE is prohibited at line
>> start, not prohibited at line end (Appendix A). Its Unicode lb property is
>> ID, which allows this character to be a line start in most cases, and
>> therefore does not satisfy JLREQ. There is no mention of U+301C 〜 WAVE DASH.
>> 
>> JLREQ lists U+301C 〜 WAVE DASH in cl-03 hyphens, prohibits it at line start,
>> and not at line end (just like CLREQ does for U+FF5E). Its Unicode lb
>> property is NS, which satisfies JLREQ. There is no mention of U+FF5E (JLREQ
>> ignores all fullwidth characters). U+007E TILDE is listed as a western
>> character, proportional.
>> 
>> I can think of three solutions:
>> - use U+301C 〜 WAVE DASH in CLREQ
>> - tailor lb for Chinese to make U+FF5E have lb = NS
>> - just make U+FF5E have lb = NS
>> 
>> In a corpus of ~30K Chinese books, I find 681,803 occurrences of U+FF5E ～
>> FULLWIDTH TILDE, but only 3,258 occurrences of U+301C 〜 WAVE DASH. It seems
>> to me that Chinese users have voted on U+FF5E, and that the first solution
>> is not viable.
>> 
>> I don't see a downside to the third solution, so it is my current best
>> proposal.
>> 
>> Other solutions? Suggestions?
> 
> I only have a comment, which is probably not very useful or relevant:
> A lot of these subtle distinctions, really, were made up when Unicode
> was codified, and some of these made-up distinctions were outright
> wrong (and of course now taken to be "right" because everyone is using
> Unicode).
> 
> In Chinese, which variant of the dash (or ideographic comma, or
> ideographic period) gets used really depends on what operating system
> and/or software someone happens to be using. We type something, look
> at it, and if the shape looks right we assume it's the right glyph.
> There's really no "standard" in terms of code points. I'm still
> getting different commas depending on whether I'm using Windows,
> MacOSX, or Linux.
> 
> (In a strictly Chinese environment, of course, U+007E would not always
> have been taken as Western, proportional. In practice (as in from a
> user's POV) it could have been treated as CJK, half-width. But that's
> even less relevant than my comment above.)
> -- 
> Ambrose Li // http://o.gniw.ca / http://gniw.ca
> If you saw this on CE-L: You do not need my permission to quote
> me, only proper attribution. Always cite your sources, even if
> you have to anonymize and/or cite it as "personal communication".

Received on Wednesday, 1 March 2017 03:50:57 UTC