Re: Chinese typography and U+FF5E ～ FULLWIDTH TILDE

2017-02-28 14:07 GMT-05:00 Eric Muller <eric.muller@efele.net>:
> CLREQ currently says that U+FF5E ～ FULLWIDTH TILDE is prohibited at line
> start, not prohibited at line end (Appendix A). Its Unicode lb property is
> ID, which allows this character to be a line start in most cases, and
> therefore does not satisfy CLREQ. There is no mention of U+301C 〜 WAVE DASH.
>
> JLREQ lists U+301C 〜 WAVE DASH in cl-03 hyphens, prohibits it at line start,
> and not at line end (just like CLREQ does for U+FF5E). Its Unicode lb
> property is NS, which satisfies JLREQ. There is no mention of U+FF5E (JLREQ
> ignores all fullwidth characters). U+007E TILDE is listed as a western
> character, proportional.
>
> I can think of three solutions:
> - use U+301C 〜 WAVE DASH in CLREQ
> - tailor lb for Chinese to make U+FF5E have lb = NS
> - just make U+FF5E have lb = NS
>
> In a corpus of ~30K Chinese books, I find 681,803 occurrences of U+FF5E ～
> FULLWIDTH TILDE, but only 3,258 occurrences of U+301C 〜 WAVE DASH. It seems
> to me that Chinese users have voted for U+FF5E, and that the first solution
> is not viable.
>
> I don't see a downside to the third solution, so it is my current best
> proposal.
>
> Other solutions? Suggestions?
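
To make the line-break argument concrete, here is a minimal Python sketch. It assumes only the two lb classes discussed (ID and NS) and the lb values stated in the message; `DEFAULT_LB`, `CHINESE_TAILORING`, `lb_class`, and `may_break_between` are hypothetical names, and the table should be checked against LineBreak.txt for the Unicode version in use. It is not a UAX #14 implementation.

```python
# Simplified model of the line-break question in this thread. Only the lb
# classes ID and NS are modeled, using the values stated in the message;
# this is a sketch, not a UAX #14 implementation.

DEFAULT_LB = {
    "\u4E2D": "ID",  # 中, an ordinary ideograph, used as the preceding character
    "\uFF5E": "ID",  # FULLWIDTH TILDE (lb value as stated in the message)
    "\u301C": "NS",  # WAVE DASH (lb value as stated in the message)
}

# Hypothetical Chinese tailoring matching the proposal "make U+FF5E have
# lb = NS"; the name is illustrative only.
CHINESE_TAILORING = {"\uFF5E": "NS"}


def lb_class(ch, tailoring=None):
    """Line-break class of ch, with an optional tailoring taking precedence."""
    if tailoring and ch in tailoring:
        return tailoring[ch]
    return DEFAULT_LB[ch]


def may_break_between(prev, ch, tailoring=None):
    """Whether a line may break between prev and ch (i.e. ch may start a line).

    For the ID/NS pairs modeled here only two UAX #14 rules matter:
      LB21: do not break before NS (x NS);
      LB31: otherwise a break is allowed (e.g. between two ID characters).
    """
    prev_cls = lb_class(prev, tailoring)
    ch_cls = lb_class(ch, tailoring)
    if {prev_cls, ch_cls} - {"ID", "NS"}:
        raise NotImplementedError("only ID and NS are modeled in this sketch")
    if ch_cls == "NS":
        return False  # LB21: no break before a non-starter
    return True       # LB31: break allowed everywhere else
```

With the default values, `may_break_between("\u4E2D", "\uFF5E")` returns `True` (～ may start a line), while passing `CHINESE_TAILORING` makes it return `False`, which is the behavior U+301C 〜 already has by default.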

I only have a comment, which is probably not very useful or relevant:
A lot of these subtle distinctions, really, were made up when Unicode
was codified, and some of these made-up distinctions were outright
wrong (and of course now taken to be "right" because everyone is using
Unicode).

In Chinese, which variant of the dash (or ideographic comma, or
ideographic period) gets used really depends on what operating system
and/or software someone happens to be using. We type something, look
at it, and if the shape looks right we assume it's the right glyph.
There's really no "standard" in terms of code points. I'm still
getting different commas depending on whether I'm using Windows,
Mac OS X, or Linux.
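
As a small illustration of how look-alike glyphs hide distinct code points, the following Python sketch prints the code point, Unicode name, and NFKC-folded form of the tilde/wave-dash characters and some comma characters mentioned in this thread. The list `LOOKALIKES` and the helper `describe` are hypothetical; only the standard `unicodedata` module is used. It also shows that NFKC folds U+FF5E to U+007E, while U+301C is left unchanged.

```python
import unicodedata

# Visually similar characters discussed in this thread: wave dash / tildes,
# and comma-like characters. Which one ends up in a text typically depends on
# the input method and platform, not on what the reader sees.
LOOKALIKES = ["\u301C", "\uFF5E", "\u007E", "\u3001", "\uFF0C", "\uFF64"]


def describe(ch):
    """Return 'U+XXXX NAME -> NFKC U+YYYY' for a single character."""
    folded = unicodedata.normalize("NFKC", ch)  # each of these folds to one char
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKC U+{ord(folded):04X}"


for ch in LOOKALIKES:
    print(ch, describe(ch))
```

For example, `describe("\uFF5E")` gives `U+FF5E FULLWIDTH TILDE -> NFKC U+007E`, and `describe("\uFF64")` gives `U+FF64 HALFWIDTH IDEOGRAPHIC COMMA -> NFKC U+3001`.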

(In a strictly Chinese environment, of course, U+007E would not always
have been treated as Western and proportional. In practice, from a
user's point of view, it could have been treated as CJK, half-width.
But that's even less relevant than my comment above.)
-- 
Ambrose Li // http://o.gniw.ca / http://gniw.ca
If you saw this on CE-L: You do not need my permission to quote
me, only proper attribution. Always cite your sources, even if
you have to anonymize and/or cite it as "personal communication".

Received on Tuesday, 28 February 2017 22:48:09 UTC