- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Mon, 27 Aug 2012 20:48:38 +0200
- To: Koji Ishii <kojiishi@gluesoft.co.jp>
- Cc: 'Glenn Adams' <glenn@skynav.com>, W3C Style <www-style@w3.org>, "public-i18n-cjk@w3.org" <public-i18n-cjk@w3.org>
Koji Ishii, Mon, 27 Aug 2012 08:13:43 -0400:
> If you have suggested wording, I can run it by fantasai to put into
> the spec.
W.r.t. the list of strictness recommendations, a clarity problem
occurs because e.g. "Japanese" sometimes refers to the script but at
other times refers to the "content language". The data is there, but it
would be "nice" if it were *very* clear when the break behavior is
linked to the character and when it is linked to knowledge about the
language. To solve this problem, rather than proposing better wording, I
would propose to use a table rather than a list. For example, with a
table, you could "tag" whether the forbidden line breaks are related to
1. script/character alone (e.g. before Japanese small kana)
2. combination of "common character" and Japanese and/or Chinese
content language (applies to e.g. hyphen ‐ U+2010)
3. combination of 'CJK codepoint' and Japanese and/or Chinese
content language (applies e.g. to FULLWIDTH TILDE ~ U+301C)
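To make the tagging idea concrete, here is a rough sketch of the three categories as data. It is only an illustration: the names (Trigger, Row, ROWS, rule_in_effect) are hypothetical, the rows are just the three examples above, and the assumption that only "ja" and "zh" count as the relevant content languages follows the wording of the list rather than anything normative in the spec.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """What a forbidden break depends on (the three categories in the list)."""
    CHARACTER_ALONE = 1        # 1. script/character alone
    COMMON_CHAR_PLUS_LANG = 2  # 2. common character + ja/zh content language
    CJK_CHAR_PLUS_LANG = 3     # 3. CJK codepoint + ja/zh content language


@dataclass(frozen=True)
class Row:
    example: str
    trigger: Trigger


# Hypothetical table rows, taken only from the examples in the list above.
ROWS = [
    Row("Japanese small kana", Trigger.CHARACTER_ALONE),
    Row("hyphen U+2010", Trigger.COMMON_CHAR_PLUS_LANG),
    Row("U+301C", Trigger.CJK_CHAR_PLUS_LANG),
]


def rule_in_effect(row, content_language):
    """Return True if the row's condition is met.

    content_language is a language tag such as "ja" or "zh-Hant",
    or None when the content language is unknown.
    """
    if row.trigger is Trigger.CHARACTER_ALONE:
        return True  # depends on the character only
    if content_language is None:
        return False  # categories 2 and 3 need a known language
    primary = content_language.lower().split("-")[0]
    return primary in ("ja", "zh")  # assumption: only Japanese and Chinese
```

With a column like this, a reader can see at a glance which rows need the content language to be known and which do not.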
Part of the current lack of clarity is linked to the use of the term
"CJK". This term does not seem to be defined anywhere. My understanding
is that fullwidth characters fall under the CJK umbrella, and I suspect
that this is also the case for the spec text. At the same time, the
paragraph beneath the list of recommendations fails to specify that
even fullwidth characters need to be declared to be in Japanese (or
Chinese) before the distinctions in the recommendations apply. (See
below.)
The paragraph beneath the list of recommendations tries to summarize
the situation, but in my view uses a few unfortunate formulations:
]] In the recommended list above, no distinction is made among the
levels of strictness in non-CJK text: only CJK codepoints are affected,
unless the text is marked as Chinese or Japanese, in which case some
additional common codepoints are affected. However a future level of
CSS may add behaviors affecting non-CJK text. [[
Problems:
* 'CJK' is undefined, and especially 'CJK' vs 'common codepoints' is
not defined. I suspect that the text treats some codepoints that I
see as common codepoints as CJK codepoints. (E.g. the hyphen.)
* W.r.t. 'only CJK codepoints are affected': is Korean affected?
(That question may reveal my unfamiliarity with CJK - sorry …)
* The sentence which includes the phrase 'unless … marked as Chinese or
Japanese, in which case some additional common codepoints are affected'
disguises the fact that some 'CJK' characters, such as FULLWIDTH
TILDE, have to be known to be of Japanese/Chinese content language
before the recommendations apply.
* The phrase "marked as" should perhaps be replaced by "is known to
be of content language", to be consistent with the rest?
But maybe, if you offer a clear table as suggested above, you can
make the explanatory paragraph much shorter?!
> [1] http://www.w3.org/TR/css3-selectors/#lang-pseudo
> [2] http://dev.w3.org/csswg/css3-text/#content-language
>
> Regards,
> Koji
--
Leif Halvard Silli
Received on Monday, 27 August 2012 18:49:16 UTC