RE: [css3-text] line-break questions/comments from Leif Halvard Silli on 2012-08-27 (www-style@w3.org from August 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 27 Aug 2012 20:48:38 +0200
To: Koji Ishii <kojiishi@gluesoft.co.jp>
Cc: 'Glenn Adams' <glenn@skynav.com>, W3C Style <www-style@w3.org>, "public-i18n-cjk@w3.org" <public-i18n-cjk@w3.org>
Message-ID: <20120827204838537262.7b560f0d@xn--mlform-iua.no>

Koji Ishii, Mon, 27 Aug 2012 08:13:43 -0400:

> If you have suggested wording, I can run it by fantasai to put into 
> the spec.

W.r.t. the list of strictness recommendations, a clarity problem 
occurs because e.g. "Japanese" sometimes refers to the script but at 
other times refers to the "content language". The data is there, but it 
would be "nice" if it was *very* clear when the break behavior is linked 
to the character and when it is linked to knowledge about the language. 
To solve this problem, rather than proposing better wording, I would 
propose to use a table rather than a list. For example, with a table, 
you could "tag" whether the forbidden line breaks are related to 

 1. script/character alone (e.g. before Japanese small kana)
 2. combination of "common character" and Japanese and/or Chinese
    content language (applies to e.g. hyphen ‐ U+2010)
 3. combination of 'CJK codepoint' and Japanese and/or Chinese
    content language (applies e.g. to FULLWIDTH TILDE ～ U+301C)
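
To illustrate the kind of "tagging" I have in mind, here is a minimal 
sketch in Python. It is only an assumption about how such a table could 
be modelled, not the spec's actual data: the names Trigger, RULES and 
rule_applies are hypothetical, the three rows are just the examples from 
the list above (U+3041 stands in for "Japanese small kana"; note that 
Unicode names U+301C WAVE DASH), and strictness levels are left out.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    """What must be known before a forbidden-break rule applies."""
    CHARACTER_ALONE = 1            # item 1: script/character alone
    COMMON_CHAR_PLUS_LANGUAGE = 2  # item 2: common character + ja/zh
    CJK_CHAR_PLUS_LANGUAGE = 3     # item 3: 'CJK codepoint' + ja/zh


# Hypothetical table rows: code point -> tag. Not the spec's real list.
RULES = {
    0x3041: Trigger.CHARACTER_ALONE,            # HIRAGANA LETTER SMALL A
    0x2010: Trigger.COMMON_CHAR_PLUS_LANGUAGE,  # HYPHEN
    0x301C: Trigger.CJK_CHAR_PLUS_LANGUAGE,     # U+301C, cited in item 3
}

# Assumption: "Japanese and/or Chinese" means primary subtag ja or zh.
LANGUAGES = {"ja", "zh"}


def rule_applies(codepoint: int, content_language: Optional[str]) -> bool:
    """Return True if the tagged rule for this code point is in effect."""
    trigger = RULES.get(codepoint)
    if trigger is None:
        return False  # no row in the table for this code point
    if trigger is Trigger.CHARACTER_ALONE:
        return True   # the character is enough, whatever the language
    if content_language is None:
        return False  # items 2 and 3 need a known content language
    primary = content_language.lower().split("-")[0]
    return primary in LANGUAGES
```

With such a table, rule_applies(0x3041, None) holds without any language 
information, whereas the rows for U+2010 and U+301C only take effect 
once the content language is known to be Japanese or Chinese.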

Part of the current lack of clarity is linked to the use of the term 
"CJK". This term does not seem to be defined anywhere. My understanding 
is that fullwidth characters fall under the CJK umbrella, and I suspect 
that this is also the case for the spec text. At the same time, the 
paragraph beneath the list of recommendations fails to specify that 
even fullwidth characters need to be declared to be in Japanese (or 
Chinese) before the distinctions in the recommendations apply. (See 
below.)

The paragraph beneath the list of recommendations tries to summarize 
the situation, but in my view uses a few unfortunate formulations:

]] In the recommended list above, no distinction is made among the 
levels of strictness in non-CJK text: only CJK codepoints are affected, 
unless the text is marked as Chinese or Japanese, in which case some
additional common codepoints are affected. However a future level of 
CSS may add behaviors affecting non-CJK text. [[

Problems:
 * 'CJK' is undefined, and especially 'CJK' vs 'common codepoints' is
   not defined. I suspect the text treats some codepoints that I regard
   as common codepoints as CJK codepoints. (E.g. the hyphen.)
 * W.r.t. 'only CJK codepoints are affected': is Korean affected?
   (That question may reveal my unfamiliarity with CJK - sorry …)
 * The sentence which includes the phrase 'unless … marked as Chinese or
   Japanese, in which case some additional common codepoints are
   affected' disguises the fact that some 'CJK' characters, such as
   FULLWIDTH TILDE, have to be known to be of Japanese/Chinese content
   language before the recommendations apply.
 * The phrase "marked as" should perhaps be replaced by "is known to
   be of content language", to be congruent with the rest?

But maybe, if you offer a clear table as suggested above, you can 
make the explanatory paragraph much shorter?!

> [1] http://www.w3.org/TR/css3-selectors/#lang-pseudo

> [2] http://dev.w3.org/csswg/css3-text/#content-language

> 
> Regards,
> Koji
-- 
Leif Halvard Silli

Received on Monday, 27 August 2012 18:49:16 UTC