Re: [css3-text] line-break questions/comments from Glenn Adams on 2012-08-27 (www-style@w3.org from August 2012)

From: Glenn Adams <glenn@skynav.com>
Date: Mon, 27 Aug 2012 12:01:45 +0800
To: Koji Ishii <kojiishi@gluesoft.co.jp>
Cc: W3C Style <www-style@w3.org>, public-i18n-cjk@w3.org
Message-ID: <CACQ=j+cbRh0dV-pdKb6xkheZb+cVjeh6joVVrPM+oXuxrpkNuw@mail.gmail.com>
On Sun, Aug 26, 2012 at 12:36 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

> Hi Glenn, thank you for looking into this and for the wonderful feedback.
>
> > (1) "known to be Chinese or Japanese" is not defined in a manner
> > sufficient to obtain testability or interoperability at any level; some
> > default algorithm should be defined, e.g., "use the 'lang' attribute ..."
> > or "use the default language of the font if any" or "if there are any
> > hiragana or katakana characters, then treat as Japanese; if any
> > hangul characters, treat as Korean, otherwise ...", etc.
>
> This refers to content language[1], and when such is not in the document,
> the spec says "it is possible for the content language of an element to be
> unknown", so this portion does not apply. This part of the spec is
> informative (as it is recommended), so a UA may rely on other methods,
> such as automatic language detection, to determine the language when it
> is unknown.
>
> I guess we should change "language" to "content language" with a link to
> the terminology.
>

Yes, please change "language" to a link to "content language". It would
also be useful to add a NOTE under the first occurrence of "known to be
Chinese or Japanese" to the following effect:

"For the purpose of resolving 'known to be Chinese or Japanese', it is
sufficient to determine that the governing @lang attribute (or equivalent)
specifies a language tag containing 'ja' or 'zh' (or equivalent) as its
primary language subtag."
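
To make the intent of that NOTE concrete, here is a minimal sketch of the
check in Python. The function names are hypothetical, and it assumes the
governing language tag is already available as a string (or None when the
content language is unknown). It does not handle the "or equivalent"
cases, such as three-letter language codes.

```python
def primary_subtag(lang_tag: str) -> str:
    """Return the lowercased primary language subtag of a language tag."""
    return lang_tag.strip().split("-", 1)[0].lower()


def known_chinese_or_japanese(lang_tag) -> bool:
    """True if the governing tag has 'ja' or 'zh' as its primary subtag.

    `lang_tag` is None or empty when the content language is unknown,
    in which case the "known to be" condition is treated as false.
    """
    if not lang_tag:
        return False
    return primary_subtag(lang_tag) in {"ja", "zh"}
```

With this, "ja-JP" and "zh-Hant-TW" satisfy the condition, while "ko" and
an unknown content language do not.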


>
> > (2) line-break support is optional but recommended for CJK markets;
> > however, it is unclear whether its rules are intended to be applied in
> > the absence of "known to be Chinese or Japanese"; e.g., if in a UA
> > that supports line-break, the default algorithm for "known to be
> > Chinese or Japanese" returns false (e.g., if the entire text is
> > "A&#x2025;&#x2025;B"), then does the rule forbidding a break
> > between &#x2025; characters still apply when line-break:strict?
>
> Yes. Code points that may introduce unexpected behavior are under "if the
> language is known to be ..."; the rules outside of that are either
> beneficial or do no harm when applied regardless of script.
>


> > (3) speaking of "breaks between some inseparable characters: ‥ U+2025,
> > … U+2026" what exactly does "between" mean here? does it mean
> > between only the following four pairs or something else?
> >
> > &#x2025;&#x2025;
> > &#x2025;&#x2026;
> > &#x2026;&#x2025;
> > &#x2026;&#x2026;
>
> Correct. This refers to IN (Inseparable Characters)[2] class in UAX#14.
>

Please add some text making reference to this definition, e.g., change
"between some inseparable characters" to read "between characters of the IN
(Inseparable Characters) class of [UAX14]".
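
As an illustration of the "between" reading confirmed above, the rule can
be sketched as a pairwise test. This is only a sketch: the names are
hypothetical, and the set holds just the two code points named in this
thread, whereas the IN class in [UAX14] contains other characters as well.

```python
# Only the two code points discussed in this thread; the IN class in
# UAX #14 has further members that this sketch deliberately leaves out.
INSEPARABLE = {"\u2025", "\u2026"}  # TWO DOT LEADER, HORIZONTAL ELLIPSIS


def break_forbidden_between(before: str, after: str) -> bool:
    """True if both neighbors are inseparable, so no break may fall between."""
    return before in INSEPARABLE and after in INSEPARABLE


def forbidden_pairs():
    """Enumerate every ordered pair covered by the rule."""
    return sorted(a + b for a in INSEPARABLE for b in INSEPARABLE)
```

For the two code points above, forbidden_pairs() yields exactly the four
pairs listed in question (3), and a pair such as "A" followed by U+2025 is
not covered by this particular rule.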


>
> > (4) is it permissible for 'auto' behavior to differ from all of
> > normal|strict|loose? e.g., map to 'foo' (where foo is defined internally
> > by UA)?
>
> I didn't think about this, but as far as the spec says, I think yes. From
> an author's perspective, I think yes too; authors should use the property
> if they want specific behavior, possibly along with the lang attribute.
>

Since many UAs make use of ICU, which uses UAX #14 for its default LB
rules, I would suggest adding an additional keyword value to this property
"uax14", and further specify that, "in the absence of any other relevant
criteria, a UA should treat 'auto' as if 'uax14' were specified". This will
improve interoperability and testability for the 'auto' value, which is the
default 'initial' value for this property.

It might also be useful to either specify (in the property definition) or
write in a note something like: "in the absence of any other relevant
criteria, a UA should interpret 'loose', 'normal', and 'strict' in
accordance with the default rules of [UAX14] modified as required to
satisfy the additional constraints specified in this section".
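
A rough sketch of the resolution I have in mind for 'auto' follows. Note
that 'uax14' is only the keyword proposed in this message, not a value in
the current draft, and the `ua_choice` parameter is a hypothetical stand-in
for "any other relevant criteria" a UA might apply.

```python
# 'uax14' is the keyword proposed in this message, not part of the draft.
KEYWORDS = {"auto", "loose", "normal", "strict", "uax14"}


def resolve_line_break(specified: str, ua_choice=None) -> str:
    """Resolve a specified line-break keyword to the behavior actually used.

    `ua_choice` models UA-specific criteria for 'auto'; when the UA has
    none, 'auto' falls back to the proposed 'uax14' behavior.
    """
    if specified not in KEYWORDS:
        raise ValueError(f"unknown line-break value: {specified!r}")
    if specified != "auto":
        return specified
    return ua_choice if ua_choice is not None else "uax14"
```

The point of the fallback is that two UAs with no other criteria would
resolve 'auto' identically, which is what makes the default testable.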


>
> > (5) regarding "breaks before postfixes", what if there is nothing prior
> > to the postfix or nothing prior within the same element? e.g., if we have
> >
> > <span style="line-break:strict">
> >  <span>X</span><span>%</span>
> > </span>
> >
> > then is a break permitted before the "[don't] break before postfix" '%'?
>
> The line break rules should apply across element boundaries, so the rule
> should apply in this case too. I know some implementations are broken in
> this regard though. As far as I discussed this with fantasai last time,
> 5.1. Line Breaking Details[3] says "a replaced element or other atomic
> inline is equivalent to that of the Object Replacement Character (U+FFFC)"
> so if one of the adjacent elements is inline-block, this will not apply.
>

It would be useful to add a NOTE that distills this information.
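
As a starting point for such a NOTE, the behavior described above might be
sketched as follows. This is a toy model under my own assumptions: inline
content is a nested list of strings, the ATOMIC sentinel stands in for a
replaced element or other atomic inline, and all names are hypothetical.

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER
ATOMIC = object()  # stands in for a replaced element or other atomic inline


def flatten(node) -> str:
    """Flatten nested inline content into the text the break rules see.

    Element boundaries vanish; an atomic inline contributes U+FFFC.
    """
    if node is ATOMIC:
        return OBJECT_REPLACEMENT
    if isinstance(node, str):
        return node
    return "".join(flatten(child) for child in node)


def neighbor_before(text: str, index: int):
    """Character the break rules see before position `index`, or None."""
    return text[index - 1] if index > 0 else None
```

Under this model the two spans in example (5) flatten to "X%", so the
"breaks before postfixes" rule sees 'X' before '%' as if no element
boundary were present. If the first span were an inline-block instead, the
rule would see U+FFFC before '%'.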


>
> > (6) same question as (5) for "breaks after prefixes", substituting after
> > for before?
> >
> > <span style="line-break:strict">
> >  <span>$</span><span>X</span>
> > </span>
> > then is a break permitted after the "[don't] break after prefix" '$'?
>
> Same as (5). There are use cases like this:
>   <p><ruby>base<rt>r</rt></ruby>.</p>
> We don't want to break before the period.
>
> > (7) what is the behavior when different line-break modes apply to
> > adjacent text? e.g.
> >
> > <span style="line-break:loose">$</span><span style="line-break:strict">%</span>
> >
> > <span style="line-break:strict">$</span><span style="line-break:loose">%</span>
>
> That is a really good question. I thought I discussed this with fantasai
> and defined it, but it looks like it was a dream...
>
> Taking the stricter one works well for me. If it works well for everyone,
> I'll add this to the spec.
>
>

As you suggest elsewhere, it may be better to restrict the applicability
of this property from "all elements" to "block containers".
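
If the "take the stricter one" rule were adopted instead, it could be
sketched as below. The ordering loose < normal < strict is my assumption
of what "stricter" means here, the names are hypothetical, and 'auto' is
left out because the thread does not say where it would rank.

```python
# Assumed ordering of strictness; 'auto' is intentionally not ranked.
STRICTNESS = {"loose": 0, "normal": 1, "strict": 2}


def mode_at_boundary(left: str, right: str) -> str:
    """Pick the mode governing a break between differently styled text.

    `left` and `right` are the line-break values on either side of the
    boundary; the stricter of the two wins.
    """
    return left if STRICTNESS[left] >= STRICTNESS[right] else right
```

Both examples in (7) would then be governed by 'strict' at the boundary
between '$' and '%', whichever side carries it.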


>
> [1] http://dev.w3.org/csswg/css3-text/#content-language
> [2] http://unicode.org/reports/tr14/#IN
> [3] http://dev.w3.org/csswg/css3-text/#line-break-details
>
> Regards,
> Koji
>
Received on Monday, 27 August 2012 04:02:35 UTC