RE: [css3-text] line-break questions/comments from Koji Ishii on 2012-08-27 (public-i18n-cjk@w3.org from July to September 2012)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Mon, 27 Aug 2012 01:49:12 -0400
To: Glenn Adams <glenn@skynav.com>
CC: W3C Style <www-style@w3.org>, "public-i18n-cjk@w3.org" <public-i18n-cjk@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0D5E63C250@MAILR001.mail.lan>
>>> (1) "known to be Chinese or Japanese" is not defined in a manner
>>> sufficient to obtain testability or interoperability at any level; some
>>> default algorithm should be defined, e.g., "use the 'lang' attribute ..."
>>> or "use the default language of the font if any" or "if there are any
>>> hiragana or katakana character, then treat as Japanese; if any
>>> hangul character, treat as Korean, otherwise ...", etc
>>
>> This refers to the content language[1]. When the document does not declare one,
>> the spec says "it is possible for the content language of an element to be
>> unknown", so this portion does not apply. This part of the spec is informative
>> (it is a recommendation), so a UA may rely on other methods, such as automatic
>> language detection, to determine the language when it is unknown.
>>
>> I guess we should change the "language" to "content language" with link to
>> the terminology.
>
> Yes, please change "language" to a link to "content language". It would also
> be useful to add a NOTE under the first occurrence of "known to be Chinese
> or Japanese" to the following effect:
>
> "For the purpose of resolving 'known to be Chinese or Japanese', it is
>  sufficient to determine that the governing @lang attribute (or equivalent)
>  specifies a language tag containing 'ja' or 'zh' (or equivalent) as its primary
>  language subtag."

Fixed the link part.

I don't think adding a note to this section is a good idea. First, it's more complex than one might imagine; for example, what should happen when both @lang and @xml:lang are specified with different values? We should be consistent with what, for instance, the lang selector does, and with what the i18n WG says. Second, this is not the only property that uses the content language; see underline-position, for instance. It wouldn't look smart if the same note appeared everywhere we refer to the content language, would it?
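
To make the disagreement concrete, here is a minimal Python sketch of the check the proposed note describes. All function names are hypothetical, and the rule that xml:lang wins over lang when both are present is an assumption of this sketch, not something the css3-text draft states. It mainly shows that even the "simple" version needs a precedence decision somewhere.

```python
from typing import Optional


def primary_subtag(lang_tag: str) -> str:
    """Return the lowercased primary language subtag of a BCP 47-style tag."""
    return lang_tag.strip().split("-", 1)[0].lower()


def content_language(lang: Optional[str], xml_lang: Optional[str]) -> Optional[str]:
    """Pick one governing tag when both attributes may be present.

    Assumption of this sketch: xml:lang takes precedence over lang.
    A missing or empty value means the content language is unknown.
    """
    tag = xml_lang if xml_lang is not None else lang
    return tag if tag else None


def known_chinese_or_japanese(lang: Optional[str], xml_lang: Optional[str] = None) -> bool:
    """True only when the governing tag's primary subtag is 'ja' or 'zh'."""
    tag = content_language(lang, xml_lang)
    return tag is not None and primary_subtag(tag) in {"ja", "zh"}
```

With these assumptions, known_chinese_or_japanese("ja", xml_lang="en") is False, which is exactly the kind of case where a short note in the spec could disagree with what the lang selector does.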

>>> (3) speaking of "breaks between some inseparable characters: ‥ U+2025,
>>> … U+2026" what exactly does "between" mean here? does it mean
>>> between only the following four pairs or something else?
>>>
>>> &#x2025;&#x2025;
>>> &#x2025;&#x2026;
>>> &#x2026;&#x2025;
>>> &#x2026;&#x2026;
>>
>> Correct. This refers to the IN (Inseparable Characters)[2] class in UAX#14.
>
> Please add some text making reference to this definition, e.g., change
> "between some inseparable characters" to read "between characters of
> the IN (Inseparable Characters) class of [UAX14]".

Ok, will do.
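
For reference, the four pairs listed in the question can be enumerated mechanically. This is only a sketch covering the two characters named in the draft text; the helper name is hypothetical. As far as I know, the IN class in UAX#14 contains a few more characters than these two, so wording that refers to the class would cover more pairs than the four below.

```python
from itertools import product

# The two characters named in the draft text (both are class IN in UAX#14).
INSEPARABLE = ["\u2025", "\u2026"]  # TWO DOT LEADER, HORIZONTAL ELLIPSIS


def inseparable_pairs(chars=INSEPARABLE):
    """Return every ordered pair of adjacent characters drawn from chars."""
    return ["".join(pair) for pair in product(chars, repeat=2)]


# Print each pair as code points, one pair per line.
for pair in inseparable_pairs():
    print(" ".join(f"U+{ord(c):04X}" for c in pair))
```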

>>> (4) is it permissible for 'auto' behavior to differ from all of
>>> normal|strict|loose? e.g., map to 'foo' (where foo is defined internally by UA)?
>>
>> I didn't think about this, but as far as the spec says, I think yes. From the author's
>> perspective, I think yes too; authors should use the property if they want
>> specific behavior, possibly along with the lang attribute.
>
> Since many UAs make use of ICU, which uses UAX #14 for its default LB rules,
> I would suggest adding an additional keyword value to this property "uax14", and
> further specify that, "in the absence of any other relevant criteria, a UA should
> treat 'auto' as if 'uax14' were specified". This will improve interoperability and
> testability for the 'auto' value, which is the default 'initial' value for this property.

IE, for instance, uses 'normal' as 'auto', and doesn't use ICU. Its behavior is mostly the same as the normal ja settings of ICU50, but not exactly the same. I don't see much user value in IE implementing exactly the same line breaking as UAX#14 and changing 'auto' to it.

UAX#14 has several character classes whose behavior depends on input from the application, the AI and CJ classes for example, so there is no single UAX#14 line breaking. Also, UAX#14 changes over time, ICU changes how it interprets classes like AI or CJ between versions, and UAs ship different versions of ICU, so just adding 'uax14' doesn't help much.
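
A small Python sketch of why there is no single answer: the same character class resolves differently depending on what the application tells the algorithm. The function name and the particular mapping are assumptions for illustration (CJ to NS for 'strict' and to ID otherwise, AI to ID in an East Asian context and to AL otherwise); a given UA or ICU version may tailor these differently, which is the point.

```python
def resolve_class(lb_class: str, line_break: str = "strict", east_asian: bool = False) -> str:
    """Resolve a context-dependent UAX#14 class to a concrete one.

    Assumptions of this sketch:
      - CJ resolves to NS under 'strict' and to ID under 'normal'/'loose'
      - AI resolves to ID in an East Asian context and to AL otherwise
    All other classes pass through unchanged.
    """
    if line_break not in {"strict", "normal", "loose"}:
        raise ValueError(f"unknown line-break value: {line_break!r}")
    if lb_class == "CJ":
        return "NS" if line_break == "strict" else "ID"
    if lb_class == "AI":
        return "ID" if east_asian else "AL"
    return lb_class
```

Under these assumptions, resolve_class("CJ", "strict") gives "NS" while resolve_class("CJ", "normal") gives "ID", so two conforming uses of UAX#14 already break lines differently around small kana.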

We can test the differences between loose, normal, and strict. We can't test the exact set of code points, as you said, but we could test common characters without requiring conformance.


> It might also be useful to either specify (in the property definition) or write in a
> note something like: "in the absence of any other relevant criteria, a UA should
> interpret 'loose', 'normal', and 'strict' in accordance with the default rules of
> [UAX14] modified as required to satisfy the additional constraints specified in
> this section".

Because we don't define the baseline, we can't say this. The baseline is UA dependent, and we define only the minimum differences between values.


>> The line break rules should apply across element boundaries, so the rule should
>> apply in this case too. I know some implementations are broken in this regard
>> though. As far as I discussed this with fantasai last time, 5.1. Line Breaking
>> Details[3] says "a replaced element or other atomic inline is equivalent to that
>> of the Object Replacement Character (U+FFFC)" so if one of the adjacent
>> elements is inline-block, this will not apply.
>
> It would be useful to add a NOTE that distills this information.

Hm. The information is just one page above the text; maybe examples would help more. I'll work on it later.
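
As a rough idea of what such an example might show, here is a Python sketch that flattens a toy inline tree into the text the line breaker would see. The tree model (strings for text runs, dicts with "display" and "children") and the function name are hypothetical; the only thing taken from the spec is that an atomic inline is treated like U+FFFC.

```python
OBJECT_REPLACEMENT = "\ufffc"


def flatten_inline(node):
    """Flatten a toy inline tree into the text the line breaker would see.

    A node is either a str (a text run) or a dict with the keys
    "display" ("inline" or "inline-block") and "children" (a list of nodes).
    Inline boxes are transparent; an inline-block is atomic and is replaced
    by a single U+FFFC regardless of its contents.
    """
    if isinstance(node, str):
        return node
    if node["display"] == "inline-block":
        return OBJECT_REPLACEMENT
    return "".join(flatten_inline(child) for child in node["children"])


# Two ellipses split across a span boundary are still adjacent.
across_span = {"display": "inline", "children": [
    "\u2026", {"display": "inline", "children": ["\u2026"]}]}

# With an inline-block, the second ellipsis is hidden behind U+FFFC.
across_inline_block = {"display": "inline", "children": [
    "\u2026", {"display": "inline-block", "children": ["\u2026"]}]}
```

In the first case the flattened text is two adjacent U+2026 characters, so the rule for inseparable characters applies across the element boundary; in the second it is U+2026 followed by U+FFFC, so it does not.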


Regards,
Koji
Received on Monday, 27 August 2012 05:49:49 UTC