
[csswg-drafts] Issue: [css-text-3] Insufficient normative reference to UAX14 for the ID line breaking class marked as i18n

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Fri, 14 Oct 2016 09:49:26 +0000
To: www-international@w3.org
Message-ID: <issues.labeled-180841835-None-sysbot+gh@w3.org>
r12a has just labeled an issue for https://github.com/w3c/csswg-drafts as "i18n":

== [css-text-3] Insufficient normative reference to UAX14 for the ID line breaking class ==
css-text-3 refers normatively to UAX14 in a few places, including:
* “[...] BK, CR, LF, CM, NL, and SG line breaking classes in [UAX14] 
must be honored.”
* “[...] WJ, ZW, and GL line-breaking classes in [UAX14] must be 
honored”
* “The line breaking behavior of a replaced element or other atomic 
inline is equivalent to an ideographic character (Unicode linebreaking
 class ID [UAX14]) [...]”
* “[...] any typographic character units resolving to the NU 
(“numeric”), AL (“alphabetic”), or SA (“Southeast Asian”) line 
breaking classes [UAX14] are instead treated as ID (“ideographic 
characters”) for the purpose of line-breaking.”

However, I cannot find any normative reference that requires the 
line-breaking behavior for characters with the line breaking class ID 
in UAX14 (Ideographic characters) to be honored, either directly or as
 part of a broader claim.

The 3rd and 4th bullets above suggest that it is expected, since 
something else is required to behave like characters with that line 
breaking class, which doesn't make much sense if no particular 
behavior is expected of that class. Also, the design of 
`word-break: normal` implicitly depends on this behavior being 
honored.

The following paragraph [in section 
5](https://drafts.csswg.org/css-text-3/#line-breaking) also indicates 
that this behavior is expected, but it reads like informative prose, 
or at least seems too vague to be effectively testable.

> In several other writing systems, (including Chinese, Japanese, Yi, 
and sometimes also Korean) a soft wrap opportunity is based on 
syllable boundaries, not word boundaries. In these systems a line can 
break anywhere except between certain character combinations. 
Additionally the level of strictness in these restrictions can vary 
with the typesetting style. 

The spec does (normatively) state that
> CSS does not fully define where soft wrap opportunities occur

and (informatively) that
> Further information on line breaking conventions can be found in 
[JLREQ] and [JIS4051] for Japanese, [ZHMARK] for Chinese, and in 
[UAX14] for all scripts in Unicode.

and, to be sure, the full logic of where soft wrap opportunities 
should go is complex and impractical to specify. But without going 
into the full gory details of opening and closing punctuation, 
non-starter characters, etc., it should be possible to ensure that at 
least the general case works out as expected.

I think we should add a bullet point to [section 5.1 "Line breaking 
details"](https://drafts.csswg.org/css-text-3/#line-break-details). 
Maybe something like:
> * When the `white-space` property allows wrapping, there is a soft 
wrap opportunity between pairs of characters with the ID line breaking
 class (see [!UAX14]). Additionally, there is a soft wrap opportunity 
before (and respectively after) characters with the ID line breaking 
class, unless the preceding (respectively following) character has the
 WJ or GL line breaking class (see [!UAX14]), or otherwise forbids 
breaks as determined by the `line-break` property.
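
To make the proposed bullet easier to review, here is a minimal sketch of how it could be read as a pairwise check on adjacent characters. It is an illustration only, not spec text or any implementation's behavior. The `LINE_BREAK_CLASS` table is a hypothetical, hand-picked subset of UAX14 classes, and `id_rule_grants_break` and its `line_break_forbids` hook are made-up names standing in for whatever the `line-break` property would forbid.

```python
# Hypothetical, hand-picked subset of UAX14 line breaking classes.
# A real implementation would use the full Unicode LineBreak data.
LINE_BREAK_CLASS = {
    "\u4e2d": "ID",  # CJK ideograph
    "\u6587": "ID",  # CJK ideograph
    "\u2060": "WJ",  # WORD JOINER
    "\u00a0": "GL",  # NO-BREAK SPACE
    "a": "AL",       # alphabetic
}


def id_rule_grants_break(before, after, wrapping_allowed=True,
                         line_break_forbids=None):
    """Return True if the proposed bullet grants a soft wrap opportunity
    between the adjacent characters `before` and `after`.

    False only means this bullet grants nothing; other rules may still
    allow a break at that position.
    """
    # The bullet only applies when `white-space` allows wrapping.
    if not wrapping_allowed:
        return False

    cls_before = LINE_BREAK_CLASS.get(before, "XX")
    cls_after = LINE_BREAK_CLASS.get(after, "XX")

    # First sentence: a pair of ID characters always has an opportunity.
    if cls_before == "ID" and cls_after == "ID":
        return True

    # The bullet says nothing about pairs without any ID character.
    if cls_before != "ID" and cls_after != "ID":
        return False

    # Second sentence: exactly one side is ID; inspect its neighbour.
    neighbour = cls_after if cls_before == "ID" else cls_before
    if neighbour in ("WJ", "GL"):
        return False

    # Assumed hook for breaks forbidden by the `line-break` property.
    if line_break_forbids is not None and line_break_forbids(before, after):
        return False
    return True
```

Under these assumptions, the check grants a break between two ideographs and between an alphabetic character and an ideograph, but not next to a WORD JOINER or NO-BREAK SPACE.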

This still leaves some wiggle room since the `line-break` property 
itself doesn't define exhaustive rules, but I think this should give a
 decent baseline requirement.

EDIT: typos

See https://github.com/w3c/csswg-drafts/issues/567
Received on Friday, 14 October 2016 09:49:33 UTC
