[css-text-3] word-break: break-all

Several months ago, Blink changed its implementation of "word-break:
break-all"[1] to match what the spec defines:

  may break between any two typographic letter units

This value is, as written in the spec, designed to be easy to
implement without sacrificing CJK line break rules, since we believed
its primary use is in CJK.

However, since our change, I have been hearing from authors of Latin
and other non-CJK scripts, such as Arabic, that it does not work as
expected, and that Blink is the only browser that is broken. In the
examples I've got, authors expect to be able to break anywhere in
"AT&T" or "*****", and Trident/Gecko/WebKit all break these strings.

So I'd like to propose changing the spec so that it can serve both
CJK and non-CJK usages, and is more interoperable with existing
implementations.

I checked the behavior for ASCII code points here[2], but in short:

Trident/Edge: Breaks almost anywhere except before closing
parenthesis, period, etc. "&" and "*" in the examples above can break
before and after.
Gecko/WebKit: Breaks anywhere.

Since what Gecko/WebKit does is quite unfortunate for CJK, I'm
thinking of making it similar to what Trident/Edge does.

As far as I can see from the ASCII range, the rules are:

* Not break before !"'),./:;?]}
* Not break after "$'(-[\{
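
The two ASCII rules above can be sketched as a simple predicate. This is
only an illustration of the rules as listed, not Trident/Edge's actual
code; the names can_break_between and break_opportunities are
hypothetical, and only ASCII input is considered.

```python
# Characters taken verbatim from the two ASCII rules above.
NO_BREAK_BEFORE = set("!\"'),./:;?]}")
NO_BREAK_AFTER = set("\"$'(-[\\{")


def can_break_between(prev: str, nxt: str) -> bool:
    """Return True if a break is allowed between two adjacent characters."""
    return nxt not in NO_BREAK_BEFORE and prev not in NO_BREAK_AFTER


def break_opportunities(text: str) -> list[int]:
    """Return the indices i where a break is allowed before text[i]."""
    return [
        i
        for i in range(1, len(text))
        if can_break_between(text[i - 1], text[i])
    ]
```

Under these rules every position inside "AT&T" and "*****" is a break
opportunity, while "(a)" has none.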

So by translating them to UAX#14 Line Breaking classes, rules would be:

* Not break before EX, QU, CL, CP, IS, SY
* Not break after QU, PR, OP, HY
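
As a rough check of this translation, the class-based rules can be
sketched the same way. The table below is a hand-written, partial
assumption that covers only the ASCII characters named in the lists
above; a real implementation would use the full UAX#14 data. Note that
"}" is CL rather than CP in UAX#14, so the sketch includes CL in the
first set to keep it in step with the ASCII list. The names
line_break_class and can_break_between_classes are hypothetical.

```python
# Partial, illustrative table: UAX#14 classes for the ASCII characters
# named in the rules above. Everything else is treated as unknown here.
ASCII_CLASS = {
    "!": "EX", "?": "EX",
    '"': "QU", "'": "QU",
    ")": "CP", "]": "CP",
    "}": "CL",
    ",": "IS", ".": "IS", ":": "IS", ";": "IS",
    "/": "SY",
    "$": "PR", "\\": "PR",
    "(": "OP", "[": "OP", "{": "OP",
    "-": "HY",
}

NO_BREAK_BEFORE_CLASSES = {"EX", "QU", "CL", "CP", "IS", "SY"}
NO_BREAK_AFTER_CLASSES = {"QU", "PR", "OP", "HY"}


def line_break_class(ch: str) -> str:
    """Look up the class; "XX" is a placeholder for characters not in the table."""
    return ASCII_CLASS.get(ch, "XX")


def can_break_between_classes(prev: str, nxt: str) -> bool:
    """Apply the two class-based rules to a pair of adjacent characters."""
    return (
        line_break_class(nxt) not in NO_BREAK_BEFORE_CLASSES
        and line_break_class(prev) not in NO_BREAK_AFTER_CLASSES
    )
```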

I think I'll need to check side effects and Trident/Edge behavior in a
little more detail, but I would appreciate any opinions or feedback.

[1] https://drafts.csswg.org/css-text-3/#valdef-word-break-break-all
[2] http://kojiishi.github.io/playgrounds/line-break-matrix/?word-break=break-all

/koji

Received on Wednesday, 14 October 2015 16:34:17 UTC