[css3-text] definition of 'word-break: break-all' (was: [css3-text] feedback on 'word-break: keep-all;')

(12/05/04 16:37), Koji Ishii wrote:
> UAX#14, 8.2 Examples of Customization, Example 3[1] states the behavior for Korean. The referenced material doesn't seem to be available online, unfortunately, but it states:
> 
>> Only the intersections of ID/ID, AL/ID, and ID/AL are affected
> 
> It looks to me that prohibiting break opportunities for these combination gives what we want here.
> 
> [1] http://unicode.org/reports/tr14/#Tailoring

So, what about 'word-break: break-all'? Example 4 talks about this, with
the conclusion being

[[
  In this case the intersections of NU/NU, NU/AL, AL/AL and AL/NU are
  affected.
]]

.

I think this is relatively close to the current definition:

  # In addition to normal opportunities, lines may break between any
  # two letters within words except where forbidden by the ‘line-
  # break’ property.

, if I read "letter" in this sentence correctly, with AL roughly
equal to the Letter general category and NU roughly equal to the Number
general category (but in that case, please link "letter" to the
definition in 1.3). There are some minor differences, however:

The ASCII characters in AL but not in Letter are #&*<=>@^_`~ [1]. In
IE9, line breaks are allowed before and after these characters with
'word-break: break-all', matching what the final sentence of Example 4
says but not the current prose.
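
To make the AL-versus-Letter gap concrete, here is a small Python sketch.
It assumes the character list above is right (it is copied from this
message, not computed from LineBreak.txt, since Python's standard library
does not expose the line-break class) and only checks the general category
side. The helper names are made up for illustration.

```python
import unicodedata

# ASCII characters said above to be in line-break class AL but not in the
# Letter general category. Copied from the message; not derived from
# LineBreak.txt.
AL_NOT_LETTER = "#&*<=>@^_`~"


def is_letter(ch):
    """True if the character's general category is one of the L* categories."""
    return unicodedata.category(ch).startswith("L")


def general_categories(chars):
    """Map each character to its Unicode general category (e.g. 'Po', 'Sm')."""
    return {ch: unicodedata.category(ch) for ch in chars}


if __name__ == "__main__":
    for ch, cat in general_categories(AL_NOT_LETTER).items():
        label = "letter" if is_letter(ch) else "not a letter"
        print(repr(ch), cat, label)
```

Every character in the list comes out as punctuation or a symbol, so a
definition based on "letters" and one based on AL disagree on exactly these
characters within ASCII.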

But I am not sure that changing the current definition to match that
sentence would have no drawbacks.


One other thing, it seems that in Chromium18 and Firefox15a1,
'word-break: break-all' really breaks at all grapheme cluster boundaries,
which means that, for example, it breaks before the '。' in

  測試。

, where normally such a break is prohibited. I would like to double
check with the editors and folks on this list that this is indeed not
conforming according to the spec.
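
As a rough illustration of the two readings, here is a toy Python model. It
is not UAX#14 and not what any engine actually implements: "letter" is
approximated by the L* general categories, code points stand in for
grapheme clusters, and both function names are hypothetical.

```python
import unicodedata


def is_letter(ch):
    """True if the character's general category is one of the L* categories."""
    return unicodedata.category(ch).startswith("L")


def breaks_between_letters(text):
    """Offsets i where a break falls between text[i-1] and text[i],
    allowed only when both neighbours are letters (the spec prose reading)."""
    return [
        i for i in range(1, len(text))
        if is_letter(text[i - 1]) and is_letter(text[i])
    ]


def breaks_everywhere(text):
    """Offsets at every code point boundary (the behaviour described above
    for Chromium18 and Firefox15a1, simplified)."""
    return list(range(1, len(text)))


if __name__ == "__main__":
    sample = "測試。"
    print(breaks_between_letters(sample))  # [1]
    print(breaks_everywhere(sample))       # [1, 2]
```

Offset 2 is the break before '。', which only the second model allows.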

If the main use case of 'word-break: break-all' is CJK text with
some non-CJK text embedded, then it is indeed true that breaking
before '。' should not be allowed, but it's not clear to me whether there
are use cases where authors just want to break between all characters.

Similarly, with 'word-break: break-all;', IE9 (which means that this
behavior might be there since IE5.5) currently doesn't break before ','
and around nbsp, while Chromium18 and Firefox15a1 do.

[1] http://www.unicode.org/Public/6.1.0/ucd/LineBreak.txt


Cheers,
Kenny

Received on Thursday, 10 May 2012 03:00:35 UTC