CSS3 Text: Line-breaking Properties

  # In the most general case, (assuming no hyphenation dictionary is
  # available to the UA), a line break can occur only at white space
  # characters or hyphens, including U+00AD SOFT HYPHEN.

This doesn't seem to match UAX 14, which also allows breaks at other positions, e.g. between CJK ideographs.

  # line-break: normal | strict
  #
  # normal
  #   Selects the normal line breaking mode for CJK.
  # strict
  #   Selects a more restrictive line breaking mode for CJK text.
...
  # word-break-cjk: normal | break-all | keep-all
  #
  # normal
  #   Keeps non-CJK scripts together (according to their own rules),
  #   while Hangul and CJK ideographs... break according to the rules
  #   set by 'line-break' property.
  # break-all
  #   Same as 'normal' for CJK ideographs and Hangul, but non-CJK
  #   scripts can break anywhere.
  # keep-all
  #   Same as 'normal' for all non-CJK scripts. CJK ideographs and
  #   Hangul are kept together.

This organization of properties seems a bit... non-optimal:
  - 'line-break' is CJK-specific
  - 'word-break-cjk' affects breaking in non-CJK text
  - the functionality of 'line-break' is tangled with that of 'word-break-cjk'

Suppose we did this:

     line-break-cjk: normal | strict | word
line-break-general: normal | strict | anywhere

IMO this is much neater. We can control CJK-type scripts and other scripts
independently, and because of that the purpose of each property is also much
clearer.

For CJK-type scripts, line breaking would work as follows:

line-break-cjk
   normal - as for current "line-break: normal"
   strict - as for current "line-break: strict"
   word   - as for current "word-break-cjk: keep-all"

For non-CJK, 'anywhere' selects the "break-anywhere" behavior of
"word-break-cjk: break-all" without affecting CJK scripts. 'normal' and
'strict' for non-CJK allow two different levels of line-breaking.
For example,

line-break-general
   normal   - as defined in UAX 14 for non-ideographic
   strict   - only break on spaces and other explicit opportunities like ZWSP (U+200B)
   anywhere - as for "word-break-cjk: break-all"

With this definition, 'strict' can be used to prohibit breaking after
hyphen-minus. It's also nice for formatting word-wrapped code, and
probably other things as well.
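
To make the difference between 'strict' and 'anywhere' concrete, here is a
rough sketch in Python. It is purely illustrative: the function names and the
exact set of "explicit" break characters are my own assumptions, not anything
from the draft, and 'normal' is left out because modeling full UAX 14 is well
beyond a few lines.

```python
# Hypothetical sketch of break opportunities for two of the proposed
# 'line-break-general' values. 'normal' (full UAX 14) is NOT modeled.

# Assumption: under 'strict', a break is allowed only after these characters.
# U+00A0 NO-BREAK SPACE is deliberately absent.
EXPLICIT_BREAK_AFTER = {" ", "\t", "\u200b"}  # space, tab, ZWSP


def break_opportunities(text, mode):
    """Return indices i (0 < i < len(text)) where a line may break
    before text[i]."""
    if mode == "strict":
        return [i for i in range(1, len(text))
                if text[i - 1] in EXPLICIT_BREAK_AFTER]
    if mode == "anywhere":
        # Every position between two code points. A real UA would at least
        # keep grapheme clusters together (the draft mentions Thai).
        return list(range(1, len(text)))
    raise ValueError(f"unsupported mode: {mode!r}")


def segments(text, mode):
    """Split text into the smallest unbreakable pieces for a mode."""
    cuts = [0, *break_opportunities(text, mode), len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


print(segments("foo-bar baz", "strict"))  # ['foo-bar ', 'baz']
```

Note that under 'strict' there is no opportunity after the hyphen-minus in
"foo-bar", whereas UAX 14 line breaking would normally allow one there.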

('strict' could also be left out, giving
    line-break-general
      normal   - as defined in UAX 14 for non-ideographic
      anywhere - as for "word-break-cjk: break-all")
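
Another way to see the independence argument: try translating each pair of
proposed values back into the current draft's ('line-break',
'word-break-cjk'). The Python sketch below is hypothetical (the function name
is mine), and it leaves the 'line-break' value paired with keep-all
unspecified because I'm only mapping 'word' onto keep-all, as above.

```python
from typing import Optional, Tuple

CJK_VALUES = {"normal", "strict", "word"}
GENERAL_VALUES = {"normal", "strict", "anywhere"}


def to_current_draft(cjk: str, general: str) -> Optional[Tuple[Optional[str], str]]:
    """Hypothetical mapping of ('line-break-cjk', 'line-break-general')
    onto the current ('line-break', 'word-break-cjk').
    Returns None when the current draft cannot express the combination."""
    if cjk not in CJK_VALUES or general not in GENERAL_VALUES:
        raise ValueError(f"unknown values: {cjk!r}, {general!r}")
    if general == "strict":
        return None  # no equivalent in the current draft
    if cjk == "word":
        # keep-all is "same as 'normal' for all non-CJK scripts", so it
        # cannot be combined with break-anywhere for non-CJK text.
        # 'line-break' value left unspecified (None).
        return (None, "keep-all") if general == "normal" else None
    # cjk is 'normal' or 'strict': 'line-break' carries it directly.
    word_break = "break-all" if general == "anywhere" else "normal"
    return (cjk, word_break)
```

Of the nine proposed combinations, only five have a current spelling; the
four that return None (keep-all CJK with break-anywhere non-CJK, and
everything using the new 'strict') are exactly what the split buys us.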


# break-all
#    Same as 'normal' for CJK ideographs and Hangul, but non-CJK scripts can
#    break anywhere. This option is used mostly in a context where the text is
#    predominantly using CJK characters with few non-CJK excerpts and it is
#    desired that the text be better distributed on each line. The UAs may
#    however limit the break-everywhere behavior for scripts using clusters such
#    as Thai.

The effect of "word-break-cjk: break-all" on the punctuation rules needs
to be explained. E.g. can there be a break between consecutive hyphens?

~fantasai

Received on Monday, 21 April 2003 12:43:17 UTC