- From: Christoph Päper <christoph.paeper@crissov.de>
- Date: Fri, 10 Apr 2009 21:14:51 +0200
- To: CSS 3 W3C Group <www-style@w3.org>
While the |word-break| property is at risk for level 3 of the CSS
Text module
<http://www.w3.org/TR/css3-text/#line-breaking> /
<http://dev.w3.org/csswg/css3-text/#line-breaking>
and hyphenation, currently located in the all-purpose module GCPM
<http://www.w3.org/TR/css3-gcpm/#hyphenation> /
<http://dev.w3.org/csswg/css3-gcpm/#hyphenation>,
is quite open to discussion, I would like to raise some points.
== Kinds of word breaking ==
The current hyphenation proposal distinguishes in its |hyphens|
property three kinds of hyphenation: never ('none'), as indicated
('manual') and by algorithm or database ('auto').
For some languages, such as German, or scripts, probably CJK, there
is a state between 'manual' and 'auto' that would be useful: break
compounds. It is less complex to implement than a fully syllabic or
morphemic algorithm (at least for alphabetic scripts), and it is
sometimes the preferred style in titles or headings.
Zeilen-|Trennung      manual break point (|)
Zeilen<ZW>|trennung   manual break point
Zeilen<SH>·trennung   manual hyphenation point (·)
Zeilentrennung        lexemic, keep words together, no hyphenation or breaks
Zeilen·trennung       sememic, hyphenate compounds
Zeil·en·trenn·ung     morphemic, hyphenate at grammatical boundaries
Zei·len·tren·nung     syllabic, hyphenate at articulatory boundaries
Z.ei.l.e              graphemic, keep diphthongs, ligatures etc. together
Z.e.i.l.e             graphetic, split between characters/letters
Z.e.ı.°.l.e           glyphic, decompose characters if possible/necessary
Morphemic and syllabic should be mutually exclusive, and usually a
language uses either one or the other, though some do not make a clear
choice. Manual indications and compound boundaries are
usually still preferred break points -- a hierarchy can be
generalized through all levels. Breaking at graphemes or graphs
(glyphs) is a last resort for alphabetic scripts and usually is
rather done with non- or para-words which may prefer splitting over
hyphenating. Breaking or splitting can be seen as hyphenation without
visible hyphen, or the other way around.
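As a rough illustration of such a hierarchy, here is a minimal Python sketch. It is not an implementation proposal: the break offsets are simply copied from the Zeilentrennung examples above, and the function name, the preference order and the rule that the hyphen counts towards the available room are assumptions made for this sketch.

```python
# Break offsets for "Zeilentrennung", copied from the examples above.
WORD = "Zeilentrennung"
BREAKS = {
    "compound": [6],          # Zeilen·trennung
    "morphemic": [4, 6, 11],  # Zeil·en·trenn·ung
    "syllabic": [3, 6, 10],   # Zei·len·tren·nung
}

# Most preferred level first (an assumed ordering for this sketch).
PREFERENCE = ["compound", "morphemic", "syllabic"]


def choose_break(word, breaks, room, allowed, hyphen="-"):
    """Return (head, tail) for the most preferred break that fits, or None.

    room:    characters left on the line (the hyphen counts towards it)
    allowed: set of level names the style permits
    """
    for level in PREFERENCE:
        if level not in allowed:
            continue
        # Keep only the offsets whose head plus hyphen still fits.
        fitting = [b for b in breaks.get(level, []) if b + len(hyphen) <= room]
        if fitting:
            cut = max(fitting)  # fill the line as far as this level allows
            return word[:cut] + hyphen, word[cut:]
    return None  # nothing fits: the word moves to the next line unbroken
```

With room for 12 characters and compound plus syllabic breaks allowed, this picks "Zeilen-" / "trennung" although the syllabic break "Zeilentren-" would also fit; with room for only 5 characters it falls back to "Zei-" / "lentrennung".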
These types (or a sensible selection thereof) could be mapped onto
separate properties (|*-break| perhaps) or onto respective values of
one property (|hyphenation|, |word-break| or whatever).
== CJK word breaking ==
Contrary to intuition, |word-break| seems only ever useful if one is
writing text with East Asian square "morphograms", perhaps with
intertwined alphabetic words. Maybe it is also useful for stuff like
URLs inside alphabetic text, but that rather seems the domain of
|text-wrap| and |word-wrap|. Several other properties only make sense
for (European) alphabetic scripts, so that is not an issue by itself,
but perhaps it would be better to do this kind of script-dependent
styling with the |:lang()| or a new |:script()| pseudo-class selector
instead (using ISO 15924 four-letter or three-digit codes).
                            CJK
                  +----------------+----------------+
                  |     strict     |     loose      |
        +---------+----------------+----------------+
Other   | strict  | 'normal'       | 'loose'        |
scripts |         | ('keep-all')   |                |
        +---------+----------------+----------------+
        | loose   | 'break-strict' | 'break-all'    |
        +---------+----------------+----------------+
Table 1: Current draft for |word-break|
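Read as data, Table 1 is a lookup on two independent axes, which is what makes a per-script split plausible. A minimal Python sketch follows; the names TABLE_1, draft_keyword and split_keyword are made up for illustration, and 'keep-all' is left out because the table lists it in the same cell as 'normal'.

```python
# Table 1 as a lookup: (other scripts, CJK) -> keyword in the current draft.
TABLE_1 = {
    ("strict", "strict"): "normal",
    ("strict", "loose"): "loose",
    ("loose", "strict"): "break-strict",
    ("loose", "loose"): "break-all",
}


def draft_keyword(other_scripts, cjk):
    """Return the draft keyword for a pair of per-script strictness levels."""
    return TABLE_1[(other_scripts, cjk)]


def split_keyword(keyword):
    """Return the (other scripts, CJK) pair that a draft keyword stands for."""
    for axes, name in TABLE_1.items():
        if name == keyword:
            return axes
    raise KeyError(keyword)
```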
With |:script()| this could be simplified and be more flexible:
* {word-break: normal;}
  => * {word-break: normal;}
* {word-break: keep-all;}
  => * {word-break: normal;}
     :script(Hani), :script(Jpan)
       {word-break: strict;} /* if I understand the intention correctly */
* {word-break: loose;}
  => * {word-break: normal;}
     :script(Hani), :script(Jpan), :script(Kore)
       {word-break: loose;}
* {word-break: break-strict;}
  => * {word-break: loose;}
     :script(Hani), :script(Jpan), :script(Kore)
       {word-break: normal;}
* {word-break: break-all;}
  => * {word-break: loose;}
If ISO 15924 introduced more general aliases based on script features
(e.g. 'Logo' or 'Sylb' and 'Alph') the selectors could become easier
and of course you could also write them the other way around using
|:not()|.
== General breaking ==
From an author's perspective it might be nice to have text breaking
and hyphenation work similarly to page (and column) breaking. We shall
only be dealing with the "inside" variant here (cf.
|page-break-inside|), so we may drop that name particle and inherit
its values: 'auto' (~= allow) and 'avoid'.
line-break: auto | avoid | none;
  /* ~= text-wrap, for whitespace treatment */
text-wrap: normal;
  ~> line-break: auto;
text-wrap: suppress;
  ~> line-break: avoid;
text-wrap: none;
  ~> line-break: none;
text-wrap: unrestricted;
  ~> line-break: auto; character-break: auto; hyphenation: none;
What is a /word/ in the CSS (or Unicode) sense? Any string of
characters bordered by whitespace or punctuation marks? Any lexeme?
word-break: none | manual | compound | _auto_ | syllable | character;
or
compound-break:  _auto_ | avoid;   /* sememic */
word-break:      _auto_ | avoid;   /* syllabic/morphemic */
syllable-break:  auto | _avoid_;   /* graphe(m/t)ic */
character-break: auto | _avoid_;   /* often not possible at all */
Hyphenation control has to be set separately, e.g. which string to
use at the break, if any (with none, the word is split rather than
hyphenated).
word-wrap: normal;
  ~> *-break: auto;
word-wrap: break-word;
  ~> word-break: character; hyphenation: "";      /* 1 word break property */
  ~> character-break: auto; hyphenation: "";      /* n break properties */

* {word-break: normal;}
  ~> * {word-break: syllable;}                    /* 1, default */
  ~> * {word-break: auto; compound-break: auto;}  /* n, defaults */
* {word-break: keep-all;}
  ~> * {word-break: none;}                        /* 1 */
  ~> * {compound-break: avoid;}                   /* n */
* {word-break: loose;}
  ~> :script(Hani), :script(Jpan), :script(Kore)  /* 1 */
       {word-break: character;}
  ~> :script(Hani), :script(Jpan), :script(Kore)  /* n */
       {syllable-break: auto;}
* {word-break: break-strict;}
  ~> * {word-break: character;}                   /* 1 */
     :script(Hani), :script(Jpan), :script(Kore)
       {word-break: syllable;}                    /* default */
  ~> * {syllable-break: auto;}                    /* n */
     :script(Hani), :script(Jpan), :script(Kore)
       {syllable-break: avoid;}                   /* default */
* {word-break: break-all;}
  ~> * {word-break: character;}                   /* 1 */
  ~> * {syllable-break: auto;}                    /* n */
Yeah, well, it's not perfect at all, I just wanted to provide
something to think about.
Received on Friday, 10 April 2009 19:14:43 UTC