- From: Christoph Päper <christoph.paeper@crissov.de>
- Date: Fri, 10 Apr 2009 21:14:51 +0200
- To: CSS 3 W3C Group <www-style@w3.org>
While the |word-break| property is at risk for level 3 of the CSS Text module <http://www.w3.org/TR/css3-text/#line-breaking> / <http://dev.w3.org/csswg/css3-text/#line-breaking> and hyphenation, currently located in the all-purpose module GCPM <http://www.w3.org/TR/css3-gcpm/#hyphenation> / <http://dev.w3.org/csswg/css3-gcpm/#hyphenation>, is quite open to discussion, I would like to raise some points.

== Kinds of word breaking ==

The current hyphenation proposal distinguishes three kinds of hyphenation in its |hyphens| property: never ('none'), as indicated ('manual') and by algorithm or database ('auto'). For some languages, such as German, or scripts, probably CJK, a state between 'manual' and 'auto' would be useful: break compounds. It is less complex to implement than a fully syllabic or morphemic algorithm (at least for alphabetic scripts), and it is sometimes the preferred style in titles or headings.

  Zeilen-|Trennung      manual break point (|)
  Zeilen<ZW>|trennung   manual break point
  Zeilen<SH>·trennung   manual hyphenation point (·)
  Zeilentrennung        lexemic, keep words together, no hyphenation or breaks
  Zeilen·trennung       sememic, hyphenate compounds
  Zeil·en·trenn·ung     morphemic, hyphenate at grammatical boundaries
  Zei·len·tren·nung     syllabic, hyphenate at articulatory boundaries
  Z.ei.l.e              graphemic, keep diphthongs, ligatures etc. together
  Z.e.i.l.e             graphetic, split between characters/letters
  Z.e.ı.˙.l.e           glyphic, decompose characters if possible/necessary

Morphemic and syllabic hyphenation should be mutually exclusive. Usually a language uses one or the other, though some do not make a clear choice. Manual indications and compound boundaries usually remain preferred break points -- a hierarchy can be generalized through all levels. Breaking at graphemes or graphs (glyphs) is a last resort for alphabetic scripts and is usually reserved for non- or para-words, which may prefer splitting over hyphenating.

Breaking or splitting can be seen as hyphenation without a visible hyphen, or the other way around. These types (or a sensible selection thereof) could be mapped onto separate properties (|*-break| perhaps) or onto respective values of one property (|hyphenation|, |word-break| or whatever).

== CJK word breaking ==

Contrary to intuition, |word-break| seems only ever useful if one is writing text with East Asian square "morphograms", perhaps with intertwined alphabetic words. Maybe it is also useful for stuff like URLs inside alphabetic text, but that rather seems the domain of |text-wrap| and |word-wrap|. Several other properties only make sense for (European) alphabetic scripts, so that is not an issue by itself, but perhaps it would be better to do this kind of script-dependent styling with the |:lang()| or a new |:script()| pseudo-class selector instead (using ISO 15924 four-letter or three-digit codes).

                                  CJK
                   -----------------------------------
                   !     strict     !     loose      !
          |--------+----------------+----------------+
          ! strict ! 'normal'       | 'loose'        |
  Other   !        ! ('keep-all')   |                |
  scripts |--------+----------------+----------------+
          ! loose  ! 'break-strict' | 'break-all'    |
          |--------+----------------+----------------+

  Table 1: Current draft for |word-break|

With |:script()| this could be simplified and be more flexible:

  * {word-break: normal;}
    => * {word-break: normal;}

  * {word-break: keep-all;}
    => * {word-break: normal;}
       :script(Hani), :script(Jpan) {word-break: strict;}
       /* if I understand the intention correctly */

  * {word-break: loose;}
    => * {word-break: normal;}
       :script(Hani), :script(Jpan), :script(Kore) {word-break: loose;}

  * {word-break: break-strict;}
    => * {word-break: loose;}
       :script(Hani), :script(Jpan), :script(Kore) {word-break: normal;}

  * {word-break: break-all;}
    => * {word-break: loose;}

If ISO 15924 introduced more general aliases based on script features (e.g.
'Logo' or 'Sylb' and 'Alph'), the selectors could become easier, and of course you could also write them the other way around using |:not()|.

== General breaking ==

From an author's perspective it might be nice to have text breaking and hyphenation work similarly to page (and column) breaking. We shall only be dealing with the "inside" variant here (as in |page-break-inside|), so we may drop that name particle and inherit its values: 'auto' (~= allow) and 'avoid'.

  line-break: auto | avoid | none;  /* ~= text-wrap, for whitespace treatment */

  text-wrap: normal;        ~> line-break: auto;
  text-wrap: suppress;      ~> line-break: avoid;
  text-wrap: none;          ~> line-break: none;
  text-wrap: unrestricted;  ~> line-break: auto; character-break: auto; hyphenation: none;

What is a /word/ in the CSS (or Unicode) sense? Any string of characters bordered by whitespace or punctuation marks, any lexeme?

  word-break: none | manual | compound | _auto_ | syllable | character;

or

  compound-break:  _auto_ | avoid;   /* sememic */
  word-break:      _auto_ | avoid;   /* syllabic/morphemic */
  syllable-break:  auto | _avoid_;   /* graphe(m/t)ic */
  character-break: auto | _avoid_;   /* often not possible at all */

Hyphenation control has to be set separately, e.g. which string, if any (i.e. not split), to use.

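To make the |text-wrap| correspondences above easier to experiment with, here is a small Python sketch that turns a |text-wrap| value into the proposed declarations. The table is transcribed from the list above; |line-break| (in this sense), |character-break| and |hyphenation| are proposed properties only, and the names TEXT_WRAP_TO_PROPOSED and translate_text_wrap are hypothetical.

```python
# Correspondences transcribed from the list above. The target properties are
# proposals from this message, not properties of any published CSS module.
TEXT_WRAP_TO_PROPOSED = {
    "normal": {"line-break": "auto"},
    "suppress": {"line-break": "avoid"},
    "none": {"line-break": "none"},
    "unrestricted": {
        "line-break": "auto",
        "character-break": "auto",
        "hyphenation": "none",
    },
}


def translate_text_wrap(value):
    """Return the proposed declarations for a text-wrap value as CSS text."""
    try:
        declarations = TEXT_WRAP_TO_PROPOSED[value]
    except KeyError:
        raise ValueError(f"unknown text-wrap value: {value!r}") from None
    return " ".join(f"{prop}: {val};" for prop, val in declarations.items())


if __name__ == "__main__":
    for value in TEXT_WRAP_TO_PROPOSED:
        print(f"text-wrap: {value};  ~>  {translate_text_wrap(value)}")
```

A plain lookup table is enough here because each |text-wrap| value maps to a fixed set of declarations; nothing depends on script or language at this level.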
  word-wrap: normal;      ~> *-break: auto;
  word-wrap: break-word;  ~> word-break: character; hyphenation: "";  /* 1 word break property */
                          ~> character-break: auto; hyphenation: "";  /* n break properties */

  * {word-break: normal;}
    ~> * {word-break: syllable;}                      /* 1, default */
    ~> * {word-break: auto; compound-break: auto;}    /* n, defaults */

  * {word-break: keep-all;}
    ~> * {word-break: none;}                          /* 1 */
    ~> * {compound-break: avoid;}                     /* n */

  * {word-break: loose;}
    ~> :script(Hani), :script(Jpan), :script(Kore)    /* 1 */
         {word-break: character;}
    ~> :script(Hani), :script(Jpan), :script(Kore)    /* n */
         {syllable-break: auto;}

  * {word-break: break-strict;}
    ~> * {word-break: character;}                     /* 1 */
       :script(Hani), :script(Jpan), :script(Kore)
         {word-break: syllable;}                      /* default */
    ~> * {syllable-break: auto;}                      /* n */
       :script(Hani), :script(Jpan), :script(Kore)
         {syllable-break: avoid;}                     /* default */

  * {word-break: break-all;}
    ~> * {word-break: character;}                     /* 1 */
    ~> * {syllable-break: auto;}                      /* n */

Yeah, well, it's not perfect at all, I just wanted to provide something to think about.

Received on Friday, 10 April 2009 19:14:43 UTC