- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 21 Apr 2003 22:24:44 +0300 (EEST)
- To: www-style@w3.org
On Mon, 21 Apr 2003, fantasai wrote:

> # In the most general case, (assuming no hyphenation dictionary is
> # available to the UA), a line break can occur only at white space
> # characters or hyphens, including U+00AD SOFT HYPHEN.
>
> This doesn't seem to match UAX 14.

In what sense? UAX 14 is complex and confusing, and too implicit in some basic statements, but the reasonable interpretation is that it defines _default_ line breaking rules for characters. The rules _permit_ line breaks at certain points but do not require any particular behavior. Unfortunately, some software vendors take those rules literally, causing line breaks even in strings like a-b or even -b (literally!), but it would not be adequate to blame UAX 14 for that. Surely the idea is that the default rules can be applied with discretion, using various criteria to prevent line breaks where UAX 14 would allow them, and applying additional line breaking principles where appropriate. And CSS is a "higher level protocol" which can override any character-level rules.

Since UAX 14 exists, it would probably be useful to have the _option_ of suggesting UAX 14 rules in CSS, but they should surely not be the default. As I discuss at http://www.cs.tut.fi/~jkorpela/unicode/linebr.html the UAX 14 rules are far too mechanical to form a sound basis for general text processing and display. Just breaking a line at some point, with no indication of what has happened, does no good to many constructs that are actually used on Web pages.

> line-break-general
> normal - as defined in UAX 14 for non-ideographic
> strict - only break on spaces and other explicit opportunities like zwsp
> anywhere - as for "word-break-cjk: break-all"

Presumably "normal" is supposed to be the initial value, and I strongly disagree. What you describe as "strict" is what dominated on the Web for years and is easily understood, except for the zwsp part.
It should be the default, and the UAX 14 based method should have a name that clearly reflects its definition, like "unicode-line-breaking". And for practical reasons, a value (e.g., "after-hyphen") that allows line breaks after hyphen-minus characters and is otherwise identical to the default would be useful.

In principle, it would be nice to have the option of explicitly enumerating the characters after which a line break is permitted. If you need to include a URL literally in a document, you might use delimiters like "<" and ">" and permit line breaks after some characters like "/", "?", and "&" but not after others.

Besides, when word division by language-dependent algorithms becomes a reality in browsers, it becomes important to be able to prevent it. It's an interesting question whether such division should be allowed by default. I would say no, both because of Web traditions and because the algorithms won't work perfectly; they are more or less bound to create wrong hyphenations at times. This is more serious than in text processing, where the author can, in principle at least, check what happens, whereas when CSS is used, the normal situation is that the author is nowhere near when the actual formatting of the document for presentation takes place. Thus, an author should have the opportunity of asking for hyphenation _if_ he regards it as useful enough for his document, considering all the pros and cons, and bearing in mind the need to add detailed markup and explicit hyphenation information if he wishes to prevent bad hyphenations.

But first and foremost, CSS development should not encourage wider application of UAX 14 line breaking without the author's (or, in some cases, the user's) discretion.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
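The break-opportunity choices proposed in the message ("strict", a hyphen-permissive "after-hyphen" value, and an author-enumerated set of characters after which breaks are allowed) can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not CSS syntax, not any browser's behavior and not the UAX 14 algorithm: the names `break_opportunities` and `split_at` are hypothetical, "explicit opportunities" are assumed to be only ZERO WIDTH SPACE and SOFT HYPHEN, and the guard that a hyphen or other listed character at the start of a word (as in "-b") never permits a break is an added assumption reflecting the complaint about breaks in "-b".

```python
# Sketch of the line-break modes discussed in the message.
# Hypothetical: function names, mode strings and character sets are
# illustrative assumptions, not CSS syntax and not UAX 14.

SPACES = {" ", "\t"}             # U+00A0 NO-BREAK SPACE is deliberately absent
EXPLICIT = {"\u200b", "\u00ad"}  # ZERO WIDTH SPACE, SOFT HYPHEN


def break_opportunities(text, mode="strict", break_after=()):
    """Return the indices i at which a line may end just before text[i]."""
    if mode not in ("strict", "after-hyphen"):
        raise ValueError(f"unknown mode: {mode!r}")
    extra = set(break_after)     # author-enumerated break-after characters
    if mode == "after-hyphen":
        extra.add("-")           # HYPHEN-MINUS, U+002D
    positions = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev in SPACES:
            # Break only after the last space of a run of spaces.
            if cur not in SPACES:
                positions.append(i)
        elif prev in EXPLICIT:
            positions.append(i)
        elif prev in extra:
            # Assumed guard: no break after a character that starts a
            # word (as in "-b"), and none directly before white space.
            starts_word = i < 2 or text[i - 2] in SPACES
            if not starts_word and cur not in SPACES:
                positions.append(i)
    return positions


def split_at(text, positions):
    """Cut text into the segments between consecutive break positions."""
    bounds = [0, *positions, len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(break_opportunities("a-b c"))                       # [4]
    print(break_opportunities("a-b c", mode="after-hyphen"))  # [2, 4]
    print(break_opportunities("-b", mode="after-hyphen"))     # []
    url = "www.example.com/a?b=1&c=2"
    print(split_at(url, break_opportunities(url, break_after="/?&")))
    # ['www.example.com/', 'a?', 'b=1&', 'c=2']
```

With "strict" as the default, "a-b" is never split; the hyphen break appears only when the author opts into "after-hyphen", and the URL example permits breaks after "/", "?" and "&" but not after ".", as the message suggests.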
Received on Monday, 21 April 2003 15:24:46 UTC