- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Thu, 9 Aug 2007 17:36:27 +0300 (EEST)
- To: www-style@w3.org
On Thu, 9 Aug 2007, Niklas Åkerlund wrote:

> I figure automatic Hyphenation is a hard thing to implement.

It is, but it has been implemented in many programs for several languages. The problems vary in difficulty, depending on the language and on the goals, i.e. the desired quality of hyphenation.

> basically need to check each word that's about to wrap if it can be
> broken up. And if so, keep the piece(s) that fits before wrapping.

Well, yes.

> To do that, you'd need a dictionary/list(s) of all words in the
> specific language(s) used that can be broken up and where the break(s)
> should be.

No, this depends on the language, and you cannot cover all the words in a language, since a language has a potentially infinite number of words. For some languages, hyphenation can be performed mostly algorithmically (like "break before the last consonant in a consonant cluster inside a word"), though special cases may require special treatment. On the other hand, for some languages, reasonable results can be achieved using a relatively short list of common long words. We do not need to hyphenate everything; just breaking some long words might be OK, if words are generally short.

If hyphenation should consider _all_ the possible hyphenation points, then things get difficult. Things get even more difficult if typographic quality should be considered too, e.g. the principle of avoiding hyphenation that leaves just a few letters on the last line of a paragraph. Moreover, different possible hyphenation points have different acceptability; e.g., a compound word should primarily be hyphenated at the component boundary.

There's little one can say about such issues in CSS even in principle, though imaginably there might be a property that indicates the desired quality of hyphenation. (Asking for the best quality is not always best, since quality may have a high cost in terms of processing time, especially if it implies that the browser needs to download extra software over the network.)
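To illustrate the algorithmic approach, here is a minimal Python sketch of a rule like "break before the last consonant in a consonant cluster inside a word". It is only an illustration under stated assumptions: the vowel set is a Finnish-like inventory chosen for the example, the function names are hypothetical, and the rule ignores compound boundaries, breaks between vowels, and every special case a real hyphenator would need.

```python
# Assumption: a Finnish-like vowel inventory; adjust per language.
VOWELS = set("aeiouyäö")


def break_points(word):
    """Return the indexes before which a hyphen may be inserted.

    Simplified rule: break before a consonant that is directly followed
    by a vowel, provided some vowel has already occurred in the word.
    This places the break before the last consonant of each cluster.
    """
    lower = word.lower()
    points = []
    seen_vowel = False
    for i, ch in enumerate(lower):
        if ch in VOWELS:
            seen_vowel = True
            continue
        if not ch.isalpha():
            continue  # digits and punctuation never start a new piece here
        next_is_vowel = i + 1 < len(lower) and lower[i + 1] in VOWELS
        if seen_vowel and next_is_vowel:
            points.append(i)
    return points


def hyphenate(word, mark="-"):
    """Join the pieces of `word` with `mark` at every break point."""
    pieces = []
    start = 0
    for p in break_points(word):
        pieces.append(word[start:p])
        start = p
    pieces.append(word[start:])
    return mark.join(pieces)


if __name__ == "__main__":
    for w in ("kirjasto", "talo", "kauppa"):
        print(w, "->", hyphenate(w))
```

A line breaker would use only the break points that let a piece fit on the current line, and a quality-conscious one would additionally rank or discard points, e.g. those leaving too few letters on either side.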
> I'd rather suggest that browsers contains built in dictionaries,

That's not feasible. Browsers should be able to invoke language-specific hyphenation software, and they could even use plugins loaded from the net.

> Manual(or server side implemented) would be alot easier to implement
> for browsers. Text would simply contain hyphenations from the start.

That's possible, of course. You can do that now if you like and you don't worry about browsers (mainly Firefox) that don't implement SOFT HYPHEN yet. It's been possible for a long time. How many authors have used this possibility? Not too many. And it has practical problems at present, since e.g. Google does not seem to handle the soft hyphen properly: Google effectively treats it as a word separator.

> However, to keep the specs consistent, a hyphenation character like
> u+200B should only show in preformated text. Just like the newline
> character.

The U+200B character, ZERO-WIDTH SPACE (ZWSP), has nothing to do with hyphenation. It allows a line break _without_ adding any kind of a hyphen or other indicator at the end of a line. It's in theory suitable for making non-word strings like URLs or some formulas breakable.

> In html, the BR tag breaks lines. In accordance to this, Mozilla and
> IE implements the non-standard WBR to provide break points inside
> words.

The <wbr> tag has nothing to do with hyphenation. It's like U+200B except that it mostly works and does no harm when it doesn't.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Thursday, 9 August 2007 14:38:30 UTC