- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 9 Jan 2007 20:22:28 +0200
On Jan 9, 2007, at 01:02, Øistein E. Andersen wrote:

> In summary, hyphenation is a hard problem: breaking points cannot in general
> be established algorithmically; hyphenation dictionaries are not always available
> and typically do not contain long/rare/complex words (the ones that really
> need to be hyphenated); furthermore, distinct words may be spelt identically,
> but still need to be hyphenated differently; and several languages require spelling
> changes when words are hyphenated ([3] mentions Dutch, German (alte
> Rechtschreibung), Spanish, Norwegian, Swedish and Hungarian).

My initial thoughts:

* Prince seems to be doing exactly the right thing: control overall
  hyphenation with CSS, honor soft hyphens and support TeX-compatible
  language-specific dictionaries.

* The Swedish and Dutch examples given in this thread seem to be
  addressable with language-specific dictionaries.

* Not knowing Dutch, the example makes me guess that the diaeresis in
  Dutch has the same meaning as in French (indicate that vowels don't
  form a diphthong). If this is the case, the interaction of the
  diaeresis with hyphenation may even be a generalizable rule that
  could be hard-coded in Dutch-aware hyphenating browsers. Is it a
  generalizable rule?

* Knowing a bit of Swedish, I really have a hard time taking seriously
  the notion of Swedish requiring new markup to be introduced to HTML.
  The sky won't fall if a browser doesn't know how to hyphenate Swedish
  chewing gum in the absence of a hyphenation dictionary. (Besides, it
  looks like the Swedish rule is generalizable so that a hyphenator
  wouldn't even need a list of all possible compound words but a
  dictionary of simple words that can be part of a compound would
  suffice.)

* Not having a language-specific dictionary available in a browser
  doesn't make things worse than the status quo, so it isn't that big
  a deal.

* Hand-coders wouldn't bother to type hyphenation data for everything
  every time. (TeX users run the typesetting step themselves whereas
  HTML is rendered elsewhere. TeX users only tend to micromanage the
  words that they see didn't typeset nicely.)

* It is unlikely that authoring tools would opt to dump their
  hyphenation data in documents even if their data was in a format
  suitable for dumping in whatever format was required.

* All the languages cited as requiring spelling changes are written
  using the Latin script. The Latin script has a long cultural
  tradition of adapting to writing technology: from chiseled marble to
  quills to movable type to typewriters to computer displays.
  Therefore, I don't find it unreasonable to suggest adapting to the
  limitations of the medium here.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
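
To make the Swedish point above concrete, here is a rough sketch of how a hyphenator might get by with only a dictionary of simple words. It assumes the rule in question is the one where three identical consonants at a compound joint are written as two ("tugg" + "gummi" gives "tuggummi") and the dropped letter comes back when the word is broken at the joint ("tugg-gummi"). The word list, the function name and the vowel set are made up for illustration; this is not how Prince or any browser is known to do it.

```python
# Hypothetical mini-dictionary of simple Swedish words. A real hyphenator
# would load a much larger list; these four are only for illustration.
SIMPLE_WORDS = {"tugg", "gummi", "natt", "tåg"}

# Assumed vowel set, used only to check that a doubled letter is a consonant.
VOWELS = set("aeiouyåäö")


def hyphenate_compound(word, simple_words=SIMPLE_WORDS):
    """Return `word` with a hyphen at the compound joint, or unchanged.

    Only two-part compounds are handled. Ambiguous splits are not
    resolved: the first split that matches the dictionary wins.
    """
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head not in simple_words:
            continue
        # Plain compound: both halves are dictionary words as spelt.
        if tail in simple_words:
            return head + "-" + tail
        # Reduced triple consonant: if the head ends in a doubled consonant,
        # try restoring the dropped letter at the start of the tail.
        if len(head) >= 2 and head[-1] == head[-2] and head[-1] not in VOWELS:
            restored_tail = head[-1] + tail
            if restored_tail in simple_words:
                return head + "-" + restored_tail
    return word


print(hyphenate_compound("tuggummi"))
print(hyphenate_compound("nattåg"))
```

A word that cannot be split into two dictionary entries is returned untouched, which matches the point that a missing dictionary entry leaves things no worse than the status quo.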
Received on Tuesday, 9 January 2007 10:22:28 UTC