[whatwg] Hyphenation

On 11 Jan 2007, at 1:49PM, Håkon Wium Lie wrote:

> Prince doesn't support exception dictionaries. Is it not
> possible to encode exceptions in the hyphenation dictionary?

Yes, that should be possible, actually. The encoding of certain
words in a default exception dictionary seems to be a design
choice in TeX rather than a requirement. (By the way, the term
`dictionary' used to designate a set of hyphenation patterns that
are not, in general, words, is quite confusing.)
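
To make the point concrete, here is a minimal sketch in Python of
Liang-style pattern matching in which an exception is encoded as a
whole-word pattern. The patterns are toy examples invented for
illustration (they are not taken from TeX's pattern file), the
function names are hypothetical, and TeX's minimum fragment lengths
(\lefthyphenmin, \righthyphenmin) are ignored.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'e1c' into its letters ('ec') and the
    values between them ([0, 1, 0])."""
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns):
    """Insert '-' wherever the highest matching pattern value is odd."""
    padded = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[k]: value before padded[k]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for pos in range(1, len(word)):
        if scores[pos + 1] % 2 == 1:       # odd value: break allowed here
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return "-".join(pieces)


# Toy data: one general pattern plus one exception written as a
# whole-word pattern (anchored by dots, with high values that
# override the general pattern).
patterns = ["e1c", ".s8e8c9o8n8d."]

print(hyphenate("record", patterns))   # re-cord
print(hyphenate("second", patterns))   # sec-ond (exception wins)
print(hyphenate("seconds", patterns))  # se-conds (exception does not match)
```

Because the exception is anchored at both ends, it applies to that
exact word only, just as an entry in a separate exception list would.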

> DSSSL has an 'hyphenation-exceptions' property which takes a
> list of strings. I'm unsure if it has been implemented, though.

Interesting. This would be useful for authors who wanted to
indicate a few exceptions without specifying a complete set of
hyphenation patterns. (TeX includes 4,447 patterns, and two or
more sets cannot easily be merged.)
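
A short author-supplied list of this kind could be consulted before
any pattern-based hyphenator. The following Python sketch assumes the
list holds words with their hyphens written out; the function names
and the `fallback` parameter are hypothetical and do not describe
DSSSL or any existing implementation.

```python
def build_exceptions(entries):
    """Map each word (hyphens removed, lower-cased) to its hyphenated form."""
    return {entry.replace("-", "").lower(): entry.lower() for entry in entries}


def hyphenate_with_exceptions(word, exceptions, fallback):
    """Prefer the author's exception; otherwise defer to the fallback."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return fallback(word)


# The fallback here leaves words unhyphenated; a real one would apply
# the language's hyphenation patterns.
exceptions = build_exceptions(["sec-ond", "ta-ble"])
print(hyphenate_with_exceptions("Second", exceptions, lambda w: w))
print(hyphenate_with_exceptions("third", exceptions, lambda w: w))
```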

>> [In TeX], hyphenation can [also] be indicated locally.
>> This is needed in order to hyphenate words like
>> rec-ord/re-cord and is the only level that deals with
>> spelling changes.

> This can be done by supplying your own dictionary through the
> 'hyphenate-dictionary' property.

You seem to have misinterpreted the intended meaning of
`locally'. The two problems are as follows:

1) Given the following sentence: `Don't wait for record companies,
record records yourselves.' In order to hyphenate
this correctly, explicit hyphenation points (\- in TeX) must
be inserted locally, i.e., as part of the words, as follows:
`Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

2) TeX's hyphenation patterns cannot encode spelling changes;
neither can its exception dictionary. Therefore, spelling
changes like German backen -> bak-ken must be indicated
explicitly each time the word occurs.
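
Both cases can be modelled with the same data structure. In TeX,
\- is shorthand for \discretionary{-}{}{}, and a spelling change can
be written as ba\discretionary{k-}{k}{ck}en. The Python sketch below
mirrors that three-part form; the class and function names are
hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Discretionary:
    pre: str      # ends the first line if the break is taken
    post: str     # starts the next line if the break is taken
    nobreak: str  # used instead if the break is not taken


SOFT_HYPHEN = Discretionary("-", "", "")   # plays the role of \- in TeX


def unbroken(items):
    """Render the word as it appears when no break is taken."""
    return "".join(
        item.nobreak if isinstance(item, Discretionary) else item
        for item in items
    )


def broken_at(items, index):
    """Return (end of first line, start of next line) when breaking at
    the discretionary found at items[index]."""
    point = items[index]
    if not isinstance(point, Discretionary):
        raise ValueError("items[index] is not a break point")
    return (unbroken(items[:index]) + point.pre,
            point.post + unbroken(items[index + 1:]))


noun = ["rec", SOFT_HYPHEN, "ord"]                     # rec\-ord
verb = ["re", SOFT_HYPHEN, "cord"]                     # re\-cord
backen = ["ba", Discretionary("k-", "k", "ck"), "en"]  # spelling change

print(unbroken(backen))      # backen
print(broken_at(backen, 1))  # ('bak-', 'ken')
print(broken_at(noun, 1))    # ('rec-', 'ord')
print(broken_at(verb, 1))    # ('re-', 'cord')
```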

>> There are a few additional caveats. For instance, it is not
>> entirely obvious what should be considered to be a `word' or
>> which characters should be allowed in a `word'
>> [... lots of less important points ...]
>> How does Prince deal with these issues?

> Prince6 doesn't try to go beyond TeX.

Fair enough. I realise that my question ended up rather too far
away from the most important issue. I suppose Prince relies on
Unicode character classes to identify letters (which is better
than Plain TeX's default [unaccented English letters only], but
less flexible) and uses a special rule to treat hyphens. Is this
a correct assumption? Can I find more information on such details
somewhere?
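
To illustrate the kind of rule meant here, the following Python
sketch treats every maximal run of Unicode letters (general category
L*) and combining marks (M*) as one `word', so that an explicit
hyphen or an apostrophe ends a word. This is only an assumption about
one possible approach; it does not describe what Prince actually
does, and the function names are hypothetical.

```python
import unicodedata


def is_word_char(ch):
    """Letters (L*) and combining marks (M*) count as part of a word."""
    return unicodedata.category(ch)[0] in ("L", "M")


def word_spans(text):
    """Yield (start, end) for each maximal run of word characters."""
    start = None
    for i, ch in enumerate(text):
        if is_word_char(ch):
            if start is None:
                start = i
        elif start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(text)


text = "a well-known na\u00efve caf\u00e9"
print([text[a:b] for a, b in word_spans(text)])
```

Under this rule, each part of `well-known' would be passed to the
hyphenator separately, and accented letters are kept inside words.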

-- 
Øistein E. Andersen

Received on Thursday, 11 January 2007 07:25:05 UTC