
[whatwg] Hyphenation

From: Øistein E. Andersen <html5@xn--istein-9xa.com>
Date: Fri, 12 Jan 2007 01:15:33 +0100
Message-ID: <E1H5A4z-0006nc-00@ws7.ou-data.net>

On 11 Jan 2007, at 5:33PM, Håkon Wium Lie wrote:

> The term "hyphenation dictionary" is quite common, but I see your
> point. What would be a better name for the property?

>  hyphenation-pattern
>  hyphenation-list
>  hyphenation-resource

Liang's paper `Word Hy-phen-a-tion by Com-put-er', in which the concept
was first introduced, used the term `hyphenation patterns'. Unsurprisingly,
Liang's supervisor, Knuth, used the same term in the TeXbook, and this
expression seems to have become the generally accepted one amongst TeX users.

`Hyphenation dictionary' is also common, but this tends to mean something
slightly different. To exemplify, the first five lines of what I would call a
hyphenation dictionary look like this:
> a cap-pel-la
> a for-ti-o-ri
> a go-go
> a pos-te-ri-o-ri
> a pri-o-ri

[Interestingly, this particular dictionary contains multi-word expressions, but
most hyphenation engines, as well as spelling checkers, cannot take advantage of
these, as each word (according to some definition) is typically treated in isolation.]
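
To make the point about multi-word entries concrete, here is a minimal Python
sketch of a dictionary-based lookup. It is only an illustration: the function
names and the `max_words` parameter are hypothetical, the five entries are the
ones quoted above, and the hyphenation points are assumed to be written as `-`.

```python
# Entries copied from the five dictionary lines quoted above;
# "-" is assumed to mark the permitted break points.
HYPHENATION_DICTIONARY = {
    "a cappella": "a cap-pel-la",
    "a fortiori": "a for-ti-o-ri",
    "a gogo": "a go-go",
    "a posteriori": "a pos-te-ri-o-ri",
    "a priori": "a pri-o-ri",
}


def hyphenate_word_by_word(text, dictionary):
    """Look up every space-separated word in isolation."""
    return " ".join(dictionary.get(word, word) for word in text.split())


def hyphenate_with_phrases(text, dictionary, max_words=2):
    """Try the longest run of up to max_words words before shorter ones."""
    words = text.split()
    result = []
    i = 0
    while i < len(words):
        for n in range(max_words, 0, -1):
            chunk = words[i:i + n]
            entry = dictionary.get(" ".join(chunk))
            if entry is not None:
                result.append(entry)
                i += len(chunk)
                break
        else:
            # No entry of any length starts here: keep the word unchanged.
            result.append(words[i])
            i += 1
    return " ".join(result)
```

The word-by-word variant never finds `a priori', because neither `a' nor
`priori' is an entry on its own; the phrase-aware variant does.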

In contrast, the first five hyphenation patterns in TeX82 are the following:
> .ach4
> .ad4der
> .af1t
> .al3t
> .am5at
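
For readers unfamiliar with how such patterns are used, the following is a
rough Python sketch of Liang-style pattern matching. The function names are
hypothetical, only the five patterns quoted above are loaded (so the output
says nothing about what TeX does with its complete pattern set), and a plain
dict stands in for the packed trie a real implementation would use. The
minimum fragment lengths of 2 and 3 letters follow TeX's usual defaults.

```python
def parse_pattern(pattern):
    """Split '.ach4' into its letters '.ach' and gap weights [0, 0, 0, 0, 4].

    weights[i] is the weight of the gap before letter i; the final entry
    is the gap after the last letter.
    """
    letters = []
    weights = [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Insert '-' wherever the highest matching weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."   # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[i]: gap before padded[i]

    # Try every substring of the padded word against the pattern table
    # and keep the maximum weight seen for each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for offset, weight in enumerate(weights):
                    pos = start + offset
                    scores[pos] = max(scores[pos], weight)

    # The gap before word[k] is scores[k + 1] because of the leading '.'.
    pieces = []
    last = 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:      # odd allows a break, even forbids it
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


# The five patterns quoted above.
PATTERNS = [".ach4", ".ad4der", ".af1t", ".al3t", ".am5at"]
```

With these five patterns alone, `hyphenate("after", PATTERNS)` yields
`af-ter` (the odd weight in `.af1t` permits the break), whereas
`hyphenate("adder", PATTERNS)` leaves the word unbroken (the even weight in
`.ad4der` inhibits a break at that point).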

I think it is useful to keep the distinction and would suggest renaming the
property in question `hyphenation-patterns'. (TeX's exception dictionary
falls within this narrower definition of a hyphenation dictionary.)

http://computing-dictionary.thefreedictionary.com/hyphenation says:
> HYPHENATION: Breaking words that extend beyond the right margin.
> Software hyphenates words by matching them against a hyphenation
> dictionary or by using a built-in set of rules, or both.

http://www.answers.com/topic/hyphenation-dictionary is more specific:
> HYPHENATION DICTIONARY: A word file with predefined hyphen locations.

http://www.computeruser.com/resources/dictionary/definition.html?lookup=2188
gives a more generic definition:
> A file, usually in a word processing or desktop publishing program,
> which defines where hyphens will be placed for common words.

Google returns about 21,200 results for /hyphenation dictionar(y|ies)/ and
148,100 for /hyphenation patterns?/, so the latter should also be fairly common.

To me, a `hyphenation list' suggests something rather like a hyphenation
dictionary, whereas `hyphenation resource' probably should be reserved
for a more comprehensive source of hyphenation information, unless
the same property is supposed to be able to refer to different kinds
of hyphenation data.


>>>> [In TeX], hyphenation can [also] be indicated locally.
>>>> This is needed in order to hyphenate words like
>>>> rec-ord/re-cord and is the only level that deals with
>>>> spelling changes.

> &shy; is probably the best way to encode this. However, it can be done
> through CSS as well:

>    Don't wait for <span style="hyphenation-dictionary: rec-ord.dic">record
>    </span> companies, <span style="hyphenation-dictionary: re-cord.dic">
>    record</span> yourself.

Right, I did not get your point at first. This does indeed cover the first reason
to use explicit mark-up in TeX.
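
For comparison, the &shy; alternative mentioned in the quoted text could be
produced along the following lines. This is just a sketch: the helper names
are hypothetical, and it assumes the author supplies each word with its
break points written as `-`.

```python
import html

SOFT_HYPHEN = "\u00ad"  # U+00AD, the character that &shy; refers to


def with_soft_hyphens(hyphenated):
    """Turn 'rec-ord' into 'rec' + SOFT_HYPHEN + 'ord'."""
    return hyphenated.replace("-", SOFT_HYPHEN)


def to_html(text):
    """Escape the text for HTML and spell soft hyphens as &shy;."""
    return html.escape(text, quote=False).replace(SOFT_HYPHEN, "&shy;")


sentence = (
    "Don't wait for " + with_soft_hyphens("rec-ord") + " companies, "
    + with_soft_hyphens("re-cord") + " yourself."
)
markup = to_html(sentence)
```

Here `markup` marks the noun as rec&shy;ord and the verb as re&shy;cord, so
the same spelling receives two different break points without any
per-word dictionary.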

Concerning spelling changes, Petr Sojka's `Notes on Compound Word
Hyphenation in TeX' [1], section 3.2, describes how a minimally extended
version of the TeX algorithm can deal with irregular hyphenation without any
extraneous mark-up, i.e., without any unnecessary burden on the author.
Perhaps an idea for Prince7?

Anyway, the preliminary conclusion seems to be that a <hyph> element in HTML
is unnecessary, so this discussion should probably continue somewhere else.

[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf

-- 
Øistein E. Andersen
Received on Thursday, 11 January 2007 16:15:33 UTC
