Re: Hyphenation (was Re: A suggested tag)

Some thoughts, largely in agreement with and following up on
Abigail's comments:

1) Using external hyphenation dictionaries requires a standardized
   hyphenation format.  Is there such a standard?  (I know only
   of the TeX/LaTeX approach.)  This would seem to be the first
   necessary step.

2) Hyphenation is language specific. Whatever mechanism is used
   will need to take this into account. Dictionaries must therefore
   express, in a machine-readable way, the language to which they apply.

2) A hyphenation dictionary is not purely language specific. 
   Overrides for specific contexts (technical vocabulary, 
   special meanings/uses) will be required. "Cascading" external 
   hyphenation dictionaries might be a logical solution to this 
   problem. However, this would also require a document-specific 
   mechanism for indicating hyphenation rules, since authors would
   probably specify document-specific hyphenation in the document,
   and not in a separate file. This implies new markup.

3) A general-purpose dictionary is likely to be large, and slow to 
   download -- it would thus be better if the browser had default 
   hyphenation (dictionary or algorithm). Then, external dictionaries 
   could be used when (a) there is no local one (e.g., other languages), 
   or (b) the author needs to explicitly set hyphenation rules 
   for special words.  Case (b) again requires a markup- or
   stylesheet-specific way of specifying local hyphenation rules.

4) If there is a local default dictionary, an author may wish to 
   override it using a specific, external dictionary. 

5) Commercial dictionaries. Certainly there must be commercial
   hyphenation tables -- how do I use these (and pay for them)?

6) Hyphenation servers? Another option is to have the browser
   contact a hyphenation server, pass it a list of words that need
   (or are likely to need) hyphenating, and receive in return a list
   of hyphenation rules. The rules could be cached for future use.
   This would be faster than downloading an entire dictionary.

7) The reader should be able to turn all this off and disable
   hyphenation entirely.
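
To make item 6 a little more concrete, here is a minimal Python sketch
of the caching idea. Everything in it is an assumption for illustration:
the class name HyphenationCache, the fetch callback standing in for a
network request, and the sample table inside fake_server are all
hypothetical, and no real hyphenation service or wire format is implied.

```python
from typing import Callable, Dict, Iterable, List


class HyphenationCache:
    """Client-side cache in front of a hypothetical hyphenation server."""

    def __init__(self, fetch: Callable[[List[str]], Dict[str, str]]) -> None:
        self._fetch = fetch              # stands in for the network request
        self._rules: Dict[str, str] = {}
        self.requests = 0                # number of round trips made so far

    def lookup(self, words: Iterable[str]) -> Dict[str, str]:
        # Normalise to lower case and drop duplicates, keeping order.
        wanted = list(dict.fromkeys(w.lower() for w in words))
        missing = [w for w in wanted if w not in self._rules]
        if missing:
            # One request for all uncached words, not one per word.
            self.requests += 1
            answer = self._fetch(missing)
            for w in missing:
                # Words the server does not know are cached unhyphenated,
                # so they are not asked for again.
                self._rules[w] = answer.get(w, w)
        return {w: self._rules[w] for w in wanted}


def fake_server(words: List[str]) -> Dict[str, str]:
    """Hypothetical server: a tiny made-up table, for demonstration only."""
    table = {"hyphenation": "hy-phen-ation", "dictionary": "dic-tion-ary"}
    return {w: table[w] for w in words if w in table}


cache = HyphenationCache(fake_server)
first = cache.lookup(["Hyphenation", "browser"])   # one round trip
second = cache.lookup(["hyphenation"])             # served from the cache
```

Only words not already cached are sent, so repeated words across
documents cost nothing after the first request.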

Items 1--4 and 7 might be accomplished in HTML using cascaded LINK 
elements and local markup -- to paraphrase Dave Raggett and Abigail:

   <LINK REL=hyphenation LANG=en HREF="hyphen.dict">
   <LINK REL=hyphenation LANG=en HREF="hyphen-tech.dict">
   <LINK REL=hyphenation LANG=en HREF="hyphen-mystuff.dict">

   <HYPHENATE LANG="en" WORD="foobar" HYPHENATED = "foo\-bar">
   <HYPHENATE LANG="en" WORD="mayonnaise" HYPHENATED = "mayo\-nna\-ise">
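
How a browser might resolve such a cascade can be sketched in a few
lines of Python. This is only one possible reading, under assumptions
of my own: later layers override earlier ones, in-document HYPHENATE
entries are added as the last layer, and "\-" marks a permitted break
as in the markup above. The class and method names are hypothetical.

```python
from typing import Dict, List


class HyphenationCascade:
    """Per-language stack of hyphenation layers; later layers win."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled           # item 7: the reader's off switch
        self._layers: Dict[str, List[Dict[str, str]]] = {}

    def add_layer(self, lang: str, entries: Dict[str, str]) -> None:
        # Each LINKed dictionary, and finally the in-document HYPHENATE
        # entries, becomes one layer, added in document order.
        layer = {word.lower(): marked for word, marked in entries.items()}
        self._layers.setdefault(lang, []).append(layer)

    def hyphenate(self, word: str, lang: str) -> List[str]:
        """Return the fragments between permitted break points."""
        if not self.enabled:
            return [word]
        # Search the most recently added layer first.
        for layer in reversed(self._layers.get(lang, [])):
            marked = layer.get(word.lower())
            if marked is not None:
                # Fragments come from the dictionary entry as written.
                return marked.split("\\-")
        return [word]                    # unknown word: never break it


cascade = HyphenationCascade()
cascade.add_layer("en", {"database": "data\\-base"})   # e.g. hyphen.dict
cascade.add_layer("en", {"foobar": "foo\\-bar"})       # in-document entry
fragments = cascade.hyphenate("foobar", "en")
```

A word found in no layer, or in no layer for the requested language,
is simply left unbroken.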


Ian Graham ................................. ian.graham@utoronto.ca
Centre for Academic Technology
Information Commons                               Tel: 416-978-4548
University of Toronto                             Fax: 416-978-7705
..................... http://www.utoronto.ca/ian/ .................

Abigail wrote:
> You, Dave Raggett, wrote:
> ++ 
> ++ On Thu, 17 Apr 1997, Vincent QUINT wrote:
> ++ 
> ++ > A full dictionary for each language would be much too expensive.
> ++ > Some time ago (in 1983) F. M. Liang proposed a very efficient
> ++ > method for compressing hyphenation dictionaries while making them
> ++ > much easier to search. This method is used in TeX and it produces
> ++ > quite good results with very small dictionaries. This is also the
> ++ > method used in Amaya.
> ++ 
> ++ It's always good to build on proven implementation experience.
> ++ The question remains as to how to link to such dictionaries.
> ++ One idea is to use LINK e.g.
> ++ 
> ++    <LINK REL=hyphenation LANG=en HREF=hyphen.dict>
> Somehow, this suggests user agents have to download complete
> dictionaries for a document. I don't think a dictionary on how to
> hyphenate words is a property of the document, but of the language. I
> just want to download a dictionary for English once, and not
> everyone's local copy. Of course, there will always be exceptions,
> names, new words, etc. But making a new dictionary which basically is
> a copy with some additions is a huge waste of resources; especially if
> you realise the exceptions might not even need to be hyphenated.
> Therefore I think the author needs to have the possibility to mark
> exceptions in the document, and hence leaving the bulk to the user
> agent. For instance:
> <HYPHENATE WORD = "foobar" HYPHENATED = "foo-bar">
> In that case, you only need to mark your exception once per
> document, and you can still use 'foobar' in your actual text.
> &shy; doesn't seem to degrade gracefully on some browsers, and
> you need to type foo&shy;bar for every occurrence of foobar.
> ++ Another is to extend CSS with a hyphenation property, e.g.
> ++ 
> ++    BODY {hyphenation: url(hyphen.dict)}
> This has the same problem as mentioned above.
> Abigail