- From: Ian Graham <igraham@smaug.java.utoronto.ca>
- Date: Thu, 17 Apr 1997 13:28:25 -0400 (EDT)
- To: abigail@fnx.com
- Cc: www-html@w3.org, www-style@w3.org
Some thoughts, largely in agreement with and following up on
Abigail's comments:
1) Using external hyphenation dictionaries requires a standardized
hyphenation format. Is there such a standard? (I know only
of the TeX/LaTeX approach.) This would seem to be the first
necessary step.
2) Hyphenation is language specific. Whatever mechanism is used
will need to take this into account. Dictionaries must therefore
express, in a machine-readable way, the language to which they
apply.
2) A hyphenation dictionary is not purely language specific.
Overrides for specific contexts (technical vocabulary,
special meanings/uses) will be required. "Cascading" external
hyphenation dictionaries might be a logical solution to this
problem. However, this would also require a document-specific
mechanism for indicating hyphenation rules, since authors would
probably specify document-specific hyphenation in the document,
and not in a separate file. This implies new markup.
3) A general-purpose dictionary is likely to be large, and slow to
download -- it would thus be better if the browser had default
hyphenation (dictionary or algorithm). Then, external dictionaries
could be used when (a) there is no local one (e.g., other languages),
or (b) when the author needs to explicitly set hyphenation rules
for special words. (b) again requires a markup- or stylesheet-specific
way of specifying local hyphenation rules.
4) If there is a local default dictionary, an author may wish to
override it using a specific, external dictionary.
5) Commercial dictionaries. Certainly there must be commercial
hyphenation tables -- how do I use these (and pay for them)?
6) Hyphenation servers? Another option is to have the browser
contact a hyphenation server, pass it a list of words that need
(or are likely) to be hyphenated, and receive in return a list of
hyphenation rules. The rules could be cached for future use.
This would be faster than downloading an entire dictionary.
7) The reader should be able to turn all this off, and disable hyphenation
completely.
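To make item 6 a little more concrete, here is a rough Python sketch of
the client side of such an exchange. No hyphenation server or protocol
exists; fake_server, HyphenationClient and the word table are all
hypothetical stand-ins, and the only point is the batching and caching.

```python
from typing import Callable, Dict, Iterable, List


def fake_server(words: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for a remote hyphenation server.

    Takes a batch of words and returns a hyphenated form for each one.
    Words it does not know come back unchanged.
    """
    table = {"foobar": "foo-bar", "mayonnaise": "mayo-nna-ise"}
    return {w: table.get(w, w) for w in words}


class HyphenationClient:
    """Asks the server only for words that are not already cached."""

    def __init__(self, query: Callable[[List[str]], Dict[str, str]]):
        self.query = query
        self.cache: Dict[str, str] = {}
        self.requests = 0  # number of round trips made so far

    def hyphenate(self, words: Iterable[str]) -> Dict[str, str]:
        words = list(words)
        # Collect the words we have never asked about, without duplicates.
        missing = sorted({w for w in words if w not in self.cache})
        if missing:
            self.requests += 1
            self.cache.update(self.query(missing))
        return {w: self.cache[w] for w in words}
```

A second call for a word already seen is answered from the cache, so
the cost of a round trip is paid once per new word rather than once
per document.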
Items 1--4 and 7 might be accomplished in HTML using cascaded LINK
elements and local markup -- to paraphrase Dave Raggett and Abigail:
<HEAD>
<LINK REL=hyphenation LANG=en HREF="hyphen.dict">
<LINK REL=hyphenation LANG=en HREF="hyphen-tech.dict">
<LINK REL=hyphenation LANG=en HREF="hyphen-mystuff.dict">
<HYPHENATE LANG="en" WORD="foobar" HYPHENATED = "foo\-bar">
<HYPHENATE LANG="en" WORD="mayonnaise" HYPHENATED = "mayo\-nna\-ise">
...
</HEAD>
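How a browser might resolve such a cascade could look roughly like the
Python sketch below. The precedence is my assumption, not anything
specified: in-document HYPHENATE entries win, then LINKed dictionaries
with later LINKs overriding earlier ones, then the browser's default.
The function name resolve, the table layout and the sample entries are
all hypothetical, and "\-" marks a permitted break as in the markup
above.

```python
from typing import Dict, List, Tuple

# (language, word) -> hyphenated form, with "\-" marking each break point.
Table = Dict[Tuple[str, str], str]


def resolve(word: str, lang: str, inline: Table, linked: List[Table],
            default: Table, enabled: bool = True) -> str:
    """Return the hyphenated form of word, or word itself if none applies."""
    if not enabled:
        # Item 7: the reader has switched hyphenation off entirely.
        return word
    key = (lang, word)
    # Most specific source first: document markup, then LINKed
    # dictionaries (last LINK first), then the browser default.
    for table in [inline, *reversed(linked), default]:
        if key in table:
            return table[key]
    return word


# Placeholder data mirroring the example: three LINKed dictionaries
# (only the second has a relevant entry) and one HYPHENATE element.
default = {("en", "mayonnaise"): r"mayon\-naise"}
linked = [{}, {("en", "foobar"): r"foob\-ar"}, {}]
inline = {("en", "foobar"): r"foo\-bar"}
```

With these tables, resolve("foobar", "en", inline, linked, default)
picks the in-document entry over the LINKed one, while "mayonnaise"
falls through to the browser default.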
Ian
--
Ian Graham ................................. ian.graham@utoronto.ca
Centre for Academic Technology
Information Commons Tel: 416-978-4548
University of Toronto Fax: 416-978-7705
..................... http://www.utoronto.ca/ian/ .................
Abigail wrote:
>
> You, Dave Raggett, wrote:
> ++
> ++ On Thu, 17 Apr 1997, Vincent QUINT wrote:
> ++
> ++ > A full dictionary for each language would be much too expensive.
> ++ > Some time ago (in 1983) F. M. Liang proposed a very efficient
> ++ > method for compressing hyphenation dictionaries while making them
> ++ > much easier to search. This method is used in TeX and it produces
> ++ > quite good results with very small dictionaries. This is also the
> ++ > method used in Amaya.
> ++
> ++ It's always good to build on proven implementation experience.
> ++ The question remains as to how to link to such dictionaries.
> ++ One idea is to use LINK e.g.
> ++
> ++ <LINK REL=hyphenation LANG=en HREF=hyphen.dict>
>
> Somehow, this suggests user agents have to download complete
> dictionaries for a document. I don't think a dictionary on how to
> hyphenate words is a property of the document, but of the language. I
> just want to download a dictionary for English once, and not
> everyone's local copy. Of course, there will always be exceptions,
> names, new words, etc. But making a new dictionary which basically is
> a copy with some additions is a huge waste of resources; especially if
> you realise the exceptions might not even need to be hyphenated.
> Therefore I think the author needs to have the possibility to mark
> exceptions in the document, and hence leaving the bulk to the user
> agent. For instance:
>
> <HYPHENATE WORD = "foobar" HYPHENATED = "foo-bar">
>
> In that case, you only need to mark your exception once per
> document, and you can still use 'foobar' in your actual text.
> &shy; doesn't seem to degrade gracefully on some browsers, and
> you need to type foo&shy;bar for every occurrence of foobar.
>
> ++ Another is to extend CSS with a hyphenation property, e.g.
> ++
> ++ BODY {hyphenation: url(hyphen.dict)}
>
> This has the same problem as mentioned above.
>
>
>
> Abigail
>
Received on Thursday, 17 April 1997 13:28:58 UTC