- From: Ian Graham <igraham@smaug.java.utoronto.ca>
- Date: Thu, 17 Apr 1997 13:28:25 -0400 (EDT)
- To: abigail@fnx.com
- Cc: www-html@w3.org, www-style@w3.org
Some thoughts, largely in agreement with and following up on Abigail's comments:

1) Using external hyphenation dictionaries requires a standardized hyphenation format. Is there such a standard? (I know only of the TeX/LaTeX approach.) This would seem to be the first necessary step.

2) Hyphenation is language specific. Whatever mechanism is used will need to take this into account. Dictionaries must therefore express, in a machine-readable way, the language to which they apply.

2) A hyphenation dictionary is not purely language specific. Overrides for specific contexts (technical vocabulary, special meanings/uses) will be required. "Cascading" external hyphenation dictionaries might be a logical solution to this problem. However, this would also require a document-specific mechanism for indicating hyphenation rules, since authors would probably specify document-specific hyphenation in the document, and not in a separate file. This implies new markup.

3) A general-purpose dictionary is likely to be large, and slow to download -- it would thus be better if the browser had default hyphenation (dictionary or algorithm). Then, external dictionaries could be used when (a) there is no local one (e.g., other languages), or (b) the author needs to explicitly set hyphenation rules for special words. (b) again requires a markup- or stylesheet-specific way of specifying local hyphenation rules.

4) If there is a local default dictionary, an author may wish to override it using a specific, external dictionary.

5) Commercial dictionaries. Certainly there must be commercial hyphenation tables -- how do I use these (and pay for them)?

6) Hyphenation servers? Another option is to have the browser contact a hyphenation server, pass it a list of words that need, or are likely, to be hyphenated, and receive in return a list of hyphenation rules. The rules could be cached for future use. This would be faster than downloading an entire dictionary.
7) The reader should be able to turn all this off, and disable hyphenation completely.

Items 1--4 and 7 might be accomplished in HTML using cascaded LINK elements and local markup -- to paraphrase Dave Raggett and Abigail:

  <HEAD>
    <LINK REL=hyphenation LANG=en HREF="hyphen.dict">
    <LINK REL=hyphenation LANG=en HREF="hyphen-tech.dict">
    <LINK REL=hyphenation LANG=en HREF="hyphen-mystuff.dict">
    <HYPHENATE LANG="en" WORD="foobar" HYPHENATED = "foo\-bar">
    <HYPHENATE LANG="en" WORD="mayonnaise" HYPHENATED = "mayo\-nna\-ise">
    ...
  </HEAD>

Ian
--
Ian Graham ................................. ian.graham@utoronto.ca
Centre for Academic Technology
Information Commons                          Tel: 416-978-4548
University of Toronto                        Fax: 416-978-7705
..................... http://www.utoronto.ca/ian/ .................

Abigail wrote:
>
> You, Dave Raggett, wrote:
> ++
> ++ On Thu, 17 Apr 1997, Vincent QUINT wrote:
> ++
> ++ > A full dictionary for each language would be too expensive.
> ++ > Some time ago (in 1983) F. M. Liang proposed a very efficient
> ++ > method for compressing hyphenation dictionaries while making them
> ++ > much easier to search. This method is used in TeX and it produces
> ++ > quite good results with very small dictionaries. This is also the
> ++ > method used in Amaya.
> ++
> ++ It's always good to build on proven implementation experience.
> ++ The question remains as to how to link to such dictionaries.
> ++ One idea is to use LINK e.g.
> ++
> ++   <LINK REL=hyphenation LANG=en HREF=hyphen.dict>
>
> Somehow, this suggests user agents have to download complete
> dictionaries for a document. I don't think a dictionary on how to
> hyphenate words is a property of the document, but of the language. I
> just want to download a dictionary for English once, and not
> everyone's local copy. Of course, there will always be exceptions,
> names, new words, etc. But making a new dictionary which basically is
> a copy with some additions is a huge waste of resources, especially if
> you realise the exceptions might not even need to be hyphenated.
> Therefore I think the author needs to have the possibility to mark
> exceptions in the document, and hence leaving the bulk to the user
> agent. For instance:
>
>   <HYPHENATE WORD = "foobar" HYPHENATED = "foo-bar">
>
> In that case, you only need to mark your exception once per
> document, and you can still use 'foobar' in your actual text.
> &shy; doesn't seem to degrade gracefully on some browsers, and
> you need to type foo&shy;bar for every occurrence of foobar.
>
> ++ Another is to extend CSS with a hyphenation property, e.g.
> ++
> ++   BODY {hyphenation: url(hyphen.dict)}
>
> This has the same problem as mentioned above.
>
>
>
> Abigail
>
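The lookup order implied by the cascade idea in this message (document-level exceptions, then linked dictionaries for the matching language, with a reader switch to disable everything) could be sketched roughly as follows. This is only an illustration under assumptions that are not in the thread: the function name `resolve_hyphenation`, the in-memory dictionary shape, the rule that a later LINK overrides an earlier one, and the sample entries other than "foobar" and "mayonnaise" are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of one dictionary:
# lowercase word -> the same word with "-" at each permitted break.
HyphenDict = Dict[str, str]


def resolve_hyphenation(
    word: str,
    lang: str,
    linked: List[Tuple[str, HyphenDict]],
    doc_exceptions: Dict[Tuple[str, str], str],
    enabled: bool = True,
) -> Optional[str]:
    """Return the hyphenated form of `word`, or None if no rule applies."""
    # The reader may switch hyphenation off entirely.
    if not enabled:
        return None

    key = word.lower()

    # Exceptions marked in the document itself take priority.
    exception = doc_exceptions.get((lang, key))
    if exception is not None:
        return exception

    # Assumption: dictionaries linked later override those linked earlier,
    # so walk the list backwards and skip dictionaries for other languages.
    for dict_lang, entries in reversed(linked):
        if dict_lang == lang and key in entries:
            return entries[key]

    # Nothing matched; a browser would fall back to its built-in default.
    return None


# Made-up sample data standing in for the linked dictionary files.
linked = [
    ("en", {"record": "rec-ord", "table": "ta-ble"}),  # general dictionary
    ("en", {"record": "re-cord"}),                     # technical override
    ("fr", {"table": "tab-le"}),                       # other language, ignored for "en"
]
doc_exceptions = {
    ("en", "foobar"): "foo-bar",
    ("en", "mayonnaise"): "mayo-nna-ise",
}

print(resolve_hyphenation("record", "en", linked, doc_exceptions))
print(resolve_hyphenation("foobar", "en", linked, doc_exceptions))
```

A browser-side default (dictionary or algorithm) would slot in where the function returns None; it is left out here because the thread does not settle on one.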
Received on Thursday, 17 April 1997 13:28:58 UTC