Re: A suggested tag

John Udall (jsu1@cornell.edu)
Mon, 14 Apr 1997 09:18:09 -0400


Message-Id: <2.2.32.19970414131809.0076d2dc@freedom.cce.cornell.edu>
Date: Mon, 14 Apr 1997 09:18:09 -0400
To: www-html@w3.org
From: John Udall <jsu1@cornell.edu>
Subject: Re: A suggested tag

At 04:54 PM 4/13/97 +0300, Jukka Korpela <jkorpela@cc.hut.fi> wrote:
>On Fri, 11 Apr 1997, Terje Norderhaug wrote:
>
>> A better idea would be a shared dictionary of words and how they are
>> split, residing on the network.
>
>Hyphenation is a strongly language-dependent issue, so what we basically
>need is support for different languages (including the HTTP level features
>and the proposed LANG attribute at the HTML level). For example, in
>English documents hyphenation is usually not desirable, whereas in
>Finnish documents it is often crucial for good-quality presentation
>since words are often very long; and in Finnish hyphenation can mostly
>be done on an algorithmic basis (without dictionaries), and if high-quality
>hyphenation is desired, one really needs a program which performs
>morphological analysis (in addition to using a dictionary). Most languages
>are probably somewhere in between, but one should _not_ assume that
>dictionaries or explicit hyphenation by authors are the universally
>correct approach.
>
        Jukka makes a very important point here. Hyphenation is very
strongly language dependent.  Finnish can be hyphenated on an algorithmic
basis.  Some other languages can be hyphenated on the basis of rules, as
Abigail mentioned.  Others must use a dictionary, because there just aren't
any rules that cover all the cases.
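
        To make the distinction concrete, here is a minimal sketch in
Python.  Everything in it is an assumption for illustration: the syllable
rule is a deliberately simplified toy (a break before each consonant+vowel
pair), not a complete Finnish hyphenator, and the dictionary entry and the
function names are hypothetical.

```python
VOWELS = set("aeiouyäö")

def rule_based_points(word):
    """Toy rule (assumed simplification): allow a break before each
    consonant+vowel pair, once at least one vowel has been seen."""
    lower = word.lower()
    points = []
    seen_vowel = False
    for i in range(len(lower) - 1):
        if lower[i] in VOWELS:
            seen_vowel = True
        elif seen_vowel and lower[i + 1] in VOWELS:
            points.append(i)
    return points

# Hypothetical data: languages handled by rules vs. by dictionary lookup.
RULES = {"fi": rule_based_points}
DICTIONARIES = {"en": {"hyphenation": [2, 6]}}  # word -> break offsets

def break_points(word, lang):
    """Pick the strategy by language; unknown words or languages get no breaks."""
    if lang in RULES:
        return RULES[lang](word)
    return list(DICTIONARIES.get(lang, {}).get(word.lower(), []))

def show(word, lang):
    """Render the word with visible hyphens at every allowed break."""
    points = break_points(word, lang)
    parts = [word[a:b] for a, b in zip([0] + points, points + [len(word)])]
    return "-".join(parts)

print(show("talossa", "fi"))      # rule-based
print(show("hyphenation", "en"))  # dictionary-based
```

A word that is missing from the dictionary simply comes back with no
break points, which is the safe default.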

At 06:12 PM 4/11/97 -0400, Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
wrote:
>How often do you read documents in more than 2 or 3 languages? How many
>do you speak? Why not just download and cache those dictionaries?

        Hyphenation should be tied to the LANG attribute rather than to an
entire document.  Many people, for example students, work with documents
containing 2, 3 or even 4 languages. I speak 4 languages: French, German,
Czech and English.  This past year when I was studying Czech, our textbook
had examples in both Czech and Russian with explanations in English. I don't
think that this situation is at all uncommon.
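
        As a rough sketch of what tying hyphenation to LANG could look
like, the Python below walks some markup and labels each run of text with
the language in effect, so that a per-language hyphenator could be chosen
run by run.  The class and function names are hypothetical, the default
language is an assumption, and void elements such as BR are not treated
specially.

```python
from html.parser import HTMLParser

class LangRuns(HTMLParser):
    """Collect (language, text) runs, inheriting LANG from enclosing elements."""

    def __init__(self, default_lang="en"):
        super().__init__()
        # Stack of (tag, language in effect); the bottom entry is the document.
        self.stack = [(None, default_lang)]
        self.runs = []

    def handle_starttag(self, tag, attrs):
        # An element without its own LANG inherits the enclosing language.
        lang = dict(attrs).get("lang") or self.stack[-1][1]
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1][1], text))

def lang_runs(markup, default_lang="en"):
    parser = LangRuns(default_lang)
    parser.feed(markup)
    parser.close()
    return parser.runs

sample = '<p>Compare <span lang="cs">kniha</span> with <span lang="ru">книга</span>.</p>'
for lang, text in lang_runs(sample):
    print(lang, text)
```

Each run can then be handed to whatever hyphenation method suits its
language, independently of the rest of the page.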

        The point is that hyphenation should, for the most part, be
automated and carried out on the client side.  This is especially true for
users of windowing browsers, who can resize the window to a different width
on the fly.  That requires re-hyphenating words, preferably without forcing
a reconnection to the server and a reload of the page across the network.
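
        A minimal sketch of the client-side idea, in Python: once the
allowed break points are known (marked here with the soft hyphen character,
U+00AD), the same text can be re-wrapped for any window width without
another trip to the server.  The greedy line filling and the function name
are assumptions for illustration, not how any particular browser works.

```python
SHY = "\u00ad"  # soft hyphen: marks an allowed break inside a word

def wrap(text, width):
    """Greedy line filling that may break words at soft hyphens."""
    lines, current = [], ""
    for word in text.split():
        pieces = word.split(SHY)
        while pieces:
            prefix = current + " " if current else ""
            whole = "".join(pieces)
            if len(prefix) + len(whole) <= width:
                current = prefix + whole
                pieces = []
                continue
            # Longest leading part that fits together with a visible hyphen.
            fit = 0
            for k in range(len(pieces) - 1, 0, -1):
                if len(prefix) + len("".join(pieces[:k])) + 1 <= width:
                    fit = k
                    break
            if fit:
                lines.append(prefix + "".join(pieces[:fit]) + "-")
                current, pieces = "", pieces[fit:]
            elif current:
                lines.append(current)  # start the word on a fresh line
                current = ""
            elif len(pieces) > 1:
                lines.append(pieces[0] + "-")  # overlong piece, break anyway
                pieces = pieces[1:]
            else:
                current, pieces = pieces[0], []  # unbreakable overlong word
    if current:
        lines.append(current)
    return lines

text = f"automatic hy{SHY}phen{SHY}ation helps"
for width in (40, 20, 13):  # simulate resizing the window
    print(width, wrap(text, width))
```

Resizing only changes the width argument; the text and its break points
stay on the client.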

        Users need to be able to force hyphenation, using the &shy; entity
for example.  They also need to be able to turn off hyphenation for certain
spans of text, such as URLs or poetry.  Better support for entities and for
internationalization in general from the browser manufacturers would be
nice.  It is the World Wide Web, after all.
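
        Both requests can be sketched in a few lines of Python.  The
function below decodes &shy; into explicit break offsets and refuses to
break anything that looks like a URL.  The function name and the URL
pattern are assumptions for illustration; a real browser would more likely
take the "do not hyphenate" instruction from markup or a style sheet.

```python
import html
import re

SHY = "\u00ad"  # the character that the &shy; entity stands for
# Assumed, simplified pattern for "looks like a URL".
URL = re.compile(r"[a-z][a-z0-9+.-]*://\S+", re.IGNORECASE)

def manual_breaks(token):
    """Return (visible text, forced break offsets) for one token."""
    decoded = html.unescape(token)  # turns &shy; into U+00AD
    if URL.fullmatch(decoded.replace(SHY, "")):
        # Hyphenation is switched off for URLs: drop any break points.
        return decoded.replace(SHY, ""), []
    plain, points = [], []
    for ch in decoded:
        if ch == SHY:
            points.append(len(plain))
        else:
            plain.append(ch)
    return "".join(plain), points

print(manual_breaks("hy&shy;phen&shy;ation"))
print(manual_breaks("http://www.hut.fi/~jkorpela/"))
```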

-John

>Yucca, http://www.hut.fi/~jkorpela/
>
>

John Udall,
      Programmer/Sys. Admin.
Extension Electronic Technologies Group (EETG),
Cornell Cooperative Extension,
40 Warren Hall
Cornell University,
Ithaca, New York 14853
(607) 255-8127
jsu1@cornell.edu