Hyphenation (to Jukka Korpela)

From: Aristeu E B da Silva <aristeu@mandic.com.br>
Date: Thu, 21 Jan 1999 21:11:04 -0500 (EST)
Message-ID: <000a01be45ac$6da872c0$01d6a8c0@pradesh>
To: <jkorpela@cc.hut.fi>
Cc: <www-html@w3.org>
Hi Yucca

[Sorry about the (pseudo-)HTML message; I hope this one looks better.]

I very much liked your <hy> tag solution, and I'll try to convince you of
its importance.

First, some considerations to reinforce the point.

There are things we know that come from very old works and discoveries. The
profession and art of producing printed information is full of this kind of
knowledge. Computer-based communication has, in my opinion, sometimes been
'lazy' in taking hold of this heritage. Most of the time that is
technologically justified, but often it is not. Some will say, even today,
that monospaced fonts do the same job with less effort. Less effort is true,
but the same job is not.

The point here is Readability: efficiency in reading, which also means speed
and comfort. Beauty is a consequence.

One piece of that early knowledge is that readability is always impaired if
you use columns of text that average more than about 50 characters per line
or fewer than about 35. The two limits have different reasons; the excess
(lines that are too long) deprives the eye of the reference lines and points
that help guide its movements.
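As a rough illustration of the 35-to-50 rule, here is a minimal Python
sketch. It assumes plain greedy wrapping with the standard `textwrap` module
and counts characters only, ignoring proportional glyph widths, so it is an
approximation rather than a typesetting measure; the function names are
hypothetical.

```python
import textwrap

# Rule of thumb from the text: about 35 to 50 characters per line on average.
MIN_AVG, MAX_AVG = 35, 50


def average_line_length(text: str, width: int) -> float:
    """Wrap `text` greedily at `width` columns; return mean characters per line."""
    lines = textwrap.wrap(text, width=width)
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def readable_widths(text: str, widths: range) -> list[int]:
    """Return the wrap widths whose average line length falls in the band."""
    return [w for w in widths
            if MIN_AVG <= average_line_length(text, w) <= MAX_AVG]
```

For example, `readable_widths(some_paragraph, range(20, 81))` lists the column
widths that keep that paragraph inside the recommended band. Since greedy
wrapping never exceeds the width, any width below 35 fails automatically.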

Hyphenation is the fundamental resource for keeping an adequate density of
words per area in those relatively narrow columns, and for allowing
'justification', which builds a second vertical line along the column.

You may notice that a great number of web designers 'run away' from the
one-column standard, which is certainly well suited (or at least easily
suited) to the variable-width medium that the web is. That wish cannot be
'deprecated', or seen as a less-than-ideal way of using HTML. What needs to
be improved is the means of doing it.

HTML 4.0 took some steps in that direction, but it still lacks something as
important as a means of building the 'chain effect' that a real column needs
(text flowing on from one column into the next), especially in a
variable-size window. But that's another issue, maybe in CSS's realm (if I'm
wrong about this gap, please point me to the right place).

Returning to the <hy> tag.

It's certainly better than &shy; for backward compatibility. Search engines
might also 'prefer' it (I think). The point is that _allowing_ it to be used
the way I described, that is, everywhere in the hyphenated text, marking all
possible hyphenation points (actually, _almost_ all is enough), is what
would leave the most possibilities open.

Have you ever seen how QuarkXPress behaves in this matter? As you know, the
soft-hyphen concept isn't new; it exists in other sophisticated tools like
PageMaker and even Word. But they differ in approach. The best one, in my
opinion, is Quark's.

It has a modular hyphenation engine, tied to the language, which may be
parameterized as a whole, at document level, with attributes like "Hyphens in
a row" (the number of consecutive hyphenated lines allowed), "smallest word",
"Break Capitalized Words" and a few others.

The good thing is that IF the text has soft hyphens typed in, these
parameters work on them the same way as if they were engine-generated
hyphens.

The engine OR the hard-coded soft hyphens say WHERE (in the word). The
parameters say WHEN (in the column flow).
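The WHERE/WHEN split can be sketched in Python. This is a minimal illustration
under stated assumptions, not how QuarkXPress actually works internally:
`TOY_ENGINE` is a hypothetical stand-in for a real language engine, the
parameter names only mirror the Quark attributes mentioned above, and soft
hyphens are represented by U+00AD, the character &shy; stands for. Whichever
source supplies the break points, the same parameters decide whether a break
is used.

```python
from dataclasses import dataclass
from typing import Optional

SOFT_HYPHEN = "\u00ad"  # the character that &shy; stands for

# Hypothetical stand-in for a language-specific hyphenation engine:
# maps a word to the indices where it may be broken (hy-phen-a-tion).
TOY_ENGINE = {"hyphenation": [2, 6, 7]}


@dataclass
class HyphenationParams:
    hyphens_in_a_row: int = 3        # max consecutive lines ending in a hyphen
    smallest_word: int = 6           # shorter words are never broken
    break_capitalized: bool = True   # may capitalized words be broken?


def break_points(word: str) -> tuple[str, list[int]]:
    """WHERE: typed-in soft hyphens if present, else the engine's suggestions.

    Returns the word without soft hyphens plus the candidate break indices.
    """
    if SOFT_HYPHEN in word:
        plain: list[str] = []
        points: list[int] = []
        for ch in word:
            if ch == SOFT_HYPHEN:
                points.append(len(plain))
            else:
                plain.append(ch)
        return "".join(plain), points
    return word, TOY_ENGINE.get(word.lower(), [])


def choose_break(word: str, room: int, params: HyphenationParams,
                 hyphens_so_far: int) -> Optional[tuple[str, str]]:
    """WHEN: the same parameters apply whichever source supplied the points.

    `room` is the number of characters left on the current line.
    Returns (head ending in '-', tail), or None to move the whole word down.
    """
    plain, points = break_points(word)
    if len(plain) < params.smallest_word:
        return None
    if plain[:1].isupper() and not params.break_capitalized:
        return None
    if hyphens_so_far >= params.hyphens_in_a_row:
        return None
    # Rightmost point whose head plus the visible hyphen still fits.
    fitting = [p for p in points if 0 < p < len(plain) and p + 1 <= room]
    if not fitting:
        return None
    p = max(fitting)
    return plain[:p] + "-", plain[p:]
```

With 8 characters of room, `choose_break("hyphenation", 8, HyphenationParams(), 0)`
gives ("hyphena-", "tion") from the engine, while a word typed as "re&shy;cord"
is split at the author's soft hyphen under exactly the same limits.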

The other, and wiser, thing is that in cases of conflict with the paragraph's
'hyphenated' attribute (which is a Boolean and also points to a specific
language's engine), the hard-coded soft hyphens always prevail.

That allows your "re-cord / rec-ord" case: breaking extremely long words in
non-hyphenated paragraphs, AND hyphenating correctly some text that the
installed engine could not handle.
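The precedence rule can likewise be sketched in Python, again with
hypothetical names (`ENGINES`, `candidate_breaks`) and a toy English engine
that knows only the noun "rec-ord". It resolves only WHERE a word may break,
leaving WHEN aside. Soft hyphens typed into the text (U+00AD, i.e. &shy;) are
honored even when the paragraph's hyphenation flag is off, and they replace
the engine's suggestion, so the verb can be written "re&shy;cord".

```python
SOFT_HYPHEN = "\u00ad"

# Hypothetical per-language engines: each maps a word to candidate break
# indices. This toy English engine only knows the noun "rec-ord".
ENGINES = {
    "en": {"record": [3]},
}


def candidate_breaks(word: str, hyphenate: bool, lang: str) -> list[int]:
    """Resolve where `word` may break, given the paragraph's settings.

    Hard-coded soft hyphens always win: they are honored even when the
    paragraph's `hyphenate` flag is off, and they override the engine.
    Indices refer to the word with the soft hyphens removed.
    """
    if SOFT_HYPHEN in word:
        points, removed = [], 0
        for i, ch in enumerate(word):
            if ch == SOFT_HYPHEN:
                points.append(i - removed)
                removed += 1
        return points
    if not hyphenate:
        return []
    # No engine for this language: no automatic break points.
    return ENGINES.get(lang, {}).get(word, [])
```

With these toy definitions, `candidate_breaks("record", True, "en")` yields
the engine's [3] (rec-ord), while `candidate_breaks("re\u00adcord", False, "en")`
yields [2] (re-cord) even though hyphenation is off for that paragraph.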

I think this proposed behavior would do no harm, and at the same time would
coexist with, and complement, the UA's hyphenation-engine approach.

Finally, I read "http://www.hut.fi/u/jkorpela/shy.html", and all I can say is
that it's a greater mess than I imagined.

I think your interpretation is no worse than any other. In fact, there were
'historical' font designs that used different shapes for hyphens introduced
by the 'composer' and for those that actually belong to the word. But I have
never seen software that made that distinction, and even digital versions of
'Goudy Old Style', for instance, don't. I count this as one of those old
things that have vanished forever.

PS. I just checked IE5 and, what a surprise, it does interpret &shy; as a
hyphenation hint!!!!

Received on Tuesday, 26 January 1999 12:03:07 UTC
