Re: Soft hyphen (Re: Cougar comments)

Martin J. Duerst wrote:
> Your understanding of the character U+00AD as a code that is always
> visible is based on one sentence in section 6.3.3 of ISO 8859-1:
> 
> 	A graphic character that is imaged by a graphic symbol identical
> 	with, or similar to, that representing HYPHEN, for use when
> 	a line break has been established within a word.
> 
> Now it is rather strange that one should have two hyphens (HYPHEN-MINUS
> and SOFT HYPHEN) that are always visible. Every decent typographer
> or text coder would first ask for a dash. Also, if the hyphen were
> always to be shown, the word "SOFT" would be very difficult to
> explain. In addition, for a thing that is always shown, there would
> not be any need for a special explanation.

Call me indecent, but I disagree. A "soft" hyphen is a visible
character that is inserted by a text formatter after a line break
within a word has been established. In other words, when a text
formatter determines that a word will be broken and the second part
will begin a new line, the formatter inserts a soft hyphen after the
first part of the word rather than a "hard" hyphen. If the text is
later reformatted, the soft hyphen may be easily removed when it no
longer falls on a line break, whereas the "hard" hyphen is left in the
text regardless of its position.
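
To make that reading concrete, here is a minimal Python sketch of the
behaviour described above. It is only an illustration: the names
break_word and join_lines are hypothetical, the break position is picked
by hand rather than by a hyphenation algorithm, and nothing here is
taken from ISO 8859-1 itself.

```python
SHY = "\u00ad"  # SOFT HYPHEN, code position 0xAD in ISO 8859-1


def break_word(word, split_at):
    """Split a word for a line break (hypothetical helper).

    The formatter marks the break with a soft hyphen, not a hard one,
    so a later pass can tell the two apart.
    """
    return word[:split_at] + SHY, word[split_at:]


def join_lines(lines):
    """Undo earlier line breaking before the text is reformatted.

    A soft hyphen at the end of a line is dropped and the word rejoined;
    hard hyphens (HYPHEN-MINUS) are left wherever they are.
    """
    text = ""
    for line in lines:
        if text.endswith(SHY):
            text = text[:-1] + line   # rejoin the broken word
        elif text:
            text += " " + line        # ordinary line break becomes a space
        else:
            text = line
    return text


first, rest = break_word("formatter", 3)
lines = ["a well-known " + first, rest]
print(join_lines(lines))
```

In this sketch the hard hyphen in "well-known" survives reformatting,
while the soft hyphen that marked the break in "formatter" is removed
once the word no longer straddles a line break.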

Some arguably decent typographers and desktop publishers know that when
you send a soft hyphen to a printing or display device that supports
ISO 8859-1, the soft hyphen is imaged regardless of its position within
a line of text.

Here is a soft hy­phen. Does your mail reader support 8859? Did your
mail reader ignore the hyphen because it doesn't fall at the end of the
line? When you send the text to your printer, does the hyphen go away?
Should every current text editor, formatter and printing device be
declared obsolete because none contain built-in intelligence to deal
with the conditional display of certain "displayed" (as opposed to
"control") characters?

> That the SHY is indeed only displayed if it turns up to lie at the
> end of a line of rendered text is further supported by the fact
> that ISO 10646 as well as the ISO/ECMA registrations and probably
> even the ISO-8859-1 original write "SHY" and not "-" in the
> appropriate location in the code charts.

"Further supported"? First you assume a meaning for "soft", then
justify your premise based on the use of "shy" in the code charts. Is
this decent logic?

Within an HTML document there is markup and there is displayed text.
Blurring that distinction by associating conditions for the display of
a particular character based on that character's post-formatted
position would change the whole notion of displayed text -- a very bad
precedent at odds with current practice.

David Perrell

Received on Sunday, 11 May 1997 18:35:37 UTC