Re: Soft hyphen (Re: Cougar comments)

Martin J. Duerst (mduerst@ifi.unizh.ch)
Sun, 11 May 1997 17:23:09 +0200 (MET DST)


Date: Sun, 11 May 1997 17:23:09 +0200 (MET DST)
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Jukka Korpela <jkorpela@cc.hut.fi>
cc: www-html@w3.org, unicode@unicode.org
Subject: Re: Soft hyphen (Re: Cougar comments)
In-Reply-To: <Pine.OSF.3.96.970510195946.4897B-100000@alpha.hut.fi>
Message-ID: <Pine.SUN.3.96.970511170109.245F-100000@enoshima>

On Sat, 10 May 1997, Jukka Korpela wrote:

> Contrary to what seems to be common belief even among HTML experts,
> soft hyphen (as defined by ISO 8859-1) is _not_ a hyphenation hint
> comparable to invisible hyphen in text processing programs. See
>   http://www.hut.fi/%7ejkorpela/shy.html
> for more detailed discussion.

I have read your text at the above URL. I am crossposting to the
Unicode list, the most active list and the one with the most experts
on character coding. There does indeed seem to be a misunderstanding.
It could
have been resolved if you had contacted us (the authors of RFC 2070)
directly instead of just writing lengthy web pages. As far as I
understand, the misunderstanding lies on your side.

Your understanding of the character U+00AD as a code that is always
visible is based on one sentence in section 6.3.3 of ISO 8859-1:

	A graphic character that is imaged by a graphic symbol identical
	with, or similar to, that representing HYPHEN, for use when
	a line break has been established within a word.

Now it is rather strange that one should have two hyphens (HYPHEN-MINUS
and SOFT HYPHEN) that are always visible. Every decent typographer
or text coder would first ask for a dash. Also, if the hyphen were
always to be shown, the word "SOFT" would be very difficult to
explain. In addition, for a thing that is always shown, there would
not be any need for a special explanation.

Your main problem seems to be the meaning of the words "has been
established". If at the point where the text is encoded (or on the
server side in modern web technology) somebody establishes that there
may be a line break between "re" and "cord" in the word "record",
then that is the place to put an SHY.

That the SHY is indeed only displayed if it turns out to lie at the
end of a line of rendered text is further supported by the fact
that ISO 10646 as well as the ISO/ECMA registrations and probably
even the original ISO 8859-1 print "SHY" and not "-" in the
appropriate location in the code charts. In ISO 10646, there is also
the dashed box around "SHY", typical of characters with special
behaviour.
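
The reading described above can be sketched in a few lines of Python.
This is an illustration only, under stated assumptions: the function
name render() and the greedy line-filling strategy are hypothetical
and are not prescribed by ISO 8859-1, ISO 10646 or RFC 2070. The only
point it shows is that U+00AD marks a place where a break may be
taken, and that a hyphen is imaged only where the break is actually
taken.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def render(text, width):
    """Greedily fill lines of at most `width` characters.

    A SHY is never shown inside a line. It is imaged as "-" only when
    the line is actually broken at that position.
    (Hypothetical helper, written for illustration.)
    """
    lines = []
    current = ""
    for word in text.split(" "):
        # Fragments between the established break points of this word.
        fragments = word.split(SHY)
        while fragments:
            prefix = current + " " if current else ""
            whole = "".join(fragments)
            if len(prefix) + len(whole) <= width:
                # The rest of the word fits: no break, no visible hyphen.
                current = prefix + whole
                fragments = []
                continue
            # Try the latest established break point that still fits.
            for k in range(len(fragments) - 1, 0, -1):
                head = "".join(fragments[:k]) + "-"
                if len(prefix) + len(head) <= width:
                    lines.append(prefix + head)
                    current = ""
                    fragments = fragments[k:]
                    break
            else:
                if current:
                    # No break point fits here: start a new line, retry.
                    lines.append(current)
                    current = ""
                else:
                    # Word too long even for an empty line: let it overflow.
                    current = whole
                    fragments = []
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    sample = "please re" + SHY + "cord this"
    for width in (20, 10):
        print(width, render(sample, width))
```

With a width of 20 the sample fits on one line and "record" shows no
hyphen; with a width of 10 the break between "re" and "cord" is taken
and the first line ends in "re-".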


Regards,	Martin.