Re: Soft hyphen (Re: Cougar comments)

Jukka Korpela (jkorpela@cc.hut.fi)
Mon, 12 May 1997 08:42:50 +0300 (EET DST)


Date: Mon, 12 May 1997 08:42:50 +0300 (EET DST)
From: Jukka Korpela <jkorpela@cc.hut.fi>
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
cc: Jukka Korpela <jkorpela@cc.hut.fi>, www-html@w3.org, unicode@unicode.org
Subject: Re: Soft hyphen (Re: Cougar comments)
In-Reply-To: <Pine.SUN.3.96.970511170109.245F-100000@enoshima>
Message-ID: <Pine.OSF.3.96.970512081615.31014A-100000@alpha.hut.fi>

On Sun, 11 May 1997, Martin J. Duerst wrote:

> - - There seems indeed to be a misunderstanding. It could
> have been resolved if you had contacted us (the authors of RFC 2070)
> directly instead of just writing lengthy web pages. - - 

As you may have noticed, I took a look at the published material,
RFC 2070, as well as some other material which disagrees with it.
I did not personally contact any of the people with different views.
This issue seems to require public discussion. If it were my
misunderstanding, it certainly wouldn't be mine alone.

I didn't know this was such a hot potato. My starting point was an
easily observable disagreement and confusion - the existence of
mutually incompatible claims about soft hyphen, from usually
well-informed sources.

> Your understanding of the character U+00AD as a code that is always
> visible is based on one sentence in section 6.3.3 of ISO 8859-1:
> 
> 	A graphic character that is imaged by a graphic symbol identical
> 	with, or similar to, that representing HYPHEN, for use when
> 	a line break has been established within a word.

That "one sentence" is the one and only sentence in the standard which
describes the appearance and purpose of soft hyphen. Well, there _is_
another sentence, which is remotely related. It is the second sentence
in section 7 and says: 'None of these characters is "non-spacing"'.
Can you possibly interpret it in a manner which is in contradiction
with the "one sentence" above? Or could you interpret the definition
of "graphic character" (in 5.5) so that we should ignore the words
"has a visual representation normally handwritten, printed or
displayed"?

> Now it is rather strange that one should have two hyphens (HYPHEN-MINUS
> and SOFT HYPHEN) that are always visible. - -

I am not aware of (still less responsible for) the motivation behind
selecting the character repertoire. I could also say that it is rather
strange that one should have two blanks (SPACE and NO-BREAK SPACE).
Anyway, speculative questions shouldn't make us ignore the written
specification, which is quite clear.

> Your main problem seems to be the meaning of the words "has been
> established". If at some source of text encoding (or on the server
> side in modern web technology), somebody establishes that there
> may be a line break - -

My English is far from perfect, but I think I do know the meaning
of "has been established". By your use of "may be", you seem to imply
the presence of a word like "possible" before the words "line break"
in the specification. Line breaks and possible line breaks within words
are quite different things (points where hyphenation has actually
taken place versus allowable hyphenation points).
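
To make the distinction concrete, here is a minimal sketch in Python. It
is not taken from either standard or from RFC 2070; the function names
and the greedy line-filling rule are assumptions made up purely for
illustration. The first function follows the reading in which soft hyphen
marks a break that has already been made, so it is always imaged. The
second follows the reading in which soft hyphen only marks an allowable
break, so it is imaged only where a line actually ends.

```python
SHY = "\u00ad"  # SOFT HYPHEN, position 10/13 (hex AD) in ISO 8859-1


def image_established_break(text: str) -> str:
    """Reading 1: SHY is a graphic character placed where a line break
    within a word has already been established, so it is always imaged
    (here as a plain hyphen)."""
    return text.replace(SHY, "-")


def image_allowable_breaks(word: str, width: int) -> str:
    """Reading 2: SHY marks points where a break MAY occur. It is imaged
    only at the points a (hypothetical) greedy formatter actually uses.
    For brevity the trailing hyphen is not counted against `width`."""
    pieces = word.split(SHY)
    lines = []
    line = pieces[0]
    for piece in pieces[1:]:
        if len(line) + len(piece) > width:
            lines.append(line + "-")  # break taken here: hyphen shown
            line = piece
        else:
            line += piece             # break not taken: nothing shown
    lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    # Under reading 1 the text already contains the line break.
    print(image_established_break("hyphena" + SHY + "\ntion"))
    # Under reading 2 the formatter chooses among the marked points.
    print(image_allowable_breaks("hy" + SHY + "phen" + SHY + "a" + SHY + "tion", 7))
```

In this example both calls happen to print the same two lines, but the
encoded texts differ: in the first the break is part of the data, in the
second it is chosen by the formatting program.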

> That the SHY is indeed only displayed if it turns up to lie at the
> end of a line of rendered text is further supported by the fact
> that ISO 10646 as well as the ISO/ECMA registrations and probably
> even the ISO-8859-1 original write "SHY" and not "-" in the
> appropriate location in the code charts.

Are you suggesting that a notational feature should be interpreted
so that it cancels a definite verbal statement in the prescriptive
part of the standard? (I see little reason to wonder at the way soft
hyphen is presented in the chart. The specification says that the graphic
symbol used to image soft hyphen is identical with or similar to hyphen.
To me, the existence of two possible presentations justifies well
the use of symbolic notation in the chart.)

Yucca, http://www.hut.fi/%7ejkorpela/