Re: Soft hyphen (Re: Cougar comments)

Martin J. Duerst (mduerst@ifi.unizh.ch)
Mon, 12 May 1997 11:16:08 +0200 (MET DST)


Date: Mon, 12 May 1997 11:16:08 +0200 (MET DST)
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Jukka Korpela <jkorpela@cc.hut.fi>
cc: www-html@w3.org, unicode@unicode.org,
Subject: Re: Soft hyphen (Re: Cougar comments)
In-Reply-To: <Pine.OSF.3.96.970512081615.31014A-100000@alpha.hut.fi>
Message-ID: <Pine.SUN.3.96.970512105550.245L-100000@enoshima>

[[[[
I have added the iso10646 list to this discussion to reach the
ISO experts on characters. The discussion is about whether the
ISO character SOFT HYPHEN is to be interpreted as:

a) a hyphenation point, not displayed in the middle of the line,
	but displayed as a hyphen when the line is broken there.
b) like a hyphen in all cases, even if it turns up in the middle
	of a line.

The document http://www.hut.fi/%7ejkorpela/shy.html advocates
the second position. RFC 2070 for HTML i18n is based on the
first position.
]]]]
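
To make the two readings concrete, here is a minimal sketch in Python.
It is not taken from any of the standards or from RFC 2070; the function
names and the greedy line-filling rule are assumptions chosen only for
illustration. It handles a single word, and it conservatively reserves
one column for a hyphen whenever a later break is still possible.

```python
SHY = "\u00ad"  # SOFT HYPHEN


def render_always_visible(word):
    """Interpretation (b): every SHY is imaged as a hyphen, wherever it falls."""
    return word.replace(SHY, "-")


def render_discretionary(word, width):
    """Interpretation (a): SHY only marks a permitted break point.

    Hypothetical greedy filler: returns the lines of a single word, showing a
    hyphen only where a line is actually broken. A fragment longer than
    `width` is allowed to overflow.
    """
    fragments = word.split(SHY)
    lines = []
    current = fragments[0]
    for i, frag in enumerate(fragments[1:], start=1):
        is_last = i == len(fragments) - 1
        # Keep one column free for a hyphen unless this is the final fragment.
        needed = len(current) + len(frag) + (0 if is_last else 1)
        if needed <= width:
            current += frag           # SHY in mid-line: nothing is displayed
        else:
            lines.append(current + "-")  # SHY at a line end: shown as a hyphen
            current = frag
    lines.append(current)
    return lines


word = "hy" + SHY + "phen" + SHY + "a" + SHY + "tion"
print(render_always_visible(word))     # hy-phen-a-tion
print(render_discretionary(word, 20))  # ['hyphenation']
print(render_discretionary(word, 8))   # ['hyphena-', 'tion']
```

Under reading (b) the hyphens appear regardless of line width; under
reading (a) they appear only when the word does not fit on the line.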

On Mon, 12 May 1997, Jukka Korpela wrote:

> On Sun, 11 May 1997, Martin J. Duerst wrote:
> 
> > Your understanding of the character U+00AD as a code that is always
> > visible is based on one sentence in section 6.3.3 of ISO 8859-1:
> > 
> > 	A graphic character that is imaged by a graphic symbol identical
> > 	with, or similar to, that representing HYPHEN, for use when
> > 	a line break has been established within a word.
> 
> That "one sentence" is the one and only sentence in the standard which
> describes the appearance and purpose of soft hyphen. Well, there _is_
> another sentence, which is remotely related. It is the second sentence
> in section 7 and says: 'None of these characters is "non-spacing"'.
> Can you possibly interpret it in a manner which is in contradiction
> with the "one sentence" above? Or could you interpret the definition
> of "graphic character" (in 5.5) so that we should ignore the words
> "has a visual representation normally handwritten, printed or
> displayed"?

The distinction between graphic and control characters is difficult in
some cases; NBSP, SPACE, and TAB are examples. A "non-spacing"
character is one that is "typed over" another, e.g. as an accent, so
the sentence in section 7 only says that SHY is not such a character.
That holds for SHY in both interpretations.


> > Now it is rather strange that one should have two hyphens (HYPHEN-MINUS
> > and SOFT HYPHEN) that are always visible. - -
> 
> I am not aware of (still less responsible for) the motivation behind
> selecting the character repertoire. I could also say that it is rather
> strange that one should have two blanks (SPACE and NO-BREAK SPACE).

Definitely not. In many cases, you want to be able to have a space
where the line is not broken. An example is the spaces in
"Mr. J. Korpela".
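
As a rough illustration, the following Python sketch wraps text by
breaking only at SPACE (U+0020) and never at NO-BREAK SPACE (U+00A0).
The `wrap` function is a hypothetical helper written for this example,
not an algorithm prescribed by any of the standards discussed here.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def wrap(text, width):
    """Greedy line filler that treats only U+0020 as a break opportunity.

    A run of text joined by NBSP is kept on one line, even if that line
    then exceeds `width`.
    """
    lines = []
    current = ""
    for word in text.split(" "):  # an explicit separator leaves NBSP untouched
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap("Dear Mr. J. Korpela", 12))
# ['Dear Mr. J.', 'Korpela']  (the name is split from the title)
print(wrap("Dear Mr." + NBSP + "J." + NBSP + "Korpela", 12))
# ['Dear', 'Mr.\xa0J.\xa0Korpela']  (the name stays together)
```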

> > Your main problem seems to be the meaning of the words "has been
> > established". If at some source of text encoding (or on the server
> > side in modern web technology), somebody establishes that there
> > may be a line break - -
> 
> My English is far from perfect, but I think I do know the meaning
> of "has been established". By your use of "may be", you seem to imply
> the presence of a word like "possible" before the words "line break"
> in the specification. Line breaks and possible line breaks within words
> are quite different things (points where hyphenation has actually
> taken place versus allowable hyphenation points).

The text doesn't say that the line break that has been established
is an actual line break. The fact that the word "possible" is not
present doesn't mean that this interpretation is wrong.


> > That the SHY is indeed only displayed if it turns up to lie at the
> > end of a line of rendered text is further supported by the fact
> > that ISO 10646 as well as the ISO/ECMA registrations and probably
> > even the ISO-8859-1 original write "SHY" and not "-" in the
> > appropriate location in the code charts.
> 
> Are you suggesting that a notational feature should be interpreted
> so that it cancels a definite verbal statement in the prescriptive
> part of the standard?

Well, the chart itself is probably as prescriptive as the verbal
statement. And because the verbal statement is not all that clear,
the chart may very well help to clarify things. By the way, the
Unicode book is even clearer on this, as it mentions "discretionary
hyphen" in the comments for SOFT HYPHEN.

> (I see little reason to wonder at the way soft
> hyphen is presented in the chart. The specification says that the graphic
> symbol used to image soft hyphen is identical with or similar to hyphen.
> To me, the existence of two possible presentations justifies well
> the use of symbolic notation in the chart.)

Well, there wouldn't have been a need to use "SHY" in the chart
if the character always had to appear and look identical or similar
to a hyphen. A simple hyphen-like glyph would have been okay. The
standards all clearly say that the exact shape of the glyph printed
in the chart is not relevant.

Regards,	Martin.