- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Sun, 11 May 1997 17:23:09 +0200 (MET DST)
- To: Jukka Korpela <jkorpela@cc.hut.fi>
- cc: www-html@w3.org, unicode@unicode.org
On Sat, 10 May 1997, Jukka Korpela wrote:

> Contrary to what seems to be common belief even among HTML experts,
> soft hyphen (as defined by ISO 8859-1) is _not_ a hyphenation hint
> comparable to invisible hyphen in text processing programs. See
> http://www.hut.fi/%7ejkorpela/shy.html
> for more detailed discussion.

I have read your text at the above URL. I am crossposting to the Unicode list, the most active list, with the most experts on character coding on it.

There does indeed seem to be a misunderstanding. It could have been resolved if you had contacted us (the authors of RFC 2070) directly instead of just writing lengthy web pages. As far as I understand, the misunderstanding lies on your side.

Your understanding of the character U+00AD as a code that is always visible is based on one sentence in section 6.3.3 of ISO 8859-1:

    A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing HYPHEN, for use when a line break has been established within a word.

Now it is rather strange that one should have two hyphens (HYPHEN-MINUS and SOFT HYPHEN) that are always visible. Every decent typographer or text coder would first ask for a dash. Also, if the hyphen were always to be shown, the word "SOFT" would be very difficult to explain. In addition, for a thing that is always shown, there would not be any need for a special explanation.

Your main problem seems to be the meaning of the words "has been established". If at some source of text encoding (or on the server side in modern web technology), somebody establishes that there may be a line break between "re" and "cord" in the word "record", then that is the place to put an SHY.

That the SHY is indeed only displayed if it turns out to lie at the end of a line of rendered text is further supported by the fact that ISO 10646, as well as the ISO/ECMA registrations and probably even the original ISO 8859-1, write "SHY" and not "-" in the appropriate location in the code charts. In ISO 10646, there is also the dashed box around "SHY", typical of characters with special behaviour.

Regards, Martin.
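As an illustration of the reading argued in this message, here is a minimal Python sketch of a line filler that treats U+00AD as a permitted break point and shows a hyphen only when a line break actually falls there. It is not taken from RFC 2070 or from any of the standards mentioned; the function name `wrap_with_shy` and the greedy fill strategy are assumptions made for the example, and real renderers use far more elaborate line breaking.

```python
SHY = "\u00ad"  # SOFT HYPHEN, U+00AD


def wrap_with_shy(text, width):
    """Greedily fill lines of at most `width` characters.

    A SHY inside a word marks a place where a break is permitted.
    It is rendered as "-" only if the line is actually broken there;
    otherwise it produces no visible output.
    """
    lines = []
    current = ""
    for word in text.split():  # U+00AD is not whitespace, so words stay intact
        # Fragments between soft hyphens; empty fragments are dropped.
        parts = [p for p in word.split(SHY) if p]
        while parts:
            prefix = current + " " if current else ""
            whole = "".join(parts)
            if len(prefix) + len(whole) <= width:
                # The rest of the word fits: no break, so no visible hyphen.
                current = prefix + whole
                break
            # Try the longest leading run of fragments that still leaves
            # room for the visible hyphen at the end of the line.
            for k in range(len(parts) - 1, 0, -1):
                head = "".join(parts[:k])
                if len(prefix) + len(head) + 1 <= width:
                    lines.append(prefix + head + "-")
                    current = ""
                    parts = parts[k:]
                    break
            else:
                if current:
                    # Nothing fits on this line: start a new one and retry.
                    lines.append(current)
                    current = ""
                else:
                    # Nothing fits even on an empty line: let it overflow.
                    current = whole
                    break
    if current:
        lines.append(current)
    return lines


print(wrap_with_shy("a re\u00adcord", 10))  # ['a record']
print(wrap_with_shy("a re\u00adcord", 5))   # ['a re-', 'cord']
```

With a wide line the SHY in "re", SHY, "cord" leaves no trace; with a narrow line the break is taken at the SHY and the hyphen becomes visible at the end of the line.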
Received on Sunday, 11 May 1997 11:23:50 UTC