- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Tue, 13 May 1997 12:19:14 +0200 (MET DST)
- To: Otto Stolz <Otto.Stolz@uni-konstanz.de>
- cc: Multiple Recipients of <unicode@Unicode.ORG>, www-html <www-html@w3.org>, ISO 10646 mailing list <iso10646@listproc.hcf.jhu.edu>
On Tue, 13 May 1997, Otto Stolz wrote:

> On May 12, 10:25, Mark Davis <mark_davis@taligent.com> wrote:
> > You can insert a zero width no-break space, if you want to prevent a
> > word-break at a particular point.
>
> This is not feasible. You cannot anticipate which weird points an
> arbitrary browser (or some other rendering software) might deem legal
> hyphenating points. To be on the safe side, you would have to insert
> those Z-WNBSPs between any two adjacent letters, thus almost doubling
> the length of your text.
>
> Hence, the only feasible solution is:
> - for the sender: mark all preferred hyphenating points,
> - for any browsing, or rendering, software: do not hyphenate within
>   a word but at hyphenating points marked so by the sender.

There are other possibilities. For example, you can language-tag your
text (the discussion is, at least originally, about HTML) and hope that
the receiver knows how to hyphenate that language. You then only insert
a SHY in places where the receiver can't possibly know the right break
(e.g. re-cord vs. rec-ord, word compositions as they occur in German
and the Nordic languages, and so on).

You can also add hyphenation points in otherwise very long words to
help receivers that don't have a hyphenation engine for the respective
language. The benefit of hyphenation points increases rather quickly
with the length of a word.

A usual convention for marking a word that should not be hyphenated
is to prefix a SHY.

> To mark the points of possible line-breaks:
> - for languages that do hyphenation, SHY (U+00AD) seems the only
>   character suitable to mark hyphenation points (in spite of the
>   obfuscating wording in ISO 8859-1);

Indeed. The conclusion from the official description seems to be that
a SHY was only intended to be inserted at the end of the line when the
line break actually occurs. Because it was never supposed to appear
inside a line, using it there to denote a potential word break is only
an extension of its use, and not directly against that wording in
ISO 8859-1. And it's of course the most reasonable and usable extension.

> - for languages that don't use spaces as word-boundaries, a Z-WSp
>   (U+200B) seems suitable to mark the word boundaries.

> Opinions?

Yes. But you only need it for languages where line breaking indeed
requires knowing the word boundaries (such as Thai). You don't need it
in cases such as Chinese and Japanese, where you can break the line
between virtually all characters, and the exceptions can easily be
determined.

Regards,	Martin.
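
As an illustration of the sender-side conventions discussed in this message, here is a minimal Python sketch. The function names are hypothetical and not part of any standard or library. Only the code points U+00AD (SHY) and U+200B (Z-WSp) come from the thread. Whether a given renderer honours the SHY-prefix convention is an assumption.

```python
SHY = "\u00AD"   # SOFT HYPHEN, marks a permitted hyphenation point
ZWSP = "\u200B"  # ZERO WIDTH SPACE, marks a word boundary (e.g. in Thai)


def mark_hyphenation(parts):
    """Join the pieces of one word with a SHY at each permitted break.

    The caller decides where the breaks are, e.g. ["rec", "ord"] for the
    noun and ["re", "cord"] for the verb.
    """
    return SHY.join(parts)


def forbid_hyphenation(word):
    """Prefix a SHY: the convention described above for 'do not hyphenate'."""
    return SHY + word


def mark_word_boundaries(words):
    """Join words of a language written without spaces, using Z-WSp."""
    return ZWSP.join(words)


def visible_text(text):
    """Strip the invisible markers to recover the text as a reader sees it."""
    return text.replace(SHY, "").replace(ZWSP, "")


noun = mark_hyphenation(["rec", "ord"])
verb = mark_hyphenation(["re", "cord"])
```

Both `noun` and `verb` display as the same six letters when no break occurs, but they carry different break points for the receiver. This is the case where language tagging alone cannot help.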
Received on Tuesday, 13 May 1997 06:19:46 UTC