Re: breaking with nonspace characters

Peter Flynn (pflynn@curia.ucc.ie)
23 Sep 1996 09:23:42 +0100


In-reply-to: <199609202242.RAA02588@inet.htcnet.com> (message from Carl Morris
To: msftrncs@htcnet.com
Cc: www-html@w3.org
Message-id: <199609230823.JAA28858@curia.ucc.ie>

   Is there any standard or proposed method to break words apart between
   characters, as if a space had been there, only when needed (ie:
   suggest), and to suggest that a word could be broken with a hyphen?

You mean algorithms for hyphenated and non-hyphenated breaks? Easy.

Yes, this has been around for a _long_ time: the hyphenation algorithm
devised by Liang and implemented by Knuth. It also works for
non-hyphenated breaks, as implemented in the \verb (LaTeX) and \path
(eplain) macros due to Nelson Beebe and Phil Taylor. I remain
constantly amazed that this simple device is not more widely used as
it is very robust and easy to implement. Even WIRED keeps on breaking
URLs and email addresses _before_ a breakpoint symbol, eg http://www
.ucc.ie/foo/bar so that the breakpoint symbol falls at the beginning
of the line instead of at the end of the previous line: http://www.
ucc.ie/foo/bar
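
As a rough illustration of the non-hyphenated case, here is a minimal
Python sketch. It is not the \path macro itself: the function name, the
default set of breakpoint symbols and the fixed column width are all
assumptions made for the example. It breaks only immediately after a
breakpoint symbol, so the symbol ends the line:

```python
def break_after_symbols(text, width, breakpoints="/.@"):
    """Split text into lines of at most `width` characters, breaking only
    immediately AFTER a breakpoint symbol so the symbol ends the line."""
    lines = []
    rest = text
    while len(rest) > width:
        # Index of the last breakpoint symbol that still fits on this line
        # (-1 if none of the symbols occurs in the first `width` characters).
        cut = max(rest.rfind(ch, 0, width) for ch in breakpoints)
        if cut < 0:
            # Assumption: with no usable breakpoint, cut hard at the width.
            cut = width - 1
        lines.append(rest[:cut + 1])
        rest = rest[cut + 1:]
    lines.append(rest)
    return lines


if __name__ == "__main__":
    for line in break_after_symbols("http://www.ucc.ie/foo/bar", 12):
        print(line)
```

With a width of 12 this gives the lines "http://www.", "ucc.ie/foo/" and
"bar": every continued line ends in a symbol, so the reader can see
that more is to come.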

   I think you will find this looks pretty bad...  so I would like to
   suggest that the browser break like this (at approx 60 columns):

No, this would be quite wrong: how on earth can you know how wide my
browser window is or what size font I'm using? If you look carefully,
you will see that it is much better to keep the breakpoint symbol at
the end of the line, so that the reader can see that the string is to
be continued. Finishing the line with htcnet.com makes it very ambiguous.

   information about ONEFOSsil.  The address of this page is:
   http://home.aol.com/kiwi7416.  We also have our own WEB page
   at http://199.120.83.179/msftrncs/ and at http://www.htcnet.com
   /~msftrncs/msftrncs/index.html.  You may also get to the

   Does this look more pleasing?  Actually I think browsers could find a rule
   to use here, and do it themselves... what do you think?

It's all been done and documented; it just needs some browser to
implement it.

   The other example...  spell out the word used in "Mary Poppins" (sure,
   it's even in the dictionary, I think...), that's pretty long-winded...
   let's say it won't fit on a single line in the browser...  how can the
   browser be told the proper points at which to break it, and place a
   hyphen there when it does?

These are called discretionary hyphens. They differ from soft hyphens
(places where breaking is allowed) and hard hyphens (hyphens where
breaking would be foolish, such as "P-segment") in that discretionary
hyphens disappear if not used for a break. There seems to be no
provision in the ISO character entity sets for this, but there's
nothing to prevent HTML defining (for example) &dhy; to do the job:

Su-per-cal-i-frag-i-lis-tic-ex-pi-al-i-do-cious is given in Random
House's _Unabridged Dictionary_ and cited in Appendix H of Knuth's
_TeXbook_ (where the hyphenation algorithm is explained). This would
give
Su&dhy;per&dhy;cal&dhy;i&dhy;frag&dhy;i&dhy;lis&dhy;tic&dhy;ex&dhy;pi&dhy;al&dhy;i&dhy;do&dhy;cious
:-) What I can't understand is why some browsers keep reinventing the
wheel. When it's so easy to do it right, why take such infinite
trouble to get it wrong?
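
To make the behaviour concrete, here is a minimal Python sketch of how a
renderer might treat such markers. The &dhy; entity is only the
hypothetical name suggested above, not a real HTML entity, and the
function name and greedy line-filling strategy are likewise assumptions
for illustration. An unused marker vanishes; a used one becomes a
visible hyphen at the end of the line:

```python
DHY = "&dhy;"  # hypothetical entity from this message, not real HTML


def break_word(marked, width):
    """Greedily fill lines of at most `width` characters, breaking only at
    discretionary-hyphen markers. Markers that are not used disappear.
    Assumption: no single fragment is longer than width - 1."""
    parts = marked.split(DHY)
    lines = []
    current = parts[0]
    for i, part in enumerate(parts[1:], start=1):
        is_last = i == len(parts) - 1
        # Unless this is the final fragment, keep one column free for the
        # hyphen that a later break on this line would need.
        needed = len(current) + len(part) + (0 if is_last else 1)
        if needed <= width:
            current += part               # marker unused: it vanishes
        else:
            lines.append(current + "-")   # marker used: show a hyphen
            current = part
    lines.append(current)
    return lines


if __name__ == "__main__":
    word = ("Su&dhy;per&dhy;cal&dhy;i&dhy;frag&dhy;i&dhy;lis&dhy;tic"
            "&dhy;ex&dhy;pi&dhy;al&dhy;i&dhy;do&dhy;cious")
    print("\n".join(break_word(word, 12)))
```

At a width of 12 the word comes out as "Supercali-", "fragilistic-",
"expialido-", "cious"; given a line wide enough for the whole word, it
is set unbroken with no hyphens at all.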

///Peter