Re: breaking with nonspace characters

Carl Morris (msftrncs@htcnet.com)
Mon, 23 Sep 1996 07:51:34 -0500


Message-Id: <199609231255.HAA04616@inet.htcnet.com>
From: "Carl Morris" <msftrncs@htcnet.com>
To: "Peter Flynn" <pflynn@curia.ucc.ie>
Cc: "WWW HTML List" <www-html@w3.org>
Subject: Re: breaking with nonspace characters
Date: Mon, 23 Sep 1996 07:51:34 -0500

| No, this would be quite wrong: how on earth can you know how wide my
| browser window is or what size font I'm using? If you look carefully,
| you will see that it is much better to keep the breakpoint symbol at
| the end of the line, so that the reader can see that the string is to
| be continued. Finishing the line with htcnet.com makes it very
| ambiguous.

How did I know?  Simple: I wrote this as an email message, meaning that
you have no choice; my mail program wraps at 72 characters!  HTML, on
the other hand, wraps at each user's terminal size. THAT'S WHY I NEED TO
SUGGEST WRAP POINTS!

| It's all been done and documented, it just needs some browser to
| implement it. 

Where might I find this documented?  I know you mentioned lots of names
above, but I don't remember seeing any sources...

| These are called discretionary hyphens. They differ from soft hyphens
| (places where breaking is allowed) and hard hyphens (hyphens where
| breaking would be foolish, such as "P-segment") in that discretionary
| hyphens disappear if not used for a break. There seems to be no
| provision in the ISO character entity sets for this, but there's
| nothing to prevent HTML defining (for example) &dhy; to do the job:

Yes, there is one, called SHY...  MSIE even supports it, but HTML 3.2
doesn't yet define it, although it's mentioned in the spec...
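
To make the "disappears if not used for a break" behaviour concrete, here
is a rough Python sketch of how a renderer might treat such break points.
It is only an illustration under my own assumptions: the function name
wrap_with_break_points is made up, the marker is the Latin-1 soft hyphen
character (code 173, the one SHY names), and the greedy line filling is a
simplification, not a description of what MSIE or any other browser does.

```python
SHY = "\u00ad"  # Latin-1 character 173, used here as the suggested break point


def wrap_with_break_points(text, width):
    """Greedily wrap text to `width` columns (hypothetical helper).

    A word may only be split at a SHY marker. A visible "-" is emitted
    only where a split actually happens; unused markers vanish.
    """
    lines, line = [], ""
    for word in text.split():
        frags = word.split(SHY)
        for i, frag in enumerate(frags):
            # A space separates words; fragments of one word join directly.
            sep = " " if (i == 0 and line) else ""
            # Keep one column free for the hyphen if a break could follow.
            reserve = 1 if i < len(frags) - 1 else 0
            if line and len(line) + len(sep) + len(frag) + reserve > width:
                # Breaking inside a word shows the hyphen; between words it does not.
                lines.append(line + ("-" if i > 0 else ""))
                line = frag
            else:
                line += sep + frag
    if line:
        lines.append(line)
    return lines


word = SHY.join(["Su", "per", "cal", "i", "frag", "i", "lis", "tic",
                 "ex", "pi", "al", "i", "do", "cious"])

print(wrap_with_break_points(word, 80))  # wide window: one line, no hyphens
print(wrap_with_break_points(word, 12))  # narrow window: breaks at markers
```

With a width of 80 the word comes out whole and unhyphenated; with a
width of 12 it comes out as "Supercali-", "fragilistic-", "expialido-",
"cious". The author only suggests the wrap points; the reader's window
size decides which ones get used.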

| 
| Su-per-cal-i-frag-i-lis-tic-ex-pi-al-i-do-cious is given in Random
| House's _Unabridged Dictionary_ and cited in Appendix H of Knuth's
| _TeXbook_ (where the hyphenation algorithm is explained). This would
| give
|
| Su&dhy;per&dhy;cal&dhy;i&dhy;frag&dhy;i&dhy;lis&dhy;tic&dhy;ex&dhy;pi&dhy;al&dhy;i&dhy;do&dhy;cious
| :-) What I can't understand is some browsers reinventing the
| wheel. When it's so easy to do it right, why take such infinite
| trouble to get it wrong?

I don't understand you here...  The most I have seen is browsers that
will break at normal hyphens, which can sometimes be break points...  I
don't see how any automatic algorithm could be used on English words
without some risk of getting it wrong, either...