Re: A suggested tag

BruceLeban@akimbo.com
Fri, 11 Apr 1997 21:59:35 -0400 (EDT)


From: BruceLeban@akimbo.com
Date: Fri, 11 Apr 1997 21:59:35 -0400 (EDT)
Message-Id: <199704120159.VAA19372@mail.internet.com>
To: www-html@w3.org
Subject: Re: A suggested tag


>> A suggested tag: something like <w>
>> for word split

>A soft hyphen entity, &shy;, was proposed in HTML 2.0, though I don't 
>think many (if any) browsers actually support it.  Some show a 
>hyphen, but I'm not aware of any that actually treat it as a soft 
>hyphen.

My favorite feature of &shy; is how unreadable it makes 
doc&shy;u&shy;ments that use it in browsers that don't recognize the 
entity. *If* this is something that is desirable to express as markup, 
then it should be a tag, not an entity, since browsers ignore tags they 
don't recognize. I tend to think that this is something the renderer of 
the document (i.e., the browser) could deal with. There doesn't need to 
be a shared dictionary or anything. It uses my dictionary on my machine; 
I don't care what hyphens you see. :-)
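
To make the "browser does it" idea concrete, here is a rough sketch in 
Python. Everything in it is an assumption for illustration: the 
LOCAL_DICTIONARY table, its two entries, and the split_to_fit name are 
made up, and a real renderer would use a proper hyphenation dictionary 
or algorithm.

```python
# Hypothetical local dictionary: word -> the pieces it may be split into.
# The entries are illustrative only, not taken from any real dictionary.
LOCAL_DICTIONARY = {
    "documents": ["doc", "u", "ments"],
    "hyphenation": ["hy", "phen", "a", "tion"],
}


def split_to_fit(word, room):
    """Split `word` so the first part plus a hyphen fits in `room` columns.

    Returns (head, tail), or None if the word is not in the local
    dictionary or no break point fits.
    """
    pieces = LOCAL_DICTIONARY.get(word.lower())
    if not pieces:
        return None
    # Try the latest break point first so the line is filled as far as possible.
    for i in range(len(pieces) - 1, 0, -1):
        cut = sum(len(p) for p in pieces[:i])
        if cut + 1 <= room:  # +1 leaves space for the hyphen itself
            return word[:cut] + "-", word[cut:]
    return None
```

The source document contains no hyphenation markup at all here; each 
reader's machine decides where the breaks go.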

>With the <w> solution there is the problem of determining word begin/end.
>	Com<w>
>	mu<w>
>	ni<w>
>	ca<w>
>	tor
>could not easily be determined as a single word. So to make it bullet proof
>word ending/beginnings would have to be tagged, too. E.g.
>	<word>Com<w>mu<w>ni<w>ca<w>tor</word>

Not really. Just because the word can be broken on display doesn't mean 
it's allowed to be broken in the source. You could prohibit spaces in the 
middle of a word (a fairly common rule :-).

But you don't need to. "Foo <w> bar" can simply be defined to be a single 
word since there's no reason you would stick an optional hyphen at the 
beginning or end of a word.
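
A minimal sketch of that rule in Python, assuming the tag is spelled 
exactly <w> and that any whitespace touching it is simply discarded 
(the function name words_with_breaks is made up for this example):

```python
import re


def words_with_breaks(source):
    """Split `source` into words; each word is the list of pieces between <w> tags."""
    # Whitespace (including newlines) on either side of <w> is dropped,
    # so "Foo <w> bar" is treated as the single word Foo<w>bar.
    glued = re.sub(r"\s*<w>\s*", "<w>", source)
    return [word.split("<w>") for word in glued.split()]
```

With this reading, the Com<w> mu<w> ni<w> ca<w> tor example quoted 
above comes out as one word with four break points, and no <word> 
wrapper is needed.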

Of course nothing is as simple as it seems at first. If you're really 
doing this, you need something slightly more sophisticated to handle all 
hyphenation:

    ba<wb break="k-">c</wb>ken

which is the German word "backen", which hyphenates as "bak-" and "ken", and

    republican/<wb break=""></wb>democrat

which is "republican/democrat" and broken as "republican/" and "democrat".
<w> is equivalent to

    <wb break="-"></wb>
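
Here is how I read those semantics, as a rough Python sketch: when the 
break is not taken the element's content is shown, and when it is taken 
the break attribute replaces the content at the end of the line. The 
regular expression and the function names are assumptions for 
illustration, not a real parser.

```python
import re

# Matches <wb break="...">content</wb> or a bare <w>.
WB = re.compile(r'<wb break="([^"]*)">(.*?)</wb>|<w>')


def unbroken(source):
    """Text shown when no break is taken: keep element content, drop the tags."""
    return WB.sub(lambda m: m.group(2) or "", source)


def break_options(source):
    """Every way to split at one break point, as (line_end, next_line) pairs."""
    options = []
    for m in WB.finditer(source):
        # A bare <w> behaves like <wb break="-"></wb>.
        replacement = "-" if m.group(0) == "<w>" else m.group(1)
        before = unbroken(source[:m.start()])
        after = unbroken(source[m.end():])
        options.append((before + replacement, after))
    return options
```

For example, break_options('ba<wb break="k-">c</wb>ken') gives the 
single option ("bak-", "ken"), while unbroken() of the same string 
gives "backen".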

Then you might also want to specify the precedence of breaks:

    hy<w rank=3>phen<w rank=1>a<w rank=2>tion
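
A sketch of how a renderer might use the ranks, in Python. I'm assuming 
here that rank 1 is the most preferred break and that the renderer takes 
the best-ranked break that still fits on the line; the best_break name 
and the rank=N syntax handling are illustrative only.

```python
import re

RANKED_W = re.compile(r"<w rank=(\d+)>")


def best_break(source, room):
    """Pick the preferred break whose first part fits in `room` columns.

    Assumes a lower rank number means a more desirable break.
    Returns (head, tail), or None if no break fits.
    """
    # Splitting on a pattern with one group alternates text and ranks:
    # ['hy', '3', 'phen', '1', 'a', '2', 'tion']
    pieces = RANKED_W.split(source)
    text_parts = pieces[0::2]
    ranks = [int(r) for r in pieces[1::2]]
    candidates = []
    for i, rank in enumerate(ranks):
        head = "".join(text_parts[: i + 1]) + "-"
        if len(head) <= room:
            candidates.append((rank, head, "".join(text_parts[i + 1 :])))
    if not candidates:
        return None
    _, head, tail = min(candidates)  # smallest rank number wins
    return head, tail
```

With plenty of room the word above breaks as "hyphen-" and "ation"; 
with only six columns left, the rank 3 break "hy-" is the only one 
that fits.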

There may be other complexities I haven't thought of. Of course we don't 
need to worry about markup at all if the browser just handles it. Plus I 
get hyphenation on all those web sites whose authors didn't bother to 
mark them up.

    --- Bruce Leban
    Akimbo Systems
    http://www.akimbo.com/globetrotter
    Publish on the web without learning HTML! (Really.)