Re: &nbhy; from Jukka Korpela on 1998-09-22 (www-html@w3.org from September 1998)

From: Jukka Korpela <jkorpela@cc.hut.fi>
Date: Tue, 22 Sep 1998 08:57:57 +0300 (EET DST)
To: www-html@w3.org
Message-ID: <Pine.OSF.3.96.980922082815.10517A-100000@beta.hut.fi>

On Mon, 21 Sep 1998, Walter Ian Kaye wrote:

> At 7:31a -0700 09/21/98, Marco Bleeker wrote:
> > A proposal for a new entity in the html character set:
> > 
> > no-break-hyphen --> &nbhy;
> > 
> > Since modern browsers use a hyphen for a possible line break
> > (something I had proposed myself a few years ago), I have now felt the
> > need for a no-break-hyphen. In my biology pages, I make lists of
> > species names. Sometimes these lists are in a narrow frame or table
> > column.
- -
> Well, Netscape invented one already, and at least MSIE honors it:
> 
>    some <NOBR>hyphenated-name</NOBR> here

According to http://www.blooberry.com/html/tagpages/n/nobr.htm
the NOBR tag is supported by IE and Netscape from version 1 and
by Opera from version 2.1. And it "downgrades gracefully" in the
sense that it causes no harm on browsers which do not recognize it
but behave in the recommended manner: ignore unknown tags
(i.e. ignore <NOBR> and </NOBR>, but not what's between them).
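
As a rough illustration of that "ignore the tags, keep the content"
behavior, here is a minimal sketch in Python. The ToyRenderer class and
the KNOWN_TAGS set are hypothetical names made up for this example; it
is a toy, not a model of how any actual browser is implemented.

```python
from html.parser import HTMLParser

# Hypothetical set of tags this toy "browser" recognizes.
# NOBR is deliberately absent, so it counts as an unknown tag.
KNOWN_TAGS = {"p", "b", "i"}


class ToyRenderer(HTMLParser):
    """Drops unknown start/end tags but keeps the text between them."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in KNOWN_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character data is always kept, whatever tags surround it.
        self.out.append(data)


def render(markup):
    renderer = ToyRenderer()
    renderer.feed(markup)
    renderer.close()  # flush any text still buffered by the parser
    return "".join(renderer.out)


result = render("some <NOBR>hyphenated-name</NOBR> here")
print(result)  # some hyphenated-name here
```

The text survives intact; only the no-break request is lost, which is
the sense in which the tag "downgrades gracefully".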

But the NOBR element has never been defined rigorously (see the
document mentioned above), which is probably the reason why it
has not been included in any official HTML specification.

For texts within tables in particular, you could use the NOWRAP
attribute for TD and TH elements. That attribute is valid in HTML 3.2
and in HTML 4.0 Transitional. On the other hand, its scope is the
entire table cell.

As regards the original proposal, please notice that an entity
is effectively just a predefined macro. If an entity for no-break-hyphen
(or, to use the Unicode name, non-breaking hyphen) were introduced,
it would _only_ mean that you could write e.g.
  &nbhy;
instead of
  &#8209;
(8209 is the decimal notation for the code position of non-breaking hyphen
in Unicode, U+2011). It would _not_ add anything to the semantics of
that character in HTML. If browsers were required to process the
character in some particular way, the requirement would need to be
formulated separately.
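
The correspondence between the numeric reference and the Unicode code
position can be checked mechanically. A minimal sketch, assuming
Python's standard html and unicodedata modules; it only resolves the
reference and says nothing about how a browser would break lines:

```python
import html
import unicodedata

# Resolve the numeric character reference discussed above.
ch = html.unescape("&#8209;")

print(f"U+{ord(ch):04X}")    # U+2011
print(unicodedata.name(ch))  # NON-BREAKING HYPHEN

# Decimal 8209 and hexadecimal 2011 denote the same code position.
assert ord(ch) == 8209 == 0x2011
```

A named entity would expand to exactly this one character, so any
no-break behavior has to come from how the browser treats U+2011, not
from the name used to write it.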

I haven't noticed any browser other than IE break words at hyphens.
I'd consider the IE behavior an irritating anomaly, if not a bug,
rather than as something that should affect the definition of the
HTML language (e.g. by introduction of new entities or tags).
Personally I dislike that behavior very much, since it may even break
expressions like 8859-1.

Hyphenation and word breaks would need to be considered as a separate
problem, covering various aspects and involving the need for language
information (LANG attribute) and language-specific algorithms. In such
a context, it would be appropriate to consider how authors could
distinguish between hyphen, minus, non-breaking hyphen, and other
hyphen-like characters. Whether the distinction between a hyphen and
a non-breaking hyphen is structural, presentational, or something in
between is an interesting question. The answer would also affect the
question of whether the distinction should be made at the character
level, at the HTML markup level (using some HTML element), or somewhere
else (basically, in a style sheet).

--
Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
Received on Tuesday, 22 September 1998 01:58:01 UTC