Re: <NOBR> - Returning to the question.... from Jukka K. Korpela on 2004-04-01 (www-html@w3.org from April 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Thu, 1 Apr 2004 11:08:08 +0300 (EEST)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.58.0404011042590.25857@korppi.cs.tut.fi>
On Wed, 31 Mar 2004, Ernest Cline wrote:

> > I don't see why authors should use poorly supported tricky
> > characters instead of simple markup that has worked for years,
>
> Poorly supported?

Indeed. I call a character poorly supported if the vast majority of users
use browsers and settings that make a rectangle or a question mark appear
in place of a character that should be an invisible joiner.
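One way (among several) that an invisible joiner turns into a visible question mark can be simulated in Python. Assume text containing U+200C ZERO WIDTH NON-JOINER passes through a legacy 8-bit encoding such as ISO 8859-1, which has no slot for it; a replacing encoder then substitutes "?". This illustrates a single assumed failure path, not how any particular browser behaves. A font that lacks the glyph typically produces the rectangle instead. The sample word and helper name are hypothetical.

```python
# Sketch: simulate a ZWNJ passing through a legacy encoding that lacks it.
# ISO 8859-1 (latin-1) has no code point for U+200C, so errors="replace"
# substitutes a literal "?" for it when encoding.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def through_latin1(text: str) -> str:
    """Round-trip text through ISO 8859-1, replacing unencodable characters."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

sample = "Auf" + ZWNJ + "lage"  # hypothetical word with a joiner-control inside
print(through_latin1(sample))   # Auf?lage
```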

> Perhaps by poor implementations, but again
> these two characters have been around since at least Unicode 1.1.

So? How long has the soft hyphen been in character standards, including
the ISO 8859 set and Unicode? Just putting something into a standard does
not magically turn it into a reality that specifications in other
areas could rely on.

> Any implementation of Unicode should support ZWNJ and ZWJ.
> Now that I've had time to reflect on this, ZWSP and ZWNBSP are
> really the preferred characters to do this as they affect only
> line-breaking and nothing else.

Quite right. So even you had to take some time to find out the really
preferred characters. How about the vast majority of authors who have
little or no idea of any character standards or any characters outside the
normally used characters in the language(s) they ordinarily use?
There's a huge difference between using (correctly) the invisible Unicode
characters and just saying what you really mean, <nobr>...</nobr>.

> What practical reasons?

The practical reasons are that <nobr> almost always works and causes no
problems when it doesn't, whereas the Unicode special characters, in
addition to being far harder for authors to understand, mostly do not
work and usually break miserably when they don't.

> Large areas of non-breaking behavior are
> stylistic and should be handled as such.

So you mean there's some virtue in using <span style=
"white-space: nowrap">-1</span> instead of <nobr>-1</nobr>?
I won't go into the details of white-space, which has always been poorly
defined in CSS and still is (how does white-space apply when there is no
white space?). The main question is what the more complicated markup
is supposed to gain. It surely isn't more semantic; at the markup
level, it only says "here is some inline content to which some style is
attached". By definition it works less often, since CSS can be turned off.
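To put the two forms side by side, here is a minimal Python sketch that generates each one for the same content. The helper names are hypothetical. The only difference the sketch shows is the markup itself: one element versus an inline style attribute that depends on CSS being applied.

```python
import html

def nobr(text: str) -> str:
    """The <nobr> element: short, though not part of HTML 4.01."""
    return f"<nobr>{html.escape(text)}</nobr>"

def nowrap_span(text: str) -> str:
    """The CSS alternative: effective only when CSS is enabled."""
    return f'<span style="white-space: nowrap">{html.escape(text)}</span>'

print(nobr("-1"))         # <nobr>-1</nobr>
print(nowrap_span("-1"))  # <span style="white-space: nowrap">-1</span>
```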

> Isolated incidents of overriding
> the default behavior for semantic reasons can be handled via the
> ZWSP, ZWNJ, ZWJ, and ZWNBSP characters from Unicode

That might be what theory says. But it's much more complex, does not
work in practice, and won't work widely enough for many years to justify
using these characters on normal Web pages. By the way, what about non-isolated
incidents, like text containing lots of strings with all kinds of
characters that permit a line break before or after them by the Unicode rules,
like a-15%6/h\z?a (e.g., examples of passwords), that should not be broken
across two lines? Should the author study, for each character, the Unicode
rules, and maybe browsers' (mis)behavior too, to decide which characters
need some line-break prevention character before or after, or should they
just put one between every two characters?
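The two options weighed here can be sketched in Python, assuming ZWNBSP (U+FEFF) as the joiner, since it is the character named earlier in the thread. The brute-force route inserts it between every pair of characters, nearly doubling the string. The markup route wraps the whole string once. The helper names are hypothetical.

```python
import html

ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, as named in the thread

def join_every_pair(text: str, joiner: str = ZWNBSP) -> str:
    """Brute force: put a no-break joiner between every two characters."""
    return joiner.join(text)

def wrap_nobr(text: str) -> str:
    """Markup approach: one element around the whole string."""
    return f"<nobr>{html.escape(text)}</nobr>"

password = r"a-15%6/h\z?a"  # the example string from the message
joined = join_every_pair(password)
print(len(password), len(joined))  # 12 23
print(wrap_nobr(password))         # <nobr>a-15%6/h\z?a</nobr>
```

Every joiner added this way is one more character that the reader's software must handle correctly. A single wrapping element leaves the text itself untouched.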

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Thursday, 1 April 2004 03:08:22 UTC