Re: <NOBR> - Returning to the question.... from Ernest Cline on 2004-03-31 (www-html@w3.org from March 2004)

From: Ernest Cline <ernestcline@mindspring.com>
Date: Wed, 31 Mar 2004 14:34:18 -0500
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>, www-html@w3.org
Message-ID: <410-220043331193418562@mindspring.com>
> [Original Message]
> From: Jukka K. Korpela <jkorpela@cs.tut.fi>
> To: <www-html@w3.org>
> Date: 3/31/2004 3:52:30 PM
> Subject: Re: <NOBR> - Returning to the question....
>
>
> On Tue, 30 Mar 2004, Ernest Cline wrote:
>
> > Explicitly using HYPHEN, NON-BREAKING HYPHEN, and MINUS
> > handles the HYPHEN-MINUS ambiguity when you know how you
> > want the line to break around it.
>
> In practical terms, they seriously limit the number of browsing situations
> where the user sees the correct characters at all - or hears them. Not to
> mention all kinds of other software than browsers. I recently learned that
> Google translator cannot even handle a right single quote correctly,
> and it is a much better supported character than e.g. the minus character.

Then that is the fault of the implementation. All of the characters I've
mentioned have been in Unicode since at least version 1.1, so there really
is no excuse for anything other than antiquated software failing to display
them correctly, as HYPHEN and NON-BREAKING HYPHEN are both in the General
Punctuation block. MINUS generally has good font support, even though it is
not in the General Punctuation block, mainly because some fonts make a
visual distinction between it and HYPHEN. Your point about search software
is more germane, but how often is the search target going to include a
hyphen-type character where the default breaking behavior of HYPHEN-MINUS
won't be acceptable? It's an edge case at best.
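
For reference, here is a minimal sketch that lists the characters under
discussion with their code points and Unicode names. Python and its standard
unicodedata module are my choice for illustration only; the helper names
describe() and in_general_punctuation() are hypothetical, and the block range
U+2000..U+206F is the General Punctuation block.

```python
import unicodedata

# Explicit alternatives to the ambiguous U+002D HYPHEN-MINUS.
HYPHEN = "\u2010"
NON_BREAKING_HYPHEN = "\u2011"
MINUS = "\u2212"


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


def in_general_punctuation(ch):
    """True if ch lies in the General Punctuation block (U+2000..U+206F)."""
    return 0x2000 <= ord(ch) <= 0x206F


if __name__ == "__main__":
    for ch in ("-", HYPHEN, NON_BREAKING_HYPHEN, MINUS):
        print(describe(ch), "| General Punctuation:", in_general_punctuation(ch))
```

In HTML the same characters can be written as the numeric references
&#8208; (HYPHEN), &#8209; (NON-BREAKING HYPHEN), and &#8722; (MINUS).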

> In principle, on the other hand, using non-breaking variants of characters
> and special characters for mere line breaking control looks primitive if
> you compare it with the simple idea of markup. Unicode really tries too
> much in this area. (It's comparable to language tag characters.)

One might argue that they should have limited themselves to just the
normative portions of UAX#14, but in no way is this comparable
to the language tag characters.  Tag characters are stateful,
which is something that should generally be avoided in plain text,
while the characters that affect line-breaking are not.

> > For most other isolated cases,
> > &#8204; and &#8205; (ZWNJ and ZWJ) are sufficient.
>
> Sufficient for creating great confusion, for sure.
>
> I don't see why authors should use poorly supported tricky
> characters instead of simple markup that has worked for years,

Poorly supported?  Perhaps by poor implementations, but again
these two characters have been around since at least Unicode 1.1.
Any implementation of Unicode should support ZWNJ and ZWJ.
Now that I've had time to reflect on this, ZWSP and ZWNBSP are
really the preferred characters for isolated line-breaking control,
as they affect only line-breaking and nothing else.
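
To make the ZWSP/ZWNBSP approach concrete, here is a rough sketch in Python
(my choice of language, nothing here depends on it). The function names are
hypothetical helpers, not part of any library, and whether a given renderer
honors the inserted characters is exactly the implementation question under
debate.

```python
ZWSP = "\u200b"    # ZERO WIDTH SPACE: marks a permitted break
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE: forbids a break
WJ = "\u2060"      # WORD JOINER: the newer replacement for ZWNBSP


def allow_break_after(text, chars="/"):
    """Insert ZWSP after each occurrence of the given characters."""
    out = []
    for ch in text:
        out.append(ch)
        if ch in chars:
            out.append(ZWSP)
    return "".join(out)


def forbid_break_between(left, right, joiner=ZWNBSP):
    """Join two strings with a character that forbids a break between them."""
    return left + joiner + right


def strip_break_controls(text):
    """Remove the break controls, e.g. before indexing text for search."""
    return text.translate({ord(c): None for c in (ZWSP, ZWNBSP, WJ)})


if __name__ == "__main__":
    path = allow_break_after("a/very/long/path")
    word = forbid_break_between("well-", "known")
    # Show the inserted characters as escapes, since they are invisible.
    print(path.encode("unicode_escape").decode("ascii"))
    print(word.encode("unicode_escape").decode("ascii"))
    print(strip_break_controls(path), strip_break_controls(word))
```

The strip_break_controls() helper is there because of the search-software
concern raised earlier in the thread: a tool that does not ignore these
characters itself can have them removed before matching.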

> > You'll have
> > to come up with a better example to convince me that <nobr> is
> > needed for semantic reasons.
>
> The practical reasons alone should be overwhelming. And in principle,
> handling line breaking - especially to prevent line breaks inside a
> string, as opposed to preventing it in a particular location between two
> characters - belongs much better to markup level than to character level.

What practical reasons? Large areas of non-breaking behavior are
stylistic and should be handled as such. Isolated incidents of overriding
the default behavior for semantic reasons can be handled via the
ZWSP, ZWNJ, ZWJ, and ZWNBSP characters from Unicode (with WJ
replacing ZWNBSP once implementations handle that recently added
character correctly). These four characters have been around since
the start of Unicode and should be handled correctly, as far as line
breaking is concerned, by any Unicode implementation. The difference
between ZWSP and ZWNJ, and between ZWNBSP and ZWJ, lies in how
they affect the shaping of adjacent characters, which for many scripts is
not noticeable.
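
The pairing described above can be summarized in a small table. This is
only a sketch: the names come from Python's standard unicodedata module,
while the "break" and "shaping" columns simply restate the claims made in
this message and are not derived from UAX#14 or any library. The CONTROLS
list and rows() function are hypothetical names.

```python
import unicodedata

# (character, break permitted here?, effect on shaping), as claimed above.
CONTROLS = [
    ("\u200b", True, "none"),               # ZWSP
    ("\u200c", True, "prevents joining"),   # ZWNJ
    ("\ufeff", False, "none"),              # ZWNBSP
    ("\u200d", False, "requests joining"),  # ZWJ
]


def rows():
    """Return (code point, name, break permitted, shaping effect) tuples."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch), brk, shaping)
        for ch, brk, shaping in CONTROLS
    ]


if __name__ == "__main__":
    for code, name, brk, shaping in rows():
        print("%-7s %-26s break=%-5s shaping=%s" % (code, name, brk, shaping))
```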
Received on Wednesday, 31 March 2004 14:36:45 UTC