RE: <NOBR> - Returning to the question ( 2 )

On Fri, 27 Feb 2004, Ernest Cline wrote:

> The usual result you describe is due to IE actually following a standard,
> the Unicode Line Breaking algorithm, since by that standard &nbsp;
> should not be considered a character for justification.

I am unable to find such a requirement in Unicode Standard Annex #14,
"Line Breaking Properties", which is probably what you mean, or anywhere
else in the Unicode standard. Can you please cite the page or clause
you mean? I would not expect to find such a statement, least of all
as a requirement, in UAX #14, since line breaking is logically independent
of justification. As far as I can see, UAX #14 merely defines "Line
fitting" in section 2, "Definitions", and never actually uses the term.

It is common to treat no-break space as non-stretchable and
non-shrinkable, effectively as a fixed-width space, but this is not a
requirement, or even a recommendation, in the Unicode standard, or in any
HTML specification, as far as I can see.

> Given that these
> lists tend more to complain about IE not following an existing standard,
> I certainly don't want them being blamed for doing so.

Actually, to the extent that IE follows UAX #14, it _is_ to be blamed,
since UAX #14 is a horrendous standard, _especially_ when
implemented mechanically. I accuse IE of breaking "-a" into "-" and "a".
I also accuse UAX #14 of permitting and even encouraging such madness.
(See http://www.cs.tut.fi/~jkorpela/unicode/linebr.html )

> If you want a space that will justify without the use of CSS
> or non-standard HTML, then you need to use a space surrounded
> by a pair of Unicode glue characters,

No, I won't, since I know what will happen.

> Unfortunately, UA support for these is somewhat spotty at present.

That's quite an understatement of the problems.

> IE 6.0 (Windows) breaks with ZWJ and WJ, plus it inserts unfound
> character glyphs for CGJ and WJ.

This depends on the font, I presume. But in general, relying on support
for little-known and poorly supported Unicode characters for something as
simple as preventing line breaks is disproportionate.

HTML user agents should not apply Unicode line breaking rules until
they can do it reasonably and until there are effective ways of switching
them off. But they have started applying random parts of the rules.
Luckily, virtually all of them recognize <nobr> too. There's little
point in telling authors to use constructs they won't even understand
(and hence will use wrongly part of the time) and that aren't actually
supported, when there's a simple, clear-cut alternative: standardizing
<nobr> and <wbr>.

> However, there is clearly no need to incorporate <NOBR>
> into the standard, as its non-presentational aspects can be handled
> by plain Unicode text without the use of markup.

So assuming that someone has, say, a 42-character string without
spaces, as people fairly often do, and that the string should not be broken
across two lines, then the author is supposed to understand UAX #14
(I started studying it in 2000 and I still don't quite get it, let alone
speak it fluently, and it's a moving target of course) and to
insert the preferred invisible joiner character du jour wherever it
may be needed? Well, it's probably much simpler to put it between every two
characters, isn't it? After all, even if the author got UAX #14 right in
every detail, browsers most probably won't.

Compare the beauty of e.g.
[&#8288;?&#8288;%&#8288;x&#8288;-&#8288;1&#8288;+&#8288;2&#8288;]
with the ugly presentational markup
<nobr>[?%x-1+2]</nobr>
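To make the comparison concrete, here is a minimal Python sketch of the "put it between every two characters" approach. It assumes U+2060 WORD JOINER, written as the reference &#8288;, is the joiner du jour; the helper names join_with_word_joiners and wrap_in_nobr are hypothetical, made up for illustration. Run on "[?%x-1+2]", it reproduces the WJ-laden string above character for character.

```python
import html

# Assumption: U+2060 WORD JOINER is the joiner, emitted as a numeric reference.
WORD_JOINER_REF = "&#8288;"


def join_with_word_joiners(text: str) -> str:
    """Hypothetical helper: put a WORD JOINER between every two characters.

    This is the mechanical approach, with no attempt to work out where
    UAX #14 would actually permit a break. Each character is HTML-escaped
    so that characters like '&' or '<' stay valid markup.
    """
    return WORD_JOINER_REF.join(html.escape(ch) for ch in text)


def wrap_in_nobr(text: str) -> str:
    """Hypothetical helper: the markup alternative, one <nobr> element."""
    return "<nobr>" + html.escape(text) + "</nobr>"


s = "[?%x-1+2]"
print(join_with_word_joiners(s))
# prints [&#8288;?&#8288;%&#8288;x&#8288;-&#8288;1&#8288;+&#8288;2&#8288;]
print(wrap_in_nobr(s))
# prints <nobr>[?%x-1+2]</nobr>
```

Each reference adds seven characters per gap, so a 42-character string picks up 41 references, or 287 extra characters, to do what a single <nobr> element does.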

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Saturday, 28 February 2004 00:48:42 UTC