RE: <NOBR> - Returning to the question ( 2 )

From: Ernest Cline <ernestcline@mindspring.com>
Date: Sat, 28 Feb 2004 17:45:21 -0500
Message-ID: <410-220042628224520984@mindspring.com>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>, www-html@w3.org


> [Original Message]
> From: Jukka K. Korpela <jkorpela@cs.tut.fi>
>
> Ernest Cline wrote:
>
> > The usual result you describe is due to IE actually following a
> > standard, the Unicode Line Breaking algorithm, since by that standard
> > &nbsp; should not be considered a character for justification.
>
> I am unable to find such a requirement in Unicode Standard Annex #14,
> "Line Breaking Properties", which is what you probably mean, or in the
> Unicode standard elsewhere. Can you please cite by page or by clause
> what you mean? I would not expect to find such a statement, least of all
> as a requirement, in UAX #14, since line breaking is logically independent
> of justification. As far as I can see, UAX #14 just mentions "Line
> fitting" in "2 Definitions" but does not actually even use this term.

The part that I was referring to is the fourth paragraph of Section 3
of UAX#14: [1]

"When expanding or compressing inter-word space, only the space
marked by U+0020  SPACE and U+3000  IDEOGRAPHIC SPACE are
normally subject to compression, and only spaces marked by U+0020
SPACE, and occasionally spaces marked by U+202F  THIN SPACE
are subject to expansion. All other space characters have fixed width."

I will agree that it should be better marked out, as this is a ridiculous
place to put this requirement, and that it is in a non-normative
part of the annex.  However, this also accords rather well with the
CSS 2.1 definition of whitespace (given in section 4, Syntax, but also
referred to by the 'white-space' property) which states: [2]

"Only the characters "space" (U+0020), "tab" (U+0009), "line feed"
(U+000A), "carriage return" (U+000D), and "form feed" (U+000C)
can occur in whitespace. Other space-like characters, such as
"em-space" (U+2003) and "ideographic space" (U+3000), are
never part of whitespace."

CSS3 Text, Section 7.2 [3] is essentially the same, except that
it uses the XML definition of whitespace, which again excludes
"no-break space" (U+00A0).  The net result is that U+00A0
should not be affected by anything, such as justification, that
affects inter-word spacing.
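
The CSS 2.1 definition quoted above is small enough to check
mechanically. A minimal Python sketch follows; the names
CSS21_WHITESPACE and is_css21_whitespace are made up for illustration
and do not come from any specification:

```python
# The five characters that the quoted CSS 2.1 text allows in whitespace.
CSS21_WHITESPACE = {
    "\u0020",  # space
    "\u0009",  # tab
    "\u000A",  # line feed
    "\u000D",  # carriage return
    "\u000C",  # form feed
}


def is_css21_whitespace(ch: str) -> bool:
    """Return True if ch is one of the characters listed in the quote."""
    return ch in CSS21_WHITESPACE


if __name__ == "__main__":
    for name, ch in [("SPACE", "\u0020"),
                     ("NO-BREAK SPACE", "\u00A0"),
                     ("EM SPACE", "\u2003"),
                     ("IDEOGRAPHIC SPACE", "\u3000")]:
        print(f"U+{ord(ch):04X} {name}: {is_css21_whitespace(ch)}")
```

By this definition U+00A0 is not whitespace, any more than U+2003
or U+3000 is.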

> It is common to treat no-break space as non-stretchable and
> non-shrinkable, effectively as a fixed-width space, but this is not a
> requirement, or even a recommendation, in the Unicode standard,
> or in any HTML specification, as far as I can see.
>
> > Given that these
> > lists tend more to complain about IE not following an existing standard,
> > I certainly don't want them being blamed for doing so.
>
> Actually, to the extent that IE follows UAX #14, it _is_ to be blamed,
> since UAX #14 is a horrendous piece of standard, _especially_ when
> implemented mechanically. I accuse IE for breaking "-a" into "-" and "a".
> I also accuse UAX #14 for permitting and even encouraging such
> madness. (See http://www.cs.tut.fi/~jkorpela/unicode/linebr.html )

None of the major browsers does a good job with line breaking at present.
As for the example, I will note that if instead of U+002D HYPHEN-MINUS
one uses the unambiguous U+2212 MINUS SIGN, then applying the rules
of UAX#14 would prohibit that break.  Yes, ideally, IE should perform
a contextual analysis to determine whether the hyphen-minus is acting
more like a hyphen (class BA) or a minus (class PR) with the rules
for hyphen-minus (class HY) being used only if the UA can't determine
that.  (Note: Unlike the class GL characters such as WJ whose
line breaking behavior is normative, class HY is informative.) If "-a"
occurred at the start of an element or following a space, I would expect it
to be treated as a minus, which would handle the common case.
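
As a rough illustration of that last heuristic only (not of a full
contextual analysis, and not of anything UAX#14 itself prescribes),
here is a Python sketch. The function name classify_hyphen_minus and
the use of the class names as plain strings are my own assumptions:

```python
def classify_hyphen_minus(text: str, i: int) -> str:
    """Guess a line-breaking class for the U+002D at index i.

    Returns "PR" (treat as a minus) when the hyphen-minus starts the
    text or follows whitespace and is directly followed by a
    non-space character; otherwise falls back to "HY".
    """
    if text[i] != "-":
        raise ValueError("character at index i is not U+002D")
    starts_token = i == 0 or text[i - 1].isspace()
    has_operand = i + 1 < len(text) and not text[i + 1].isspace()
    return "PR" if starts_token and has_operand else "HY"


if __name__ == "__main__":
    print(classify_hyphen_minus("-a", 0))          # start of element
    print(classify_hyphen_minus("x = -a", 4))      # after a space
    print(classify_hyphen_minus("well-known", 4))  # inside a word
```

A UA that then refused to break after a "PR" hyphen-minus would no
longer split "-a" in the common case.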

> > If you want a space that will justify without the use of CSS
> > or non-standard HTML, then you need to use a space surrounded
> > by a pair of Unicode glue characters,
>
> No, I won't, since I know what will happen.
>
> > Unfortunately, UA support for these is somewhat spotty at present.
>
> That's quite an understatement of the problems.
>
> > IE 6.0 (Windows) breaks with ZWJ and WJ, plus it inserts unfound
> > character glyphs for CGJ and WJ.
>
> This depends on the font I presume. But generally, relying on support to
> fairly little known and poorly supported Unicode characters for such
> simple things as preventing line breaks is disproportionate.

I agree that, given current implementations, it is not a good
solution at present, but the use of the class GL characters does have
a normative effect for line breaking, and thus rather than adding
<NOBR> to XHTML2, I feel that relying on the text glue is more
appropriate when the non-breaking is not a side-effect of
semantic markup.

> HTML user agents should not apply Unicode line breaking rules until
> they can do it reasonably and until there are effective ways of switching
> them off. But they have started applying random parts of the rules.
> Luckily virtually all of them recognize <nobr> too. There's little
> point in telling authors use some constructs they won't even understand
> (and hence will use wrongly part of the time) and that aren't actually
> supported, when there's a simple clear-cut method of standardizing
> <nobr> and <wbr>.

Well, if there ever is an IE 7, I would expect it to fully support the
normative portions of UAX#14, and if not, then IE 6 will eventually
fade away. Opera supports this portion of the standard, but it does
suffer from the same bug concerning the glyphs for CGJ and WJ
that IE does.  Given that these two characters were only added in
Unicode 3.2, I would expect that programs that rely entirely upon
the OS for glyph information will have to wait until new OS versions
that are aware of these characters are released to work correctly.
This is one reason why, even when Mozilla correctly supports
the class GL characters, I will use ZWNBSP instead of WJ
for quite a few years to come.

> > However, there is clearly no need to incorporate <NOBR>
> > into the standard, as its non-presentational aspects can be handled
> > by plain Unicode text without the use of markup.
>
> So assuming that someone has, say, a 42 characters long string without
> spaces, as people fairly often have, and that string should not be broken
> into two lines, then the author is supposed to understand UAX #14
> (I started studying it in 2000 and I still don't quite get it, still
> less speak it fluently, and it's a moving target of course) and to
> insert the preferred invisible joiner character du jour at any place that
> may need it? Well, it's probably much simpler to put it between every two
> characters, isn't it? After all, even if the author got UAX #14 right in
> detail, browsers most probably won't.
>
> Compare the beauty of e.g.
> [&#8288;?&#8288;%&#8288;x&#8288;-&#8288;1&#8288;+&#8288;2&#8288;]
> with the ugly presentational markup
> <nobr>[?%x-1+2]</nobr>

Actually, given a bare-bones dumb implementation of UAX#14,
which is not difficult to implement:
[?&#xfeff;%&#xfeff;x-1&#xfeff;+2]
would suffice.  Granted, for hand coders, it is a bit much to expect
that they would know that, but it wouldn't be difficult for an authoring
tool to show where the algorithm would produce line breaks and
introduce the glue where desired.
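
To make the authoring-tool idea concrete, here is a minimal Python
sketch of the "introduce the glue" half only. It does not implement
UAX#14; the break positions are supplied by hand to match the example
above, and the names insert_glue and to_html are hypothetical:

```python
ZWNBSP = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE


def insert_glue(text: str, break_before: set, glue: str = ZWNBSP) -> str:
    """Insert glue before every index listed in break_before.

    break_before holds the indices at which some line breaker (not
    shown here) reported a break opportunity the author wants to
    suppress.
    """
    out = []
    for i, ch in enumerate(text):
        if i in break_before:
            out.append(glue)
        out.append(ch)
    return "".join(out)


def to_html(text: str) -> str:
    """Spell U+FEFF as a numeric character reference for hand editing."""
    return text.replace(ZWNBSP, "&#xfeff;")


if __name__ == "__main__":
    # Assumed break opportunities: before "%", before "x", before "+".
    glued = insert_glue("[?%x-1+2]", {2, 3, 6})
    print(to_html(glued))
```

With those three assumed positions, the output is the string given
above.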

However, for such a case as you gave, something like:
<code>[?%x-1+2]</code>
with the exact desired presentation being supplied by CSS is likely
to be preferable.  Anything that needs that much glue probably needs
it for a reason attributable to the desired presentation of a semantic
element such as <code>, <a>, or even <span class="someclass">.

The case that the original querent inquired about was for a justifying
non-breaking space.  For this, the simple case of WJ SP WJ is easy
to code.  Granted, until UA support is adequate for Unicode glue
characters, some people will need to use presentational elements
such as the standard <span style="white-space:nowrap"> or the
non-standard <nobr> to adequately handle this edge case, but that
is in my opinion not an argument for adding <nobr> to XHTML 2.
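
For reference, the WJ SP WJ sequence can be produced as follows. This
is a minimal Python sketch; the helper names glued_space and as_html
are made up for illustration, and whether a given UA honors the result
is a separate matter, as discussed above:

```python
WJ = "\u2060"  # U+2060 WORD JOINER


def glued_space(left: str, right: str) -> str:
    """Join two words with an ordinary space wrapped in word joiners."""
    return f"{left}{WJ} {WJ}{right}"


def as_html(text: str) -> str:
    """Spell each WJ as the numeric character reference &#8288;."""
    return text.replace(WJ, "&#8288;")


if __name__ == "__main__":
    print(as_html(glued_space("10", "km")))
```

The space in the middle is a plain U+0020, so it remains subject to
justification; only the break opportunities around it are removed.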

[1] http://www.unicode.org/unicode/reports/tr14/#Introduction
[2] http://www.w3.org/TR/CSS21/syndata.html#whitespace
[3] http://www.w3.org/TR/css3-text/#white-space-props
Received on Saturday, 28 February 2004 17:45:32 UTC
