Re: <NOBR> - Returning to the question.... from olafBuddenhagen@web.de on 2004-03-31 (www-html@w3.org from March 2004)

From: <olafBuddenhagen@web.de>
Date: Wed, 31 Mar 2004 12:12:50 +0200
To: www-html@w3.org
Message-ID: <20040331101250.GA4637@sky.local>
Hi,

On Tue, Mar 30, 2004 at 08:53:10PM +0300, Jukka K. Korpela wrote:

> > > However, regarding HTML, the question arises whether <nobr> should
> > > be regarded as structural, at least when used for expressions like
> > > %7E, which may _change meaning_ when broken into % and 7E.
> >
> > This should be handled by <code>,
> 
> I thought we went through this when the discussion was active.

Must have missed it...

> There is nothing in the definition of <code> that relates to line
> breaking. Some computer code systems (or "languages") might have their
> own rules, but there isn't even in principle a way to indicate the
> code system used inside <code>.

I fail to come up with an example of any kind of "code" where PRE or
NOBR as a default rendering doesn't make sense. (I'm not sure which one
of those two is better, though.) Please help me out here if you have
one.

> > not by a <nobr>, which is purely presentational.
> 
> This was discussed too. Is &nbsp; purely presentational? If it is,
> shouldn't it be deprecated in favor of CSS?

Indeed, it is presentational in some sense. In a very special sense,
however.

To begin with, &nbsp; isn't really HTML. It is a Unicode character
(U+00A0 NO-BREAK SPACE); the entity is merely a way to write it. Why?
Because it's neither semantic (adding meaning to the content) nor a
matter of variable styling (pasted over the content by some layout tool
or whatever). It is something sticking firmly and invariably to the
content itself, by convention. Not because it really has a meaning, nor
because the individual author liked it that way, but because the text
would look odd and be harder to read otherwise.
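To illustrate that &nbsp; is just a character, here is a minimal Python sketch (standard library only). It shows that the entity decodes to a single code point, U+00A0 NO-BREAK SPACE, identical to the character typed directly without any markup.

```python
import html
import unicodedata

# Decode the HTML entity reference into plain text.
nbsp = html.unescape("&nbsp;")

# It is one ordinary Unicode character, not an HTML construct.
print(len(nbsp), hex(ord(nbsp)), unicodedata.name(nbsp))

# Writing the character directly yields the identical string.
print(nbsp == "\u00a0")
```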

> > In semantical markup, the question you need to ask yourself is
> > always: *Why* do I not want this to be broken? Because it is code!
> 
> No, it's because its meaning changes if a line break is inserted.
> Whether it is code or not (whatever that means in detail) is
> orthogonal to this.

Wrong.

But this leads us to an important source of misunderstanding: What is a
useful definition of "code"?

Allow me to approach this backwards: Why does the meaning change? How
can it be that the meaning of a piece of text changes when an
inappropriate line break is inserted? Isn't text (in our alphabetic
languages, I mean) just a collection of words (and occasional numbers),
regardless of whether they stand side by side or one under the other,
as long as the order is clear? - Well, but... there are special
cases... - What special cases? - When it isn't normal (natural language)
text, but a constellation of characters with a special symbolic
meaning... Bang! That's it. Normal line breaking rules do not apply when
we have... CODE!

Isn't this beautiful? Such a simple, obvious definition. Code is any
character string that is not natural language text. That's it.

But the best is still to come: With this nice new definition of code,
we've just solved the eternal Programming Collection dilemma. Yes,
really. See: Now that <code> is so generic and essential, it goes into
the core language. The other elements are simply... dropped. They are
now all just special kinds of code -- no place for such specific
elements in a minimal, general-purpose language like XHTML. Anyone who
wants that fine a distinction can go for a specialized language.

Did I already mention that I love generalizations? :-)

> > However, once you actually start to consider the fact that -1
> > shouldn't be broken, you'll probably also consider the fact that
> > minus is something different than a dash or a hyphen...
> 
> What makes you think that in "-1", the "-" is inevitably just a
> surrogate for minus?

Because I can't think of any other case. Need your help again.

> Besides, the Unicode standard actually defines "-" as hyphen-minus, as
> a character with dual (or actually multiple) usage. Yet the Unicode
> line breaking rules play their own game, forgetting that duality.

The "normal" '-' is inherited from ASCII for compatibility. Don't use it
if you need better control. Use the more precise characters Unicode
offers instead, such as U+2212 MINUS SIGN.
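For concreteness, a small Python sketch (standard library `unicodedata` only) that lists some of the more precise characters next to the ASCII fallback. The selection is illustrative, not exhaustive.

```python
import unicodedata

# The ASCII hyphen-minus and some characters it is often a stand-in for.
candidates = ["-", "\u2010", "\u2011", "\u2012", "\u2013", "\u2212"]

for ch in candidates:
    # Print the code point and its official Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```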

> (Effectively, <span> with style="..." or class="..." mostly indicates
> either lack of suitable markup, or lack of attempts to find suitable
> markup. Which one would <span style="white-space:nowrap">-a</span> be?

Can't tell without context. Most likely <code> again.

On Tue, Mar 30, 2004 at 11:12:41PM +0300, Jukka K. Korpela wrote:

> > The hyphen-minus character gets its own character class (HY), and
> > breaks between HY followed by NU (numeric character class) are
> > forbidden.  See rule LB18 in [2].
> 
> That's simply insufficient, since the hyphen-minus may well act as a
> minus sign in an algebraic expression like "-a".

With our shiny new definition of "code", mathematical formulas fit in
here very nicely as well. Or, even better, use MathML.

Anyway, nobody forces you to use hyphen-minus. I'll even venture to
claim that discussing the quality of the hyphen-minus fallback kludge
is kind of off-topic here.
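To make the disagreement concrete, here is a deliberately simplified Python toy model of only the quoted HY/NU rule. It is not an implementation of the Unicode line breaking algorithm: every other rule is ignored, and it assumes a break is otherwise allowed after any hyphen-minus. It only shows why that one rule protects "-1" but not "-a". The function name is hypothetical.

```python
def breaks_after_hyphen(text: str) -> list[int]:
    """Toy model (assumption): return each index i where a line break is
    allowed between a hyphen-minus at text[i-1] and text[i]. The only rule
    applied is 'no break between hyphen-minus and a following digit'."""
    positions = []
    for i in range(1, len(text)):
        if text[i - 1] == "-" and not text[i].isdigit():
            positions.append(i)
    return positions

print(breaks_after_hyphen("-1"))  # HY followed by a digit: no break allowed
print(breaks_after_hyphen("-a"))  # the rule does not protect "-a"
```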

> Moreover, it seems that the programmers of IE did not get the finer
> points - IE happily breaks between a hyphen-minus and a digit. I don't
> blame them too much, except for _attempting_ to implement something
> that shouldn't have been implemented at all, and surely not in a
> clueless manner that breaks two-character strings too, or in a manner
> that gets the rules wrong.

That's a failure of MS, not of Unicode or HTML.

> (Permitted line breaks in non-Latin writing systems are a different
> issue and deserve due consideration on a basis of observing the
> specifics of those writing systems, rather than Unicode linebreaking
> confusion.)

Actually, my impression is that setting rules that work for *all*
languages is the whole point of Unicode. If it fails in this regard,
then it's really useless.

But it's not the task of W3C to set this straight. Take it up with the
Unicode standard instead, if you really see room for improvement here.

> So if I write "the normal plural suffix in English is '-s'", is it
> quite OK for a browser to insert a line break after the hyphen-minus?
> And am I supposed to introduce artificial <span> markup and some CSS
> code just to prevent that? Isn't enough to use <nobr>? It's not some
> funny span element I have in mind with some optional suggestion; I
> know pretty well that I just want to prevent the wrong line break.
> (And using the hyphen character, U+2010, would not help here. The
> Unicode linebreaking rules permit a line break after it.)

Unicode has its own means to explicitly control line breaking (for
example U+2060 WORD JOINER). No need for workarounds in HTML. This is
really out of the scope of a markup language.

And it's probably not even necessary. What you have here is still
another type of dash -- I'm pretty sure Unicode has a special character
(with correct implicit linebreaking rules) for this as well.
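A minimal Python sketch of the two purely character-level options for the '-s' example, assuming a line breaker that follows the Unicode rules: gluing the characters together with U+2060 WORD JOINER, or substituting U+2011 NON-BREAKING HYPHEN. The helper name `glue` is hypothetical, and whether a given browser actually honors these characters is a separate question (compare the IE behavior mentioned above).

```python
import unicodedata

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: no break before or after it
NB_HYPHEN = "\u2011"    # U+2011 NON-BREAKING HYPHEN: no break after it

def glue(fragment: str) -> str:
    """Hypothetical helper: put a WORD JOINER between every pair of adjacent
    characters, leaving no break opportunity inside the fragment for a
    line breaker that follows the Unicode rules."""
    return WORD_JOINER.join(fragment)

# Option 1: keep the hyphen-minus, but glue it to the "s".
suffix_wj = glue("-s")
# Option 2: use the dedicated non-breaking hyphen instead.
suffix_nb = NB_HYPHEN + "s"

for suffix in (suffix_wj, suffix_nb):
    print([unicodedata.name(c) for c in suffix])
```

Either way, the fix lives entirely in the character data, with no extra markup or CSS.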

-antrik-
Received on Wednesday, 31 March 2004 06:44:30 UTC