Re: <NOBR> - Returning to the question.... from Jukka K. Korpela on 2004-03-30 (www-html@w3.org from March 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 30 Mar 2004 20:53:10 +0300 (EEST)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.58.0403302031150.1112@korppi.cs.tut.fi>

On Tue, 30 Mar 2004 olafBuddenhagen@web.de wrote:

> On Mon, Feb 09, 2004 at 10:46:45PM +0200, Jukka K. Korpela wrote:
>
> > However, regarding HTML, the question arises whether <nobr> should be
> > regarded as structural, at least when used for expressions like %7E,
> > which may _change meaning_ when broken into % and 7E.
>
> This should be handled by <code>,

I thought we went through this when the discussion was active. There is
nothing in the definition of <code> that relates to line breaking. Some
computer code systems (or "languages") might have their own rules, but
there isn't even in principle a way to indicate the code system used
inside <code>.

> not by a <nobr>, which is purely presentational.

This was discussed too. Is &nbsp; purely presentational? If it is,
shouldn't it be deprecated in favor of CSS?

> In semantical markup, the question you need to ask
> yourself is always: *Why* do I not want this to be broken? Because it is
> code!

No, it's because its meaning changes if a line break is inserted. Whether
it is code or not (whatever that means in detail) is orthogonal to this.
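
To make the point concrete, here is a minimal sketch in Python. It assumes
that a URL-style percent escape is the kind of expression at issue, and the
helper name decode_after_break is made up for this illustration. It only
shows that "%7E" stops being an escape once a line break separates "%"
from "7E"; it says nothing about how any browser breaks lines.

```python
from urllib.parse import unquote


def decode_after_break(text: str, position: int) -> str:
    """Insert a line break at `position`, then percent-decode the result.

    Hypothetical helper, used only to imitate a renderer that breaks the
    line in the middle of an expression.
    """
    broken = text[:position] + "\n" + text[position:]
    return unquote(broken)


intact = unquote("%7E")                 # the escape is kept in one piece
broken = decode_after_break("%7E", 1)   # "%" and "7E" end up on two lines

print(repr(intact))   # '~'
print(repr(broken))   # '%\n7E'
```

The unbroken string decodes to a tilde, while the broken one is no longer
a valid escape and is passed through unchanged.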

> Generally, WHY is the key to semantical markup...
>
> > Or for expressions like -1.
>
> If Unicode linebreaking rules are any good (I do not know them),

Sorry, but if you don't know them, you don't even understand the problem.
Still less can you evaluate the suggested solutions.

> the
> problem is actually a different one: Nobody but professional typesetters
> do know and respect the five or so different types of dash-like
> characters, all fulfilling a different purpose, and all having a
> different character code in Unicode (I guess).

That's a problem (to the extent that it is true), and it indeed is a
completely separate problem.

> However, once you actually start to consider the fact that -1 shouldn't
> be broken, you'll probably also consider the fact that minus is
> something different than a dash or a hyphen...

What makes you think that in "-1", the "-" is inevitably just a surrogate
for minus? Besides, the Unicode standard actually defines "-" as
hyphen-minus, as a character with dual (or actually multiple) usage.
Yet the Unicode line breaking rules play their own game, forgetting
that duality.
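
As a rough check of the naming, the following Python sketch looks up a few
dash-like characters in the Unicode character database. The selection of
characters and the function name describe are my own choices for the
example. Note that Python's unicodedata module exposes names and general
categories only, not the line breaking classes, so this illustrates the
dual-purpose name of "-" and nothing more.

```python
import unicodedata

# A few dash-like characters, chosen for illustration only.
DASH_LIKE = ["\u002D", "\u2010", "\u2011", "\u2212", "\u2013", "\u2014"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in DASH_LIKE:
    print(*describe(ch))
```

The ASCII character is reported as HYPHEN-MINUS, whereas the dedicated
minus is a separate character, U+2212 MINUS SIGN, with a different general
category.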

> Anyways, I do not see any good solution for this. We probably can't
> teach every web author to use Unicode correctly,

Well, HTML is already based on Unicode as regards characters.
But it need not adopt all the strange definitions like line breaking
rules, which mostly just break things.

And using <nobr> is a very simple solution, already implemented. It does
not prevent the elaboration of more sophisticated methods, if desired.
Reluctance to make <nobr> part of the specification conveys a message:
authors are not expected to defeat clueless line breaking algorithms
applied by browsers, except perhaps in a clumsy way by making optional
presentational suggestions in CSS, which usually means adding extra
<span> markup. (Effectively, <span> with style="..." or class="..."
mostly indicates either lack of suitable markup, or lack of attempts to
find suitable markup. Which one would
<span style="white-space:nowrap">-a</span> be?)

> but we can't ignore
> Unicode either if we want to have any reasonable language handling...

Huh? Language handling surely depends on quite different issues, mostly
on building actual support for languages into browsers.

> Overriding Unicode rules won't do.

If Unicode line breaking rules are regarded as something that should not
be overridden, the results will be grotesque. They are bad even as what
they were probably meant to be: a simple general basis upon which you
could build your own line breaking rules, if you find the basis suitable.

> > if a document e.g. discusses the command "rm -r /usr/spool/foo",
>
> <code> again.

And how do you expect or want that to affect line breaking?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Tuesday, 30 March 2004 12:53:23 UTC