Re: CSS3 Text: Line-breaking Properties from Jukka K. Korpela on 2003-05-04 (www-style@w3.org from May 2003)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sun, 4 May 2003 09:10:30 +0300 (EEST)
To: www-style@w3.org
Message-ID: <Pine.GSO.4.50.0305040815420.15758-100000@korppi.cs.tut.fi>
On Sat, 3 May 2003, fantasai wrote:

> What I describe as 'strict' could easily be a UA's 'normal' behavior.

I have no idea of what you mean by that. The proposed values "strict" and
"normal" are distinct, and that's an essential distinction. Are you saying
that "normal" could be treated by a UA as the same as "strict"?

> 'normal', however, allows the UA more freedom to define its algorithm,

Any value allows considerable freedom in the actual layout algorithms,
since the value only specifies _permitted_ line breaking points.

> as long as it keeps within the limits set by UAX 14.

This raises a question of quality. If we take the extreme position that
the line breaking properties really specify line breaking opportunities
only, then a browser that presents any paragraph as a single line would
conform. On the other hand one might say that any line that exceeds the
available width _must_ be broken by the UA if there is a line breaking
opportunity inside it - though the UA could still make its own decision on
_where_ to break it. I'll skip fine tuning here; advanced layout
algorithms may accept lines that exceed the overall line length limit
by a small amount, if this considerably improves the situation on other
lines. (It _is_ fine tuning and cannot refute my argument.)

This implies that if a value is defined by a reference to UAX 14, then
implementations _must_ use _any_ UAX 14 line breaking opportunity at least
in the situation where that is the only way to deal with a particular line
that would otherwise exceed a limit.
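
A minimal Python sketch of that reading: break opportunities are optional
until a line would overflow, and then one of them has to be used. The
function name `wrap`, the `breaks` list of sorted character offsets, and
measuring width in characters are assumptions made for illustration; none
of this comes from UAX 14 or the CSS3 draft.

```python
def wrap(text, breaks, width):
    """Greedy layout: break only when needed, and only at permitted offsets.

    `breaks` is a sorted list of offsets where a break is permitted
    (a break at offset b puts text[b:] on the next line).
    """
    lines, start = [], 0
    while len(text) - start > width:
        # Permitted breaks that keep the current line within the width.
        fitting = [b for b in breaks if start < b <= start + width]
        if fitting:
            cut = fitting[-1]
        else:
            # No opportunity inside the width: overflow up to the next one.
            later = [b for b in breaks if b > start + width]
            if not later:
                break  # no opportunity at all, the rest overflows
            cut = later[0]
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


print(wrap("aaa bbb ccc", [4, 8], 7))  # ['aaa ', 'bbb ccc']
print(wrap("aaa bbb ccc", [], 7))      # ['aaa bbb ccc']
```

With an empty `breaks` list the paragraph stays on one line, which is the
degenerate "conforming" browser described above.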

> As a simplistic example, let us define an algorithm which only allows
> breaks at spaces and after hyphens.

I don't see the relevance of describing a particular algorithm here.

> You can, of course, extrapolate this to great complexity, and it will still
> satisfy the requirements of 'normal' line breaking.

Maybe under _some_ definition of "normal". Note that this would mean that
the algorithm would not split a 2000-character URL if it does not
contain a hyphen. That is, it would not e.g. apply the UAX 14 rule that
permits a split after a solidus (slash, "/").

Actually, unless I have missed something in UAX 14, its rules (and thus
your proposed "normal") do _not_ always permit a line break at
(or, technically, after) a space character. For example,
"it's in directory /usr/spool"
must not, under UAX 14, be split as

it's in directory
/usr/spool

since a break before "/" is never permitted, but the following split would
be allowed:

it's in directory /
usr/spool

Thus your example algorithm would presumably not qualify as UAX 14
conformant.
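
The mismatch can be sketched in Python. `simple_breaks` is the quoted
algorithm (break after a space or a hyphen); `slash_aware_breaks` is a toy
stand-in that models only the two solidus behaviours mentioned above and is
in no way an implementation of UAX 14. Both function names are hypothetical.

```python
def simple_breaks(text):
    """Offsets where a break is permitted after a space or a hyphen."""
    return [i + 1 for i, ch in enumerate(text[:-1]) if ch in " -"]


def slash_aware_breaks(text):
    """Toy rules: break after a space or "/", but never before "/" or a space."""
    breaks = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if nxt in " /":
            continue  # no break before a solidus, even after a space
        if prev in " /":
            breaks.append(i)
    return breaks


text = "it's in directory /usr/spool"
print(simple_breaks(text))       # [5, 8, 18]
print(slash_aware_breaks(text))  # [5, 8, 19, 23]
```

Offset 18 (between the space and "/") is permitted by the simple algorithm
only, which is the split that the solidus rule forbids.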

> However, it can't be
> used for 'strict' line breaking because it allows breaking after a hyphen,
> which 'strict' does not.

Whether a hyphen (or hyphen-minus) is a permissible line breaking point
is an important decision. It was a bad move to make some UAs treat it as
permissible by default, and I don't think we should take great pains to
retrofit things into such misbehavior. But it's probably a common enough
need to deserve a value in the relevant CSS property. And for obvious
reasons, using such a value, i.e. asking a UA to break after a hyphen
when needed, should not open the Pandora's box of UAX 14 rules.

On the other hand, I presume that it would be relatively simple,
both in terms of specifications and in actual implementations, to
allow values that involve lists (sets) of characters, enumerating
the characters after which a line break is permitted. In typical
cases, when you have a long string that contains special characters,
there is a fairly limited set of characters that are really suitable
break points.
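
As a rough sketch of how little such a value would demand of an
implementation, the following Python function takes the author-supplied
set of characters and returns the permitted break offsets. The name
`breaks_after` and the representation of the value as a Python set are
assumptions for illustration; no CSS syntax is implied.

```python
def breaks_after(text, break_chars):
    """Offsets where a break is permitted: right after any listed character.

    The last character is skipped, since a break at the very end of the
    string is not a split.
    """
    return [i + 1 for i, ch in enumerate(text[:-1]) if ch in break_chars]


print(breaks_after("myfiles/foo.txt", {"/"}))       # [8]
print(breaks_after("myfiles/foo.txt", {"/", "."}))  # [8, 12]
```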

Such a value should probably involve the principle that no word
(a string of non-whitespace characters separated by whitespace
characters) shall be split so that only one or two characters are
left at the end of a line or at the start of a line. This would
disallow e.g. breaking "-a" into "-" and "a". It's not just a
quality of implementation issue. Who would dare to invoke a line
breaking method if it is allowed to result in such splits?
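
The principle can be sketched in Python as a filter over candidate break
offsets, assuming "one or two characters" translates to a minimum fragment
length of three. The function name `filter_short_fragments` and the
`min_len` parameter are hypothetical.

```python
def filter_short_fragments(text, breaks, min_len=3):
    """Drop breaks that would leave fewer than min_len characters of a word
    at the end of one line or at the start of the next."""
    kept = []
    for i in breaks:
        # Locate the whitespace-delimited word around offset i.
        start = i
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        end = i
        while end < len(text) and not text[end].isspace():
            end += 1
        splits_word = start < i < end  # False for breaks at word boundaries
        if splits_word and (i - start < min_len or end - i < min_len):
            continue
        kept.append(i)
    return kept


# Candidate breaks after each space and after the hyphen in "-a".
print(filter_short_fragments("option -a here", [7, 8, 10]))  # [7, 10]
print(filter_short_fragments("myfiles/foo.txt", [8, 12]))    # [8, 12]
```

The break at offset 8 in the first call would split "-a" into "-" and "a",
so it is dropped; breaks that fall between words are left alone.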

> Do you still disagree that 'normal' should be the default?

Of course. I haven't seen a single argument _in favor of_ making it the
default. Far from being anything intuitively normal, your proposed
"normal" builds upon a complicated and artificial set of rules,
in a rather obscure way that does not even say whether the rules
are to be applied or not in practice. Even more importantly, it
would seriously deviate from established practice and both authors' and
users' expectations and would even distort information.

People who have authored documents have had no reason to expect that UAs
would some day start treating a string like "myfiles/foo.txt" as something
that can be split to "myfiles/" and "foo.txt" (on different lines) by a
UA. I would appreciate the possibility of having, say, long URLs split
according to some rules, in case I need to enter URLs into my documents
as text (e.g., when discussing URLs in the document). But I would
prefer knowing the rules then, and having them a few orders of magnitude simpler
than UAX 14, and having them enabled on my command only. If I know that
line breaks may occur, I can take the suitable precautions like using "<"
and ">" delimiters, or something. The bulk of existing Web pages have
_not_ been authored with such precautions. And it would not be adequate
if authors need to consider all text as potential victims of "normal" line
breaking.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Sunday, 4 May 2003 07:35:15 UTC