Re: CSS3 Text: Line-breaking Properties

Jukka K. Korpela wrote:
 > On Sat, 3 May 2003, fantasai wrote:
 >
 >>What I describe as 'strict' could easily be a UA's 'normal' behavior.
 >
 > I have no idea of what you mean by that. The proposed values "strict" and
 > "normal" are distinct, and that's an essential distinction. Are you saying
 > that "normal" could be treated by a UA as the same as "strict"?

That's what I said, but I guess it's not exactly true: if 'strict' is
defined to allow line breaks at -all- white space and 'normal' only allows
line breaks where UAX 14 allows them, then 'strict' text can break at
some white-space points where 'normal' can't.

 >>'normal', however, allows the UA more freedom to define its algorithm,
 >
 > Any value allows considerable freedom in the actual layout algorithms,
 > since the value only specifies _permitted_ line breaking points.

'normal' allows break points prohibited by 'strict', so the UA may
consider more line breaking opportunities as valid than it can in
'strict'. (But it doesn't have to.)

 >>as long as it keeps within the limits set by UAX 14.
 >
 > This raises a question of quality. If we take the extreme view that
 > the line breaking properties really specify line breaking opportunities
 > only, then a browser that presents any paragraph as a single line would
 > conform.

Yes. But, you see, that's unlikely to happen because people prefer to
have their paragraphs wrapped.

 > On the other hand one might say that any line that exceeds the
 > available width _must_ be broken by the UA if there is a line breaking
 > opportunity inside it - though the UA could still make its own decision on
 > _where_ to break it. I'll skip fine tuning here...
 >
 > This implies that if a value is defined by a reference to UAX 14, then
 > implementations _must_ use _any_ UAX 14 line breaking opportunity at least
 > in the situation where that is the only way to deal with a particular line
 > that would otherwise exceed a limit.

Since I don't accept this argument's base assumption, I don't
agree with your conclusion.

 >>As a simplistic example, let us define an algorithm which only allows
 >>breaks at spaces and after hyphens.
 >
 > I don't see the relevance of describing a particular algorithm here.

It demonstrates an algorithm that fits the requirements for 'normal'
but not for 'strict', that does not break at every point permitted by
UAX 14, and that prioritizes certain types of breaks over others.

 >>You can, of course, extrapolate this to great complexity, and it will still
 >>satisfy the requirements of 'normal' line breaking.
 >
 > Maybe under _some_ definition of "normal". Note that this would mean that
 > the algorithm would not split a 2000-character URL if it does not
 > contain a hyphen.

No, because it's a simplistic example designed for discussion purposes,
not an implementation spec.
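A minimal Python sketch of the simplistic algorithm under discussion follows. It assumes, as the message says, that breaks at spaces take priority and that breaks after hyphens are used only when no space break fits on the line. The names break_points and wrap, the greedy line filling, and the overflow behavior (break at the earliest later opportunity, or not at all) are illustrative assumptions, not part of any proposal. The second call shows the URL case raised above: with no space or hyphen inside it, the URL stays on one overlong line.

```python
def break_points(text):
    # Hypothetical priorities: 0 = after a space (preferred),
    # 1 = after a hyphen (fallback). Each position is the index just
    # past the break character, i.e. where the next line would start.
    points = []
    for i, ch in enumerate(text):
        if ch == " ":
            points.append((i + 1, 0))
        elif ch == "-":
            points.append((i + 1, 1))
    return points


def wrap(text, width):
    """Greedy wrapping: on each line, use the last fitting space break;
    fall back to the last fitting hyphen break only if no space fits."""
    points = break_points(text)
    lines = []
    start = 0
    while len(text) - start > width:
        fitting = [(pos, prio) for pos, prio in points
                   if pos > start and len(text[start:pos].rstrip(" ")) <= width]
        if fitting:
            best = min(prio for _, prio in fitting)
            pos = max(p for p, prio in fitting if prio == best)
        else:
            later = [p for p, _ in points if p > start]
            if not later:
                break  # no opportunity left: the rest overflows as one line
            pos = later[0]  # overflow, but break at the earliest opportunity
        lines.append(text[start:pos].rstrip(" "))
        start = pos
    if start < len(text):
        lines.append(text[start:])
    return lines


print(wrap("self-contained example", 10))
# ['self-', 'contained', 'example']
print(wrap("see http://example.org/a/very/long/path", 10))
# ['see', 'http://example.org/a/very/long/path']
```

A real UA would weigh the cost of each break rather than apply the priorities strictly line by line; the strict version just keeps the sketch short.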

 > Actually, unless I have missed something in UAX 14, its rules (and thus your
 > proposed "normal") do _not_ always permit a line break at (or,
 > technically, after) a space character. For example,
 > "it's in directory /usr/spool" must not, under UAX 14, be split as
 >   it's in directory
 >   /usr/spool
 > since a break before "/" is never permitted, but the following split
 > would be allowed:
 >   it's in directory /
 >   usr/spool
 >
 > Thus your example algorithm would presumably not qualify as UAX 14
 > conformant.

Yes, you're right. It would need a few more rules to satisfy UAX 14.

BTW, have you emailed the UAX 14 author with your criticism? It seems
to me that situations like /usr/spool and !done weren't considered.
The rule about not breaking before a slash makes sense if there's a
space after the slash as well, but not if there's a space before and
content after. I mean, something like "this / that" should break after
the slash and not before, but "this /that" should break at the space.
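The slash tailoring proposed above could look roughly like the following hypothetical Python sketch. It only considers break opportunities after spaces and is not what UAX 14 specifies: the break after a space is suppressed only when the next character is a slash that is itself followed by a space or by the end of the text. The function name space_breaks is an illustrative assumption.

```python
def space_breaks(text):
    """Return the indices where a new line may start, considering only
    breaks after spaces, with a hypothetical tailoring of the
    no-break-before-slash rule."""
    breaks = []
    for i, ch in enumerate(text):
        if ch != " ":
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        after = text[i + 2] if i + 2 < len(text) else ""
        if nxt == "/" and after in ("", " "):
            continue  # "this / that": keep the slash with "this"
        breaks.append(i + 1)  # "this /that": breaking at the space is fine
    return breaks


print(space_breaks("this / that"))                   # [7]  -> "this /" | "that"
print(space_breaks("this /that"))                    # [5]  -> "this" | "/that"
print(space_breaks("it's in directory /usr/spool"))  # [5, 8, 18]
```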

 >>However, it can't be used for 'strict' line breaking because it
 >>allows breaking after a hyphen, which 'strict' does not.
 >
 > Whether a hyphen (or hyphen-minus) is a permissible line breaking point
 > is an important decision. It was a bad move to make some UAs treat it as
 > permissible by default, and I don't think we should take great pains to
 > retrofit things into such misbehavior.

I don't think it's a bad move to make the hyphen a permissible
breaking point. It's just a bad move to make it permissible without
making it a low-priority breaking point.

 > But it's probably a common enough need to warrant a value in the
 > relevant CSS property. And for obvious reasons, using such a value,
 > i.e. asking a UA to break after a hyphen when needed, should not
 > open the Pandora's box of UAX 14 rules.

Allowing line breaks only at spaces, soft hyphens (SHY), zero-width
spaces (ZWSP), and maybe hyphens is too limiting for general text, IMO.

For example, although I'd give preference to the spaces before and
after, I would want it to be permissible to break Chinese/Japanese/
Korean/Vietnamese after a slash. (Like that.) (I still wouldn't want
a URL to break at a slash unless it cannot fit on one line. However,
I'd be marking up the URL, so I can apply different line breaking
settings to it--and change the font, too, if I want.)

Other places I'd allow breaking include after em-dashes, between
closing and opening parentheses, after a semicolon, etc.
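One way to express these extra opportunities while still preferring spaces is to attach a priority to each break position, so that a line filler tries lower numbers first. The Python sketch below is hypothetical: the function name, the priority values, and placing slashes lowest are assumptions for illustration, and the check for Chinese/Japanese/Korean/Vietnamese context before allowing a slash break is not modeled.

```python
EM_DASH = "\u2014"


def extra_breaks(text):
    """Return (index, priority) pairs, where index is where the next line
    would start. Hypothetical priorities:
      0 = after a space,
      1 = after an em-dash or semicolon not followed by a space,
          or between ')' and '(',
      2 = after a slash (last resort)."""
    out = []
    for i, ch in enumerate(text[:-1]):
        nxt = text[i + 1]
        if ch == " ":
            out.append((i + 1, 0))
        elif ch in (EM_DASH, ";") and nxt != " ":
            # If a space follows, the break after that space covers it.
            out.append((i + 1, 1))
        elif ch == ")" and nxt == "(":
            out.append((i + 1, 1))
        elif ch == "/":
            out.append((i + 1, 2))
    return out


print(extra_breaks("(x)(y)"))  # [(3, 1)]
print(extra_breaks("a/b c"))   # [(2, 2), (4, 0)]
```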

One could, of course, leave out any line-breaking references and
define a value to be completely UA-dependent. I have no problem
with that. How would you define "normal", given that it needs to
be more flexible than "strict"?

 > On the other hand, I presume that it would be relatively simple,
 > both in terms of specifications and in actual implementations, to
 > allow values that involve lists (sets) of characters, enumerating
 > the characters after which a line break is permitted. In typical
 > cases, when you have a long string that contains special characters,
 > there is a fairly limited set of characters that are really suitable
 > break points.

This method cannot handle things like
   word -- nextword
   Exclamation !
which should not break before the punctuation.
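A hypothetical Python comparison makes the objection concrete. set_breaks follows the "break after these characters" idea; context_breaks also looks at the following token and refuses to break after a space when that token consists only of characters from an assumed CLOSING set. The set and both function names are illustrative choices, not taken from UAX 14 or any CSS draft.

```python
# Hypothetical class of punctuation that should not start a line.
CLOSING = set("!?;:-")


def set_breaks(text, break_after):
    """Character-set rule: a new line may start after any character
    in the author-supplied set."""
    return [i + 1 for i, ch in enumerate(text[:-1]) if ch in break_after]


def context_breaks(text):
    """Break after spaces, but not when the next token is made up
    entirely of CLOSING punctuation (needs lookahead)."""
    breaks = []
    for i, ch in enumerate(text[:-1]):
        if ch != " ":
            continue
        token = text[i + 1:].split(" ", 1)[0]
        if token and all(c in CLOSING for c in token):
            continue  # keep "!" or "--" with the preceding word
        breaks.append(i + 1)
    return breaks


print(set_breaks("Exclamation !", {" "}))     # [12]  -> "!" starts a line
print(context_breaks("Exclamation !"))        # []
print(set_breaks("word -- nextword", {" "}))  # [5, 8]
print(context_breaks("word -- nextword"))     # [8]
```

Under the set-based rule both inputs get a break before the punctuation; avoiding it requires looking at what follows the space, which a list of "break after" characters cannot express.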

It doesn't allow tailoring line breaking rules to the language if
the rules require something beyond "break after this character".

It also does not provide for prioritization similar to what I did
with spaces vs. hyphens in my sample algorithm.

That aside, it's very awkward to require a style sheet writer
to enumerate all valid line breaking opportunities. CSS is
complex enough -- one should be able to get reasonable line
breaking behavior without designing the algorithm.

~fantasai

Received on Sunday, 4 May 2003 14:51:48 UTC