Re: Follow-up on Balance Text Proposal

On Mon, Oct 13, 2014 at 12:52 AM, Liam R E Quin <liam@w3.org> wrote:
> On Sun, 12 Oct 2014 13:54:20 -0400
> Zack Weinberg <zackw@panix.com> wrote:
>
>> All the optimal-balancing algorithms I know about are resilient
>> in the presence of a "this paragraph is too large, split it
>> arbitrarily into reasonably-sized chunks" heuristic.
>
> The original TeX algorithm had non-polynomial behaviour on the
> number of words in the paragraph, so that would be a "no" :-)

TeX has no such heuristic built in (even now, AFAIK), but the TeXbook
suggests applying one manually if you are faced with copy containing a
run-on paragraph.  Specifically, it provides an incantation that
splits a paragraph in two (thus limiting the amount of material the
linebreaker has to process at once) without producing a visible
paragraph break in the output.  The words on either side of the split
are constrained to have a line break in between them, but it says that
if the original paragraph is large enough that you need this trick,
readers are unlikely to notice, and the quality of the output is
relatively insensitive to the exact position of the split.

As I'm not aware of any linebreaking algorithm that is *pickier* about
its input than K-P, I think my original claim is accurate.

>> (One of these years I'll get around to implementing Knuth-Plass
>> linebreaking in Servo to see how it goes.)
>
> The original Knuth-Plass line-breaking algorithm isn't actually all
> that good for a browser, because the underlying assumption is that
> the author is there to make corrections and guide the algorithm in
> edge cases. Anyone who has used TeX and seen messages like
> "underfull hbox" has seen problems.

Oh sure, with my academic hat on I am all too familiar with that
particular headache.  It's just that I think I can come up with
heuristics that do reasonably well on "typical" copy, and (what
may be more important for a browser) degrade more gracefully
than TeX does in the presence of difficult copy.

> Their paper also shows that their algorithm is NP-complete.

IIRC - and I have read the paper, but it was a while ago - they
presented a *generalization* that's NP-complete, but the algorithm
actually implemented in TeX is "only" quadratic in the number of words
in a paragraph (because it's restricted to a locally-convex region of
the problem space).  And there's a refinement from the late 80s that's
linear but probably has worse constant factors.

> There are modified algorithms that work better. In practice, an
> balancing first-fit algorithm that averages out over a running
> n-line window is almost as good in most cases and doesn't have the
> bad edge-case behaviour - it's also massively faster, in algorithmic
> complexity linear in the number of words. I described it in a
> message to this list a year or two ago.

Do you have a pointer? I don't recall that message.

> However, the "balance text" Adobe suggest does seem better for
> headlines.

In this discussion, I am only concerned with not standardizing
something that requires (even effectively) a specific algorithm
either for 'text-wrap: balance' or 'text-wrap: normal'.

zw

Received on Monday, 13 October 2014 15:36:07 UTC