- From: Zack Weinberg <zackw@panix.com>
- Date: Mon, 13 Oct 2014 11:35:44 -0400
- To: Liam R E Quin <liam@w3.org>, www-style list <www-style@w3.org>
On Mon, Oct 13, 2014 at 12:52 AM, Liam R E Quin <liam@w3.org> wrote:
> On Sun, 12 Oct 2014 13:54:20 -0400
> Zack Weinberg <zackw@panix.com> wrote:
>
>> All the optimal-balancing algorithms I know about are resilient
>> in the presence of a "this paragraph is too large, split it
>> arbitrarily into reasonably-sized chunks" heuristic.
>
> The original TeX algorithm had non-polynomial behaviour on the
> number of words in the paragraph, so that would be a "no" :-)

TeX has no such heuristic built in (even now, AFAIK), but the TeXbook
suggests applying one manually if you are faced with copy containing a
run-on paragraph. Specifically, it provides an incantation that splits
a paragraph in two (thus limiting the amount of material the
linebreaker has to process at once) without producing a visible
paragraph break in the output. The words on either side of the split
are constrained to have a line break between them, but the TeXbook
argues that if the original paragraph is large enough to need this
trick, readers are unlikely to notice, and the quality of the output
is relatively insensitive to the exact position of the split.

As I'm not aware of any linebreaking algorithm that is *pickier* about
its input than K-P, I think my original claim is accurate.

>> (One of these years I'll get around to implementing Knuth-Plass
>> linebreaking in Servo to see how it goes.)
>
> The original Knuth-Plass line-breaking algorithm isn't actually all
> that good for a browser, because the underlying assumption is that
> the author is there to make corrections and guide the algorithm in
> edge cases. Anyone who has used TeX and seen messages like
> "underfull hbox" has seen problems.

Oh sure, with my academic hat on I am all too familiar with that
particular headache. It's just that I think I can come up with
heuristics that do reasonably well on "typical" copy, and (what may be
more important for a browser) degrade more gracefully than TeX does in
the presence of difficult copy.
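(For anyone who wants to try the paragraph-splitting trick: as I
recall it -- check your copy of the TeXbook, Chapter 14, before
quoting me -- the incantation is roughly this, dropped into the middle
of the paragraph where you want the invisible split:)

```tex
% \parfillskip=0pt forces the last line of the first half to be
% fully justified, \par ends the first paragraph, \parskip=0pt
% suppresses the inter-paragraph glue, and \noindent starts the
% second half without a paragraph indent -- so the two halves
% typeset as if they were one paragraph.
... first half of the run-on paragraph ...
{\parfillskip=0pt\par\parskip=0pt\noindent}
... second half of the run-on paragraph ...
```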
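For concreteness, here is a minimal sketch (mine, not Knuth and
Plass's actual formulation) of the total-fit dynamic program at the
heart of K-P: fixed inter-word space, squared leftover slack as the
badness measure, and no stretch/shrink, penalties, hyphenation, or
fitness classes. It also shows where the quadratic behaviour comes
from -- each candidate breakpoint scans back over every feasible line
start.

```python
def break_lines(word_widths, space, line_width):
    """Return break indices (words per line end before each index)
    minimizing the total squared slack over all lines.

    Simplifications vs. real Knuth-Plass: no glue stretch/shrink,
    no penalties or fitness classes, the last line is charged for
    its slack (TeX's \parfillskip exempts it), and words wider
    than line_width are not handled.
    """
    n = len(word_widths)
    INF = float("inf")
    # best[j] = minimal total badness for words[0:j]
    # prev[j] = start index of the line that ends at break j
    best = [INF] * (n + 1)
    prev = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):            # j = break after word j-1
        width = 0.0
        for i in range(j - 1, -1, -1):   # i = trial start of the line
            width += word_widths[i] + (space if width else 0.0)
            if width > line_width:       # line overfull; earlier starts
                break                    # can only be worse
            slack = line_width - width
            cost = best[i] + slack * slack
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # walk the prev[] chain backwards to recover the breakpoints
    breaks, j = [], n
    while j > 0:
        breaks.append(j)
        j = prev[j]
    return list(reversed(breaks))
```

The point of the two nested loops is exactly the "locally convex"
restriction mentioned elsewhere in this thread: once a trial line is
overfull, no earlier start can help, so the inner scan stops early and
the worst case stays quadratic rather than exponential.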
> Their paper also shows that their algorithm is NP-complete.

IIRC - and I have read the paper, but it was a while ago - they
presented a *generalization* that's NP-complete, but the algorithm
actually implemented in TeX is "only" quadratic in the number of words
in a paragraph (because it's restricted to a locally-convex region of
the problem space). And there's a refinement from the late 80s that's
linear but probably has worse constant factors.

> There are modified algorithms that work better. In practice, a
> balancing first-fit algorithm that averages out over a running
> n-line window is almost as good in most cases and doesn't have the
> bad edge-case behaviour - it's also massively faster, in algorithmic
> complexity linear in the number of words. I described it in a
> message to this list a year or two ago.

Do you have a pointer? I don't recall that message.

> However, the "balance text" Adobe suggest does seem better for
> headlines.

In this discussion, I am only concerned with not standardizing
something that requires (even effectively) a specific algorithm,
either for 'text-wrap: balance' or for 'text-wrap: normal'.

zw
Received on Monday, 13 October 2014 15:36:07 UTC