Re: [CSS21] Concern about anonymous table objects and whitespace

On Thursday 22 January 2009 19:48, Boris Zbarsky wrote:
> Bert Bos wrote:
> > On looking at it it again, I think we made the wrong decision by
> > adding this rule. We actually don't need it. If 'white-space' is
> > 'pre', you don't want the space to disappear; and if space is
> > being collapsed, you still need to keep the spaces between words,
> > as the example below shows.

9.2.2.1 says that white space in the source doesn't generate anonymous
inline boxes if those spaces would be collapsed away. That prevents

    <div>
      <p>First para...

from generating two spaces before the first para. It also prevents

    <table>
      <tr>...

from creating an anonymous row with two spaces before the first row.
But, strictly speaking, it doesn't prevent

    <tr>
      <td>First cell...

from creating an anonymous cell before the first one, because 9.2.2.1
applies to *block-level* elements, of which TABLE is one, but TR is not.

So I retract my claim that we don't need a rule. We do need to say
something similar to 9.2.2.1 for collapsible spaces inside a
table-row(-group) or inline-table. Note also that 9.2.2.1 doesn't say
that the spaces create a box that has 'display: none'; it says that
there is no box at all.

We need to say something somewhere that causes the spaces in

    <p style="display: table-row">
      <em style="display: table-cell">One</em>
      <em style="display: table-cell">Two</em>

to be ignored, but *not* the spaces in

    <p style="display: table-row">
      <em style="display: inline">Three</em>
      <em style="display: inline">four</em>

So, borrowing text from 9.2.2.1, I suggest we add the following at the 
end of the last rule in 17.2.1 ("If a child T of a 'table-row' box P is 
not a 'table-cell' box,[...]"):

    No such table-cell box is generated if it would contain only
    white-space that would subsequently be collapsed away according to
    the 'white-space' property.

And then remove rule 4½. I think that covers all cases. E.g.,

    <div style="display: inline-table">
      <p style="display: table-cell">Top left</p>
      <p style="display: table-cell">Top right</p>
      <p style="display: table-row">
       <i>italic</i> <b>bold</b> etc.
      </p>
    </div>

ignores the spaces before the <p> tags, but keeps the spaces before the 
<b>. (And those before the <i> as well, although they are not rendered, 
because they end up at the start of a line.)
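
The proposed sentence for 17.2.1 can be sketched in Python as follows.
This is only an illustration under simplifying assumptions: the Box
class, collapses_away() and wrap_row_children() are hypothetical names,
the 'white-space' value is stored directly on each text run, and only
the table-row case is modelled.

```python
import re
from dataclasses import dataclass, field

# White space characters that CSS 2.1 collapses: space, tab, CR, LF, FF.
_ONLY_SPACE = re.compile(r"[ \t\r\n\f]*")


@dataclass
class Box:
    """Hypothetical, much simplified box: a display type plus content."""
    display: str                 # "table-cell", "inline", "text", ...
    text: str = ""               # used only when display == "text"
    white_space: str = "normal"  # simplified: stored on the text run itself
    children: list = field(default_factory=list)


def collapses_away(box):
    """True if the box is a text run of white space that would be collapsed."""
    return (box.display == "text"
            and box.white_space in ("normal", "nowrap")
            and _ONLY_SPACE.fullmatch(box.text) is not None)


def wrap_row_children(children):
    """Wrap runs of non-cell children of a table-row in anonymous cells."""
    result, run = [], []

    def flush():
        # Proposed addition: generate no cell for white space only.
        if run and not all(collapses_away(b) for b in run):
            result.append(Box("table-cell", children=list(run)))
        run.clear()

    for child in children:
        if child.display == "table-cell":
            flush()
            result.append(child)
        else:
            run.append(child)
    flush()
    return result


def space(ws="normal"):
    """A white-space-only text run, as found between tags in the source."""
    return Box("text", "\n      ", ws)


# First example: two real cells, the white space between them is ignored.
cells = wrap_row_children([space(), Box("table-cell"), space(),
                           Box("table-cell"), space()])
print(len(cells))                           # 2

# Second example: inline children share one anonymous cell, spaces kept.
cells = wrap_row_children([space(), Box("inline"), space(),
                           Box("inline"), space()])
print(len(cells), len(cells[0].children))   # 1 5
```

With space("pre") instead of space(), every run of spaces gets its own
anonymous cell, because nothing is collapsed away in that case.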

>
> OK, but then what happens with this HTML markup?
>
>    <pre>
>      <table>
>        <tr>
>          <td>Some text</td>
>        </tr>
>      </table>
>    </pre>
>
> Without this rule, the required rendering would be equivalent to this
> markup:
>
>    <pre>
>      <table><tbody><tr><td>
>        </td></tr><tr><td>
>          </td><td>Some text</td><td>
>        </td></tr><tr><td>
>      </td></tr></tbody></table>
>    </pre>
>
> which is NOT what any UA actually does, nor what any UA wants to do.

It's not valid HTML, so what UAs "should" do is not defined. (Depending 
on circumstances, they could, e.g., close the PRE before the TABLE, do 
as if the TABLE, TR and TD tags weren't there, or show a box that 
says "error!" My own software does the first. The validator suggests 
generating a BUTTON element around the TABLE. Tidy removes the <pre> 
tag and reinserts it after the table. It's hard to tell what browsers 
do from looking at the rendering.)

(Also, the use cases for applying 'white-space: pre' to a table seem 
rather contrived. If something reasonable falls out of the definition, 
that would be nice, but if not, I don't think that's a problem, as long 
as the case with 'white-space: normal' works as expected.)

But let's look at this modified example:

    <pre>
      <span style="display: table">
        <span style="display: table-row">
          <span style="display: table-cell">Some text</span>
        </span>
      </span>
    </pre>

And let's also assume this is SGML (HTML), not XML. (That means, 
concretely, that the ends of line on the 1st, 2nd, 3rd and 6th lines 
are ignored.)

The PRE creates a block box and the first content of that is two spaces.

Next comes a table box, which is block level, so the previous two spaces 
are, conceptually, wrapped in an anonymous block box.

Inside the table we first find four spaces. They constitute text and 
thus become an anonymous inline box. But we expected a table row box, 
so we open an anonymous one. That still doesn't allow placing the 
anonymous inline, so we open an anonymous table cell as well. Now we 
can place the anonymous inline with its four spaces.

Then we see a table row, so we close the anonymous table cell and the 
anonymous table row that we just created. We now start the second row 
of the table.

We see six spaces, which create another anonymous inline box. (Note that 
I ignore "rule 4½" here.) We are in a table row, so we need to create 
an anonymous table cell to contain them.

Next we see a table cell. So we close the anonymous table cell that we 
just created for the six spaces. We stay in the same row and open the 
table cell.

The element contains "some text", which becomes the content of this 
table cell. And then the table cell ends.

Now we see a newline and four spaces. They create a line break and an 
anonymous inline box. We are still in a table row and thus we need to 
create another anonymous table cell to contain them. That table cell 
thus has two lines of text: an empty one and one with four spaces.

The end of the SPAN signals the end of the table row, i.e., the end of 
the second row of the table.

After that </SPAN> is another newline and two spaces. But we are 
currently in a table element, so we need to create an anonymous table 
row again (the third row of the table) and an anonymous table cell. The 
newline and the inline box with spaces are put in that table cell.

The next </SPAN> is the end of the table.

The </PRE> then ends the block box.

So, schematically, we end up with:

    11
    2222
    333333some text4
                   4444
    5
    55

11 = an anonymous inline with the first two spaces of the PRE.
2222 = four spaces in the 1st cell of the 1st row of the table.
333333 = six spaces in the 1st cell of the 2nd row of the table.
4 4444 = a newline and four spaces in the 3rd cell of the 2nd row.
5 55 = a newline and two spaces in the 1st cell of the 3rd row.

Or with rules to show the structure of the table more clearly:

    11
   +------+---------+----+
   |2222  |         |    |
   +------+---------+----+
   |333333|some text|4   |
   |      |         |4444|
   +------+---------+----+
   |5     |         |    |
   |55    |         |    |
   +------+---------+----+
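
The walk-through above, with "rule 4½" ignored and all white space
preserved, can be reproduced with a small Python sketch. It is only a
rough model: build_rows() and the tuple encoding of the tree are
hypothetical, the input is the content of the 'display: table' SPAN
after the SGML newline suppression, and adjacent text runs (which
would share one anonymous cell) do not occur in this example and are
not handled.

```python
def build_rows(table_children):
    """Return the table as a list of rows; each row is a list of cell texts.

    table_children mixes text runs (str) and ("row", [...]) tuples; a row's
    items are text runs (str) or ("cell", text) tuples.
    """
    rows = []
    for child in table_children:
        if isinstance(child, str):
            # Text directly inside the table: anonymous row + anonymous cell.
            rows.append([child])
            continue
        _kind, row_items = child
        cells = []
        for item in row_items:
            if isinstance(item, str):
                cells.append(item)       # anonymous cell around the text
            else:
                cells.append(item[1])    # a real table-cell
        rows.append(cells)
    return rows


# The contents of the 'display: table' SPAN, after SGML newline suppression.
table = [
    "    ",
    ("row", ["      ", ("cell", "Some text"), "\n    "]),
    "\n  ",
]
for row in build_rows(table):
    print([repr(cell) for cell in row])
```

The result has three rows with one, three and one cells, which is the
grid drawn above.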

It seems Opera only partially implements the SGML rule for suppressing 
newlines, so it has more newlines than expected (it only suppresses one 
of the four, it seems), but otherwise it appears to be correct.

Konqueror and Firefox seem to implement "rule 4½".

>
> >> Is there a space between the "AAA" and "BBB" or not?
> >
> > We certainly *want* there to be a space...
>
> Perhaps so.  There is one in Opera but not in Safari or Firefox, for
> what it's worth...
>
> > The way I would process this example is as follows:
> >
> > We see the DIV and open a table row box.
> >
> > Next we see some white space. We are not preserving white space, so
> > it is not contributing any content; it's just mark-up to separate
> > words and as we haven't seen any words yet, we can simply ignore
> > it.
>
> Uh, no.  First of all, having the behavior depend on the white-space
> value is not acceptable;

Isn't that what 'white-space' is *supposed* to do?

> see above and earlier in this thread.  
> Second, I don't think we need English-centric concepts like "separate
> words" here.  I carefully wrote the text in such a way that the
> whitespace is between tags, not words.  If the language were Chinese,
> it would be separating nothing, but simply providing a way to
> organize the markup for better readability in the file.

It doesn't matter whether the space is before or after a tag and it also 
doesn't matter whether the non-space characters are Chinese or Greek, 
the space collapsing rules are always the same. The spaces between the 
two SPANs are treated as if they were a single space, but are not 
ignored. If you are typesetting Chinese and you want to have spaces in 
the source but not in the rendering, you need to use the 
proposed 'discard' keyword from the Text module. (It's slightly 
different for newlines: they turn into zero-width spaces after 
Chinese characters. At least, that's what the draft for CSS3 says; in 
CSS 2.1 it's only a "may".)

>
> In any case, the current rules do in fact require that the box the
> text generates, if any, be wrapped in some other boxes as specified
> in this algorithm.  The question is whether the text generates a box
> at all.
>
> > The SPAN itself poses no particular problem, but at the end we
> > encounter white space again. We are still not preserving spaces,
> > but we did just see some inline stuff, so this white space marks
> > the end of a word.
>
> See above.  This argument doesn't hold, in my opinion.
>
> > We don't know yet if it adds a word space to the rendering; that
> > depends on whether there is anything more.
>
> This introduces issues with lookahead similar to what I just posted
> about, no?

Chapter 16 describes it in more detail. There is no lookahead involved 
in *parsing*, but when *rendering* a line you obviously need to 
consider a whole line at once, if not a whole paragraph. The space (or 
whatever object you have turned it into in the in-memory representation 
of the document) is either rendered as empty space (the width of which 
depends on the justification of the current line and the 'word-spacing' 
property), or not at all, if it happens to be at the beginning or end 
of a line.
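
As a rough illustration of that rendering step for 'white-space:
normal', here is a Python sketch. The function names are hypothetical,
and it assumes the whole text fits on a single line, so line breaking,
justification and 'word-spacing' are not modelled.

```python
import re


def collapse(text):
    """Collapse each run of white space to a single space."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)


def render_line(text):
    """Collapse white space, then drop it at the start and end of the line."""
    return collapse(text).strip(" ")


# Text content of a row holding two inline SPANs, "AAA" and "BBB",
# each on its own indented source line.
source = "\n  AAA\n  BBB\n"
print(repr(collapse(source)))      # ' AAA BBB '
print(repr(render_line(source)))   # 'AAA BBB'
```

The space between the two words survives; the spaces before the first
and after the last word exist in the collapsed text but are not
rendered.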

>
> > If, as in the original example at the start of this thread, you
> > set 'white-space: pre' on the DIV or an ancestor, then white space
> > in the source doesn't serve as mark-up to separate words, but
> > constitutes text of its own. In that case there will still be one
> > anonymous table cell in the table row, but it will contain some
> > additional, anonymous, inline elements before, after and in between
> > the two SPANs.
>
> Not acceptable.  See above.
>
> > The rules are meant to allow
> >
> >     <ul style="display: table; width: 100%">
> >       <li style="display: table-cell">item 1
> >       <li style="display: table-cell">item 2
> >       <li style="display: table-cell">item 3
> >     </ul>
> >
> > to render as
> >
> >     +----------------+----------------+----------------+
> >     | item 1         | item 2         | item 3         |
> >     +----------------+----------------+----------------+
>
> Yes, I know what the rules are meant to allow.  Unfortunately, they
> are not, as written, compatible with the way tables work in HTML. 
> That needs to be a higher priority than allowing the above.

The rules don't apply to HTML table elements. Or rather: they apply, but 
their if-clauses are always false. They apply when there are missing 
elements, which is never the case in HTML tables. Section 17.2.1 even 
mentions this fact explicitly.

Also, the rendering that some browsers used before CSS is not 
reproducible in the CSS model. Maybe over time we'll find some 
solution, as browsers evolve and treat TABLE elements more logically 
and maybe CSS3 adds a new value to the 'table-layout' property. But for 
CSS 2.1 the working group decided that the state of current browsers is 
such that testing TABLE elements is not worth our time. Section 17.2 
says it like this: "User agents may ignore these 'display' property 
values for HTML table elements, since HTML tables may be rendered using 
other algorithms intended for backwards compatible rendering. However, 
this is not meant to discourage the use of 'display: table' on other, 
non-table elements in HTML."

So for CSS 2.1 we'll need the example with UL/LI to work, but we won't 
test any TABLE elements.

>
> Given the constraint of compatibility with HTML, the question is how
> much complexity the rules are willing to admit before we decide that
> this use case is not worth it.
>
> > because, especially in XML, there are often not enough real
> > elements and you need anonymous ones to make tables. (This doesn't
> > handle all cases that you might want to turn into tables, you'll
> > also need XSLT or the Template Layout module, but it's still
> > useful.)
>
> I'm not saying it's not.  I'm questioning the ratio of usefulness to
> complexity.
>
> Let me put numbers to this.  Nearly half of the code that constructs
> the box tree and manages dynamic updates of various sorts in Gecko
> exists solely to implement this one section of the specification.  It
> doesn't even do so correctly (e.g. it's disastrous at handling
> dynamic mutations).  Making it do that correctly will involve not
> just more complexity but additional performance penalties in some
> cases.  The code is ugly as sin and could probably be shorter and
> simpler, but this section is a _huge_ source of complexity in box
> tree construction.  And the spec isn't even correct; see above.
>
> Is the use case really worth it?  Or would the other approaches you
> describe be better suited?  How common is the use case?
>
> It seems to me that the feature was added as a "that would be cool"
> thing without really thinking through the implications (heck, the
> fact that it's not compatible with HTML shows that), and that no one
> ever did the sort of use case analysis that's nowadays part of, say,
> the HTML working group process...

I can assure you a lot of thought went into the rules for anonymous 
table boxes. Nevertheless, many common cases of (XML) mark-up couldn't 
be fitted into the model and were postponed to what was then still 
called the Frame-based Layout draft (now turned into Template Layout), 
or left to a transformation language such as XSLT.

Many cases where people now are forced to use floats were meant to be 
handled by tables. (And I do handle them with tables, even if that 
means the rendering in IE is not optimal, because in other browsers it 
looks better than floats.)

A rule we applied in the design was that content should always be 
rendered (except when it was hidden with 'display: none', of course) 
and so any strange sequence of 'display' values should yield a useful 
result. That's why a 'block' inside an 'inline' isn't ignored, and 
ditto for 'inline' inside 'table'.

One difficulty that we never solved was how to render a DL as a table 
with two columns: DT on the left and DD on the right. (We lamented that 
HTML never got the DI element that was once proposed to group DT and DD 
elements together; like BODY, DI didn't actually have to be typed, but 
it would still be present in the tree.) The 'display: compact' solution 
isn't exactly the same, although it is useful, too.

>
> > If I understand you correctly, you want the above to not render at
> > all!
>
> As a simple option, yes.
>
> > We should no doubt have an easier way to center things vertically
> > in CSS3 ('block-foo-align: middle' or 'margin: stretch', see
> > http://www.w3.org/Style/CSS/Tracker/actions/18), but meanwhile
> > people use table cells.
>
> Except table cell display types are not interoperably implemented in
> UAs (e.g. IE before IE8 last I checked), so they can't use them
> anyway.

Many things don't work yet in some browser or other. And no style works 
at all if you use Lynx. Likewise, colors are lost if you print on a 
laser printer. And hypertext doesn't work then either. That isn't 
necessarily a problem, or necessarily a big enough problem to give up 
on nice features in the browsers that it does work in.

>
> >> Interoperability on this section is already quite poor, so making
> >> this change might actually get us to CR faster.
> >
> > The examples above actually work fine.
>
> Yes, any simple example works fine.  Want me to write you some
> slightly more complicated ones that don't?  My very first attempt
> over here with "white-space:pre" wasn't interoperable between Opera,
> Gecko, and Webkit (Opera did one thing, the others did something
> else), to say nothing of IE.
>
> In any case, I'm happy if the spec spells out exactly what simple
> cases we do want to support and EXACTLY how they should work, as long
> as this is compatible with how tables work in HTML.  But the current
> state of things is not acceptable.

It has never been a goal to describe what Netscape 4 did with 
HTML tables, let alone what it did with invalid tables. Not only didn't 
we believe we had the time to do it, we never even considered it 
useful. For a while that was implicit, but the spec now says it 
explicitly.

The CSS3 Table module will contain the legacy algorithm for computing 
table widths. Few people will be able to check if it is correct, so if 
you can help, that would be great.

There is also research going on at some universities into efficient 
algorithms that give better layouts. The results are promising and 
maybe they will be stable in time to be put in CSS3 as well. Then we'll 
have something like

    'table-layout: auto'
        The UA's choice of automatic width computation. The spec gives
        just a small set of constraints the algorithm must satisfy.

    'table-layout: fixed'
        The simple layout that gives each column the same width or the
        width explicitly set by the designer. It doesn't require laying
        out any content in order to compute the width.

    'table-layout: legacy'
        The Netscape 4 legacy. Mostly included for developers of new UAs
        that still want to mimic the old browsers on old content.

    'table-layout: some-name-yet-to-be-chosen'
        An efficient automatic algorithm that balances column
        widths consistently better than the legacy algorithm to create
        tables of near-minimal height.



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Friday, 23 January 2009 19:42:11 UTC