Re: Heuristic Tests for Data Tables (Discussion)

On Wed, 05 Sep 2007 12:20:14 +0200, Ben 'Cerbera' Millard  
<cerbera@projectcerbera.com> wrote:

> This is really valuable work, James.

Indeed.

> [...]
>
> Simon 'zcorpan' Pieters and I have ideas about a good implicit  
> algorithm, parts of which I've written about generally.

For everyone's benefit our discussions are now available here:

    http://simon.html5.org/discussions/table-header-algorithm

> [...]
>
> My rough attempt at writing the steps for this goes column by column,  
> from the first cell downwards through all subsequent cells, collecting  
> and applying header associations on the way. HTML4 describes searching  
> up from each <td> until reaching a <th>. You can write it for either  
> direction, but I find this way a bit easier to follow.
>
> For each column in the table:
>
> 1. Collect the <th> and go down one cell.
> 2. If this is another <th>, associate them regardless of colspan. Go  
> down one cell.
> 3. Repeat 2 until you reach a <td>.
> 4. Associate all the <th>s so far with the new <td>. Go down one cell.
> 5. Repeat 4 until you reach another <th>.
> 6. For each <th> collected so far:
>     a. Check its colspan with that of the new <th>
>          i. If they are different, associate them.
>         ii. If they are identical (including if they are both  
> colspan=1), replace the collected <th> with the new <th>.
>     b. Go down one cell.
> 7. Repeat 6 until you reach a <td>.
> 8. Repeat 4-7 until the end of the table.
>
> This assumes there are no scope or headers attributes and the topmost  
> cell of the column is a <th> (including aliases <td><b> or <td><strong>).

This seems to work pretty well, judging from a few tests with James' app.
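
To make the quoted steps concrete, here is a minimal Python sketch of one
reading of them for a single column. The Cell type and walk_column name are
invented for this illustration. It also assumes that a new <th> with a
different colspan is added to the collected headers (the steps only say to
"associate them"), and that the column already starts with a <th>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str          # label used only for this illustration
    kind: str          # "th" or "td"
    colspan: int = 1


def walk_column(column):
    """Return {cell name: [header names]} for one column, top to bottom."""
    collected = []     # headers currently in effect
    associations = {}
    seen_td = False
    for cell in column:
        if cell.kind == "td":
            # Steps 4-5: all collected headers apply to this data cell.
            associations[cell.name] = [h.name for h in collected]
            seen_td = True
        elif not seen_td:
            # Steps 1-3: leading headers are associated regardless of colspan.
            associations[cell.name] = [h.name for h in collected]
            collected.append(cell)
        else:
            # Steps 6-7: headers with a different colspan are kept and
            # associated; headers with an identical colspan are replaced.
            kept = [h for h in collected if h.colspan != cell.colspan]
            associations[cell.name] = [h.name for h in kept]
            collected = kept + [cell]  # assumption: the new header is collected
    return associations


column = [
    Cell("H1", "th", colspan=2),
    Cell("H2", "th"),
    Cell("d1", "td"),
    Cell("H3", "th"),
    Cell("d2", "td"),
]
result = walk_column(column)
```

In this example H3 has the same colspan as H2 and so replaces it, while H1
(colspan=2) stays in effect for d2.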

For it to work nicely together with the other features and with different
tables, it would probably have to:

   * Skip past any cells at the top until you find a header cell.
   * Ignore header cells that have a scope attribute.
   * Ignore header cells that don't have any "significant content".

"Significant content" could be e.g. anything other than whitespace  
(including U+00A0). Sometimes you see empty THs that aren't intended to be
actual header cells but are merely empty slots in the table, used to get
the other cells into the right column.
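
As a rough illustration, the check could look like this in Python. The
function name and the exact whitespace set are assumptions, and "text" is
taken to be the cell's text content.

```python
# Characters treated as whitespace here; U+00A0 is the no-break space.
INSIGNIFICANT = " \t\n\r\f\u00a0"


def has_significant_content(text: str) -> bool:
    """True if the cell text contains anything other than whitespace."""
    return text.strip(INSIGNIFICANT) != ""
```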

One way the algorithm could be specced to take into account all of the
headers, scope and implicit associations is as several passes through the
table, as follows:

   1. For each cell in the table that has a headers attribute, split the
      value on spaces and for each token, if getElementById returns a TH or
      TD element that is part of the same table and isn't the same cell,
      associate them together.
   2. For each TH element in the table that has a scope attribute, use the
      algorithm in HTML5, ignoring the auto state.
   3. Apply the smart colspan algorithm above.
   4. Apply the smart rowspan algorithm.

Any duplicate associations would be ignored.
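
A minimal Python sketch of this structure, under some assumptions: the
table is modelled as a dict from cell id to a dict of attributes, only pass
1 is written out, and passes 2-4 are stand-ins supplied by the caller. The
names headers_pass and associate are hypothetical. Collecting the pairs in
a set is what makes duplicate associations disappear.

```python
def headers_pass(cells):
    """Pass 1: resolve each cell's headers attribute against ids in the table.

    `cells` maps a cell id to a dict with an optional "headers" string.
    Returns a set of (cell id, header id) pairs.
    """
    pairs = set()
    for cell_id, cell in cells.items():
        # str.split() splits on any run of whitespace, not only spaces.
        for token in cell.get("headers", "").split():
            # Keep only ids that exist in this table and are not the cell itself.
            if token in cells and token != cell_id:
                pairs.add((cell_id, token))
    return pairs


def associate(cells, later_passes=()):
    """Run pass 1, then any further passes; the set drops duplicates."""
    pairs = headers_pass(cells)
    for run_pass in later_passes:
        pairs |= set(run_pass(cells))
    return pairs


cells = {"a": {}, "b": {"headers": "a a b missing"}}
```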

With regard to the headers attribute, if we want to support it at all, I
wonder if it should be absolute or augmentive. I can see advantages both
ways -- augmentive makes headers="" more useful as a way to patch the
implicit algorithm where necessary, but absolute allows authors to override
the implicit algorithm, which might be needed for e.g. overlaid tables. The
algorithm above is augmentive, but it would be straightforward to make it
absolute: just make step one flag cells and let the rest of the steps
ignore flagged cells.
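
The difference between the two modes can be sketched in a few lines of
Python. The combine function and its arguments are invented for this
illustration; the keys of "explicit" stand for the cells flagged in step
one, i.e. those with a headers attribute.

```python
def combine(explicit, implicit, absolute=False):
    """Merge explicit (headers attribute) and implicit associations.

    Both arguments map a cell name to a set of header names.
    """
    result = {cell: set(heads) for cell, heads in explicit.items()}
    for cell, heads in implicit.items():
        if absolute and cell in explicit:
            continue  # flagged in step one: later steps ignore this cell
        result.setdefault(cell, set()).update(heads)
    return result


explicit = {"d": {"h1"}}
implicit = {"d": {"h2"}, "e": {"h2"}}
```

With these inputs the augmentive mode gives d both h1 and h2, while the
absolute mode leaves d with h1 only; e gets h2 in either mode.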

> [...]

-- 
Simon Pieters
Opera Software

Received on Friday, 7 September 2007 10:39:55 UTC