Re: Improving the Header Relationship Algorithm (Discussion)

Hi Ben,

On Aug 14, 2007, at 11:55 AM, Ben 'Cerbera' Millard wrote:

> Robert, thanks for your technical review. To clarify some points:
>
> Robert Burns wrote:
>> For example when a TH cell spans 3 columns without scope set its
>> scope should be its own row and all three columns.
>
> I would replace "and" with "or" here. In tables I've seen each  
> header is for either a column or a row.

Yeah, that's probably a better approach. The problem is that I think  
we need to demand a certain level of authoring accuracy if this is  
going to work in any sane way. In my review of tabular data, I  
suggested adding a @heading boolean attribute that would help with  
this. That way a table cell can be a data cell (a TD element) and not  
need to use either @scope or @headers to declare itself also a  
heading for other cells. In many ways, I think TD@heading='heading'  
could be treated as the same as TH, except it would also have data in  
it.


> Robert Burns wrote:
>> With column set, it should be all three columns (IMO).
>
> That's an interesting idea. When I first started making tables I  
> assumed scope="col" would be smart about colspan="". Other authors  
> are using it this way:
>
> <http://www.moneyextra.com/stocks/ftse100/>
>
> This change would also remove the need to have <colgroup> elements,  
> which I personally get wrong quite a lot!

COLGROUP elements can also provide a styling hook. I'm just not too  
sure that there should be a @scope keyword that relates to the groups  
(row and column). I think simply spanning the columns or rows
indicates the heading should also apply to those columns or rows (as it
does in the HTML 4.01 basic table algorithm).
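
A rough sketch of that idea in Python, purely for illustration: the
function name and the zero-based column numbering are assumptions of
this example, not anything taken from a spec. The columns a header
applies to fall straight out of its position and its colspan:

```python
def columns_covered(start_col, colspan=1):
    """Return the zero-based column indexes a header cell occupies.

    Hypothetical helper: a header's column scope is assumed to be
    exactly the columns it spans, with no scope keyword involved.
    """
    if colspan < 1:
        raise ValueError("colspan must be at least 1")
    return list(range(start_col, start_col + colspan))


# A TH starting in the first column with colspan="3" heads columns 0, 1 and 2.
print(columns_covered(0, 3))
```

The same reasoning would apply to rowspan for row headers.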

> We should analyse more existing tables to make sure changes like  
> this would not break more tables than they fix, though.

I agree. That's why I think at some point we have to abstract from  
all of the tables we've seen; put together a prototype table for
that abstraction; and then return to the wild to see if we can find  
more exceptions to that prototype[1].

> Robert Burns wrote:
>> When colgroup is set, it's unclear
>
> That applies the header to all the columns in the current column  
> group from that row downwards:
>
> * If the column group is defined as 2 columns wide the header  
> wouldn't apply to the third column even though the header spans  
> into it.
> * If the column group is defined as 4 columns wide the header will  
> apply to the 4th column, even though the header does not span into it.
>
> I can't recall a page using scope="colgroup" for a <colgroup> whose  
> width was different from the colspan="" on the header. They might  
> well exist, though.

Yeah, I understand what it's supposed to do, but it fails to convey  
visually what it says with the @scope attribute set to 'colgroup'. I  
think that's a big problem. In other words, by avoiding the  
'colgroup' and 'rowgroup' keywords and using the actual column or
row span (like the HTML 4.01 basic table algorithm), we avoid some
confusion and I think we avoid an author setting an explicit  
association that doesn't match the visual implications of the table.
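
To make the mismatch concrete, here is a small Python sketch built on
Ben's two bullet points above. The function names and the zero-based
column numbering are assumptions made only for this illustration. It
lists the columns where the declared colgroup association and the
visual span disagree:

```python
def colgroup_scope(group_start, group_width):
    """Columns a scope="colgroup" header applies to: the whole column group."""
    return set(range(group_start, group_start + group_width))


def span_scope(start_col, colspan):
    """Columns the header cell visually spans."""
    return set(range(start_col, start_col + colspan))


def mismatch(group_start, group_width, start_col, colspan):
    """Columns where the declared and the visual association disagree."""
    declared = colgroup_scope(group_start, group_width)
    visual = span_scope(start_col, colspan)
    return sorted(declared ^ visual)  # symmetric difference


# Header spans 3 columns, column group is 2 wide:
# column 2 is spanned but not associated.
print(mismatch(0, 2, 0, 3))  # [2]

# Header spans 3 columns, column group is 4 wide:
# column 3 is associated but not spanned.
print(mismatch(0, 4, 0, 3))  # [3]
```

If the association simply followed the span, this list would always be
empty.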


> Robert Burns wrote:
>> Ben 'Cerbera' Millard wrote:
>>> 2. Row headers are done in various ways:
>>>   a. Commonly, they are given a column header using <th> and are  
>>> themselves using <th>.
>>
>> Meaning the row header points to a column header above it or   
>> implicitly falls under the column header above it right?
>
> Right.

Good, we're on the same page then.


> Robert Burns wrote:
>> Ben 'Cerbera' Millard wrote:
>>>   b. About as commonly as 2a, they use plain <td> and are given  
>>> a column header using <th>.
>>
>> This is where the TD acts as both a data cell and a header cell.
>> This is usually indicated by setting @scope on the TD cell
>> (something dropped in HTML5). I have also suggested a boolean
>> attribute to indicate when a TD is acting as a TH (something like
>> @heading).
>
> Using scope="row" for these is not common in the tables I've seen  
> (although it's what I use for this case). The user has to guess  
> from the context of the table, which I usually find quite easy.

Right, visually one can glean the meaning from the context. But for  
an aural user, that's not possible. So setting either @headers (even  
@headers='') or @scope on the left-most data cell notifies the UA  
that this is also a header cell. Again, that's the use-case for my  
suggestion of adding a @heading boolean attribute. The author would
not need to set any other value, since scope='row' would be the
sensible default there.
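
As a minimal sketch of the check a UA might make (Python, for
illustration only): the function name and the dict-of-attributes
representation are assumptions of this example, and @heading is only
the attribute proposed in this thread, not part of any published spec.

```python
def acts_as_header(tag, attrs):
    """Decide whether a cell should be treated as a header for other cells.

    `attrs` is a dict of the attributes present on the cell. `heading`
    is the attribute proposed in this thread (hypothetical).
    """
    tag = tag.lower()
    if tag == "th":
        return True
    if tag == "td":
        # Presence is what matters here: headers="" still flags the cell.
        return any(name in attrs for name in ("heading", "scope", "headers"))
    return False


print(acts_as_header("td", {"headers": ""}))         # True
print(acts_as_header("td", {"heading": "heading"}))  # True
print(acts_as_header("td", {}))                      # False
```

A TD flagged this way would still be announced as data for its own
row; the check only adds the header role.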

> Robert Burns wrote:
>> Ben 'Cerbera' Millard wrote:
>>>   c. Occasionally there is an empty <th> or <td> at the top of
>>> the row headers column. The row headers then use <th>.
>>
>> Is this simply a restatement of (a) and (b) (at least the
>> second phrase in each sentence)?
>
> Sort of. In (c) the cell above all the row headers is empty and  
> tends to be a <td> rather than a <th>. In (a) and (b) this cell  
> always has content and is always a <th>.

I see. I wasn't understanding that. That cell you're referring to is  
the type of cell I've been calling a corner cell. It is often left  
blank, but when it is not it often needs to be scoped to either the  
row or the column. Sometimes a table will provide multiple corner  
cells or somehow split the corner cell to provide one scoped each to  
the column and the row.

> [various example tables...]
>
> Robert Burns wrote:
>> Ben 'Cerbera' Millard wrote:
>>> 5. Authors seem to use print media conventions when producing  
>>> their tables. For example, <th colspan> which must replace any  
>>> <th colspan> of the same width which occurs above it if there
>>> are <td> cells in between them.
>>
>> I'm not clear what you're saying here. Are you saying that
>> authors repeat the TH colspan at the bottom of the table or
>> intermittently within the table to keep the header in view?
>> [...]
>> Ben 'Cerbera' Millard wrote:
>>> Could those print media conventions be detected automatically?
>> Again, I'm not clear what you mean in those print media conventions.
>
> Sort of the latter. For example:
>
> * <th colspan> is used intermittently to split the table into  
> yearly groups. The <th colspan> for 1993 replaces the <th colspan>  
> for 1992 in that group, but the <th> cells for "d", "m", and "y"  
> must still apply to it (this was a simulation of converting an  
> ASCII table to HTML; so you might ignore it as being non-genuine):
>   <http://sitesurgeon.co.uk/tables/astro/06-seasons/minimal-colspan.html>
> * <th colspan scope="rowgroup"> used for the first group, "Document  
> structure". After that <td colspan> is used for each group.  
> (Perhaps <td> whose colspan="" is the entire width of the table  
> should be an alias for <th> with that same colspan="" value?)
>   <http://keryx.se/resources/html-elements.xhtml>
> * <td colspan><b> is used intermittently to split the table into  
> mode groups. (Perhaps a <td> whose only child is <b> should be an  
> alias for <th>?)
>  <http://sitesurgeon.co.uk/tables/clark2006/11-controls/original.html>
> * <td><span><em> used in first column to imply groups in the table.  
> Subsequent columns in that row using <td>---</td> instead of  
> perfectly empty cells. (Is that too much data to be considered an  
> "empty cell"?)
>   <http://php.net/manual/en/function.date.php#id2654618>
>
> All of these are fairly intuitive if you saw them on a computer  
> screen. They look much like the sorts of things we see printed on  
> paper. I'm wondering how feasible it is to infer meaningful  
> structure from these presentational conventions. There is *some*  
> method to the madness in these tables...if you squint a bit. :-)

No, I can follow those. Many of those are making use of what I'm  
calling global headers or global headings (instead of local TH  
elements in the table body). This, to me, is the semantics of the THEAD
element, which we should clarify in HTML5. When presented properly,
this THEAD is repeated again and again on each page, or displayed in
a fixed manner with the table bodies scrolling below. So by inserting
the THEAD into the header/data association algorithm, we provide
authors with a two-tiered approach (that many already use). These
global headers would always be associated with every data cell
beneath them. UAs could even provide the ability to repeat or not
repeat the global headers separately from the local headers.
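
A small Python sketch of the two tiers, using Ben's yearly-groups
example. Everything here is an assumption made for illustration: the
function name, the (start_col, colspan, text) tuples, and the idea
that the caller passes only the local headers of the group the data
cell sits in.

```python
def headers_for_cell(col, global_headers, local_headers):
    """Collect header texts for a data cell in zero-based column `col`.

    Both lists hold (start_col, colspan, text) tuples. `global_headers`
    come from THEAD; `local_headers` are the TH cells of the group the
    data cell belongs to. Hypothetical structure, not a spec algorithm.
    """
    def covers(header):
        start, span, _text = header
        return start <= col < start + span

    # Global headers always apply; local ones are layered on top.
    return [h[2] for h in global_headers + local_headers if covers(h)]


thead = [(0, 1, "d"), (1, 1, "m"), (2, 1, "y")]
group_1993 = [(0, 3, "1993")]

print(headers_for_cell(1, thead, group_1993))  # ['m', '1993']
```

Because each group passes only its own local headers, the 1993 header
replaces the 1992 one while "d", "m" and "y" keep applying.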


> Robert Burns wrote:
>> Except for a few exceptions, I think that without strange cell
>> arrangements neither @headers nor @scope should be needed. [...]
>
> Any specific examples of tables you've found would be welcome. Feel  
> free to create your own table collection. That avoids me causing a  
> bottleneck in the gathering of real tables.

I will certainly be on the lookout for interesting tables. I  
appreciate all the work you've done on this. We all owe you a big
thanks for this.

Take care,
Rob

[1]: Again, I think it's helpful to consider the prototype table in
discussing the real-world examples.

Received on Tuesday, 14 August 2007 19:44:11 UTC