Re: Improving the Header Relationship Algorithm (Discussion)

Hi Ben,

I agree with many of the points you make here; I'll elaborate below.

On Aug 13, 2007, at 11:04 AM, Ben 'Cerbera' Millard wrote:

> As I understand it, HTML5's heading association algorithm [1] was  
> designed to work with the most common data tables known at the time  
> [2].

From the looks of it, the algorithm is designed to handle only the
simplest of data tables: those may be the most common, or they may
not. We have no scientific research on that (nor do I think such
research would be all that useful).

> There is ongoing research from various Participants into gathering  
> tables from the web to understand the ways authors are using them.  
> Similar work is ongoing for how ATs enable users to interact with  
> these tables (and other HTML structures).
>
> My initial thoughts from the tables I've seen (and the small  
> proportion I have dissected in detail):
>
> 1. It's very common for data tables to have one or more rows of  
> headers across the top of the table:
>   a. 1 or 2 rows are both common.
>   b. 3 rows happens less often but still enough to think about.
>   c. More than 3 rows seems rare, although it does exist.
>   d. When more than 1 row of column headers are used, headers in  
> the higher rows tend to span several of the columns in the lower rows.

I have witnessed many of the same table features. I note too that
spanning multiple columns or rows creates a new meaning of scope, one
where the scope keywords become somewhat ambiguous. For example, when
a TH cell spans 3 columns and has no scope set, its scope should be
its own row and all three columns. With scope set to col, it should be
all three columns (IMO). With rowgroup set, it should be the rowgroup,
limited to the three columns. When colgroup is set, the meaning is
unclear, and the draft should probably discourage such a setting.
Neither HTML 4.01 nor HTML5 has sufficient prose to fully describe
how these features work.
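To make that reading concrete, here is a minimal Python sketch of which slots a column-spanning TH would label under each scope setting. The `HeaderCell` type and `covered_slots` function are hypothetical names of my own, and the sketch assumes an already-expanded grid and that a header labels only the cells after it (to its right in its row, below it in its columns). colgroup is deliberately left undefined.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Slot = Tuple[int, int]  # (row, column) in the expanded table grid


@dataclass
class HeaderCell:
    row: int                     # row index of the <th>
    col: int                     # leftmost column index of the <th>
    colspan: int = 1
    scope: Optional[str] = None  # None, "col", "rowgroup" or "colgroup"


def covered_slots(th: HeaderCell, n_rows: int, n_cols: int,
                  rowgroup: Optional[Tuple[int, int]] = None) -> Set[Slot]:
    """Slots a column-spanning <th> labels under the reading above.

    Assumption: a header labels cells after it, i.e. to its right in its
    own row and below it in the columns it spans. rowgroup is the
    (start, end) row range of the group, end exclusive.
    """
    end_col = min(th.col + th.colspan, n_cols)
    spanned = range(th.col, end_col)
    below = {(r, c) for r in range(th.row + 1, n_rows) for c in spanned}
    if th.scope is None:
        # No scope: the rest of its own row plus all spanned columns.
        rest_of_row = {(th.row, c) for c in range(end_col, n_cols)}
        return rest_of_row | below
    if th.scope == "col":
        # scope="col": the spanned columns only.
        return below
    if th.scope == "rowgroup":
        # scope="rowgroup": the row group, limited to the spanned columns.
        if rowgroup is None:
            raise ValueError("scope='rowgroup' needs the group's (start, end) rows")
        start, end = rowgroup
        return {(r, c) for r in range(max(start, th.row + 1), end) for c in spanned}
    # scope="colgroup" (or anything else): left undefined on purpose.
    raise ValueError(f"scope={th.scope!r} is ambiguous for a column-spanning header")
```

For example, `covered_slots(HeaderCell(row=0, col=1, colspan=3, scope="col"), 3, 5)` returns the six slots below the header in columns 1 to 3, while the same call without a scope also picks up slot (0, 4) to the header's right.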

> 2. Row headers are done in various ways:
>   a. Commonly, they are given a column header using <th> and are  
> themselves using <th>.

Meaning the row header points to the column header above it, or
implicitly falls under it, right?

>   b. About as commonly as 2a, they use plain <td> and are given a  
> column header using <th>.

This is where the TD acts as both a data cell and a header cell. This  
is usually indicated by setting @scope on the TD cell (something  
dropped in HTML5). I have also suggested a boolean attribute to  
indicate when a TD is acting as a TH (something like @heading).
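As a rough illustration, a consumer could treat a cell as a header if it is a TH, or a TD carrying @scope or the proposed boolean attribute. This Python sketch uses the standard library's `html.parser`; `heading` is only the hypothetical attribute suggested above, not part of any spec, and `CellClassifier`/`classify_cells` are illustrative names.

```python
from html.parser import HTMLParser
from typing import List


class CellClassifier(HTMLParser):
    """Labels each <td>/<th> start tag as "header" or "data"."""

    def __init__(self) -> None:
        super().__init__()
        self.kinds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("td", "th"):
            return
        # attrs is a list of (name, value) pairs; value is None for a bare
        # boolean attribute such as the hypothetical `heading`.
        names = {name for name, _ in attrs}
        if tag == "th" or "scope" in names or "heading" in names:
            self.kinds.append("header")
        else:
            self.kinds.append("data")


def classify_cells(markup: str) -> List[str]:
    """Return "header"/"data" for every cell, in document order."""
    parser = CellClassifier()
    parser.feed(markup)
    parser.close()
    return parser.kinds
```

Fed a row such as `<tr><td heading>Pears</td><td>5</td></tr>`, it reports the first cell as a header and the second as data.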

>   c. Occasionally there is an empty <th> or <td> at the top of the  
> row headers column. The row headers then use <th>.

Is this simply a restatement of (a) and (b) (at least of the second
phrase in each)?

> 3. Data tables are sometimes found inside layout tables.
> 4. Data cells which are logically empty usually contain &nbsp; or  
> some other placeholder which isn't really data.
> 5. Authors seem to use print media conventions when producing their  
> tables. For example, <th colspan> which must replace any <th  
> colspan> of the same width which occurs above it if there are <td>  
> cells in between them.

I'm not clear what you're saying here. Are you saying that authors  
repeat the TH colspan at the bottom of the table or intermittently  
within the table to keep the header in view?

> 6. The HTML4 algorithm [3] is rather vague but seems to handle a  
> lot of cases.

I agree with that. The HTML5 algorithm should try to build on the
HTML 4.01 algorithm with some improvements. For example, I think it
should treat THEAD elements as global headings (probably all of their
rows). Even without scope, this would allow authors to provide column
headings for cells at two levels: local and global. Anything beyond
those two levels would require the use of @headers (or would be
prohibited by the language).
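A minimal Python sketch of that two-level idea follows. It assumes that every THEAD row supplies global headings and that a "local" heading is a row of TH cells inside the data cell's own TBODY; both the data layout and the names `expand`/`column_headings` are my own assumptions, and spans are handled only for colspan.

```python
from typing import List, Optional, Tuple

HeadingRow = List[Tuple[str, int]]  # (heading text, colspan) for each cell


def expand(row: HeadingRow) -> List[str]:
    """Repeat each heading once per column it spans."""
    columns: List[str] = []
    for text, span in row:
        columns.extend([text] * span)
    return columns


def column_headings(thead_rows: List[HeadingRow],
                    local_row: Optional[HeadingRow],
                    col: int) -> Tuple[List[str], Optional[str]]:
    """Return (global, local) column headings for a data cell in column `col`.

    Global headings come from every THEAD row; the local heading comes from
    an (assumed) row of TH cells inside the cell's own TBODY. Empty
    headings are skipped.
    """
    global_headings = [h for h in (expand(r)[col] for r in thead_rows) if h]
    local = expand(local_row)[col] if local_row else None
    return global_headings, (local or None)
```

With a THEAD of years spanning quarters and a TBODY heading row "North" spanning all data columns, a cell in the second 2007 quarter column would get global headings ["2007", "Q2"] and local heading "North".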

> Do these observations match what other Participants see on the web?  
> Is it OK if really strange cell arrangements which haven't provided  
> scope="" or headers="" remain hard to use? Could those print media  
> conventions be detected automatically?

Again, I'm not clear on what you mean by those print media
conventions. Could you elaborate? Apart from a few exceptions, I think
that in the absence of strange cell arrangements neither @headers nor
@scope should be needed. The exceptions are: 1) backwards
compatibility with AT; 2) data cells that have headers beyond the two
levels (local and global); and 3) scoping corner cells to either the
row or the column (by corner cells I mean cells buried behind other
headers to their right and below, where no data cells appear in
either their row or their column).
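Corner cells as defined here are mechanically detectable. This Python sketch assumes the table has already been expanded into a rectangular grid of "th"/"td" markers (spans resolved, one entry per slot); `corner_cells` is a hypothetical helper name, not anything from either spec.

```python
from typing import List, Tuple


def corner_cells(grid: List[List[str]]) -> List[Tuple[int, int]]:
    """Return (row, col) of header cells whose row and column hold no data cells.

    `grid` is an assumed rectangular, span-expanded table where each entry
    is "th" or "td".
    """
    n_rows, n_cols = len(grid), len(grid[0])
    # Rows and columns that contain at least one data cell.
    data_rows = {r for r in range(n_rows) if "td" in grid[r]}
    data_cols = {c for c in range(n_cols)
                 if any(grid[r][c] == "td" for r in range(n_rows))}
    return [(r, c) for r in range(n_rows) for c in range(n_cols)
            if grid[r][c] == "th" and r not in data_rows and c not in data_cols]
```

For a table with two header rows and two header columns, this returns the four top-left TH cells: exactly the ones an author would need @scope to attach to either the row or the column.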

>
> The aim of all this is for HTMLWG to produce a better table headers  
> association thingy. But it's a complex subject. Let's work together  
> to document the problems authors face before we carve anything in  
> stone. :-)

Agreed.

Take care,
Rob

Received on Monday, 13 August 2007 21:13:56 UTC