
Improving the Header Relationship Algorithm (Discussion)

From: Ben 'Cerbera' Millard <cerbera@projectcerbera.com>
Date: Mon, 13 Aug 2007 17:04:29 +0100
Message-ID: <018601c7ddc3$a2df68a0$0201a8c0@ben9xr3up2lv7v>
To: "HTMLWG" <public-html@w3.org>

As I understand it, HTML5's heading association algorithm [1] was designed 
to work with the most common data tables known at the time [2].

There is ongoing research from various Participants into gathering tables 
from the web to understand the ways authors are using them. Similar work is 
ongoing for how ATs enable users to interact with these tables (and other 
HTML structures).

My initial thoughts from the tables I've seen (and the small proportion I 
have dissected in detail):

1. It's very common for data tables to have one or more rows of headers 
across the top of the table:
   a. 1 or 2 rows are both common.
   b. 3 rows happens less often but still enough to think about.
   c. More than 3 rows seems rare, although it does exist.
   d. When more than 1 row of column headers is used, headers in the higher 
rows tend to span several of the columns in the lower rows.
2. Row headers are done in various ways:
   a. Commonly, they use <th> themselves and are given a column header which 
also uses <th>.
   b. About as commonly as 2a, they use plain <td> and are given a column 
header using <th>.
   c. Occasionally there is an empty <th> or <td> at the top of the row 
headers column. The row headers then use <th>.
3. Data tables are sometimes found inside layout tables.
4. Data cells which are logically empty usually contain &nbsp; or some other 
placeholder which isn't really data.
5. Authors seem to use print media conventions when producing their tables. 
For example, a <th colspan> partway down a table is meant to replace any 
<th colspan> of the same width above it when there are <td> cells between 
them.
6. The HTML4 algorithm [3] is rather vague but seems to handle a lot of 
cases.
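
To make observations 1 and 2 easier to check against a collected table, here 
is a rough Python sketch. It is not the HTML5 or HTML4 algorithm, just a way 
to tally the patterns listed above. It assumes explicit </th> and </td> end 
tags, ignores rowspan and nested tables, and treats a cell containing only 
&nbsp; as empty. The names TableRows, parse_rows, is_header_row, 
count_header_rows and row_header_style are made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect each <tr> as a list of (tag, colspan, text) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cells per <tr> seen so far
        self.cell = None    # [tag, colspan, text parts] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            colspan = dict(attrs).get("colspan") or "1"
            self.cell = [tag, int(colspan) if colspan.isdigit() else 1, []]

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[2].append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            kind, colspan, parts = self.cell
            # strip() also removes the no-break space produced by &nbsp;
            self.rows[-1].append((kind, colspan, "".join(parts).strip()))
            self.cell = None


def parse_rows(html):
    """Return the non-empty rows of the (single, un-nested) table in html."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]


def is_header_row(row):
    """True if every cell is <th>, allowing an empty <td> corner cell (2c)."""
    kinds = [kind for kind, _, _ in row]
    if "th" not in kinds:
        return False
    first_kind, _, first_text = row[0]
    corner_ok = first_kind == "th" or first_text == ""
    return corner_ok and all(kind == "th" for kind in kinds[1:])


def count_header_rows(rows):
    """Number of leading rows that look like column headers (observation 1)."""
    count = 0
    for row in rows:
        if not is_header_row(row):
            break
        count += 1
    return count


def row_header_style(rows):
    """Guess which of 2a, 2b or 2c the first column follows."""
    head = count_header_rows(rows)
    if head == 0 or head == len(rows):
        return "unknown"
    corner_text = rows[0][0][2]
    body_kinds = {row[0][0] for row in rows[head:]}
    if body_kinds == {"th"}:
        return "2c" if corner_text == "" else "2a"
    if body_kinds == {"td"} and corner_text != "":
        return "2b"
    return "unknown"


SAMPLE = """
<table>
  <tr><th>Region</th><th colspan="2">Sales</th></tr>
  <tr><th>&nbsp;</th><th>2006</th><th>2007</th></tr>
  <tr><td>North</td><td>10</td><td>12</td></tr>
  <tr><td>South</td><td>&nbsp;</td><td>9</td></tr>
</table>
"""

rows = parse_rows(SAMPLE)
print(count_header_rows(rows), row_header_style(rows))
```

The sample table has two rows of column headers, with the upper "Sales" 
header spanning both year columns (1d), and row headers in plain <td> under a 
<th> column header (2b).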

Do these observations match what other Participants see on the web? Is it OK 
if really strange cell arrangements which haven't provided scope="" or 
headers="" remain hard to use? Could those print media conventions be 
detected automatically?
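
On that last question, here is one possible heuristic as a Python sketch. It 
is only my reading of the convention in observation 5: a <th colspan> counts 
as a replacement when an earlier <th colspan> covers exactly the same columns 
and at least one <td> lies between the two. The input shape (each row as a 
list of (kind, colspan) pairs) and the names spans and replacement_headers 
are assumptions made for the example, and rowspan is ignored.

```python
def spans(row):
    """Yield (kind, start_column, width) for each cell, ignoring rowspan."""
    column = 0
    for kind, colspan in row:
        yield kind, column, colspan
        column += colspan


def replacement_headers(rows):
    """Find (upper_row, lower_row, start_column, width) header replacements."""
    found = []
    current = {}       # (start, width) -> index of the header row in force
    td_seen = set()    # header spans that have had <td> cells beneath them
    for index, row in enumerate(rows):
        cells = list(spans(row))
        # First pass: spanning <th> cells may replace an earlier header.
        for kind, start, width in cells:
            if kind != "th" or width < 2:
                continue
            key = (start, width)
            if key in current and key in td_seen:
                found.append((current[key], index, start, width))
            current[key] = index
            td_seen.discard(key)
        # Second pass: note which headers now have data cells below them.
        for kind, start, _ in cells:
            if kind == "td":
                for h_start, h_width in current:
                    if h_start <= start < h_start + h_width:
                        td_seen.add((h_start, h_width))
    return found


# A table shaped like a printed report: a full-width heading, some data,
# then another full-width heading which takes over from the first.
report = [
    [("th", 3)],
    [("td", 1), ("td", 1), ("td", 1)],
    [("td", 1), ("td", 1), ("td", 1)],
    [("th", 3)],
    [("td", 1), ("td", 1), ("td", 1)],
]
print(replacement_headers(report))
```

Stacked header rows with no <td> between them are deliberately not reported, 
since those look like ordinary multi-row column headers rather than the print 
convention.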

The aim of all this is for HTMLWG to produce a better table headers 
association thingy. But it's a complex subject. Let's work together to 
document the problems authors face before we carve anything in stone. :-)

[1] 
<http://www.whatwg.org/specs/web-apps/current-work/multipage/section-tabular.html#header-and-data-cell-semantics>
[2] <http://lists.w3.org/Archives/Public/public-html/2007Jun/1039.html>
[3] <http://www.w3.org/TR/html4/struct/tables.html#idx-table-19>

--
Ben 'Cerbera' Millard
Collections of Interesting Data Tables
<http://sitesurgeon.co.uk/tables/readme.html> 
Received on Monday, 13 August 2007 16:16:24 UTC
