Re: Heuristic Tests for Data Tables (Discussion)

From: Ben 'Cerbera' Millard <cerbera@projectcerbera.com>
Date: Fri, 24 Aug 2007 19:20:42 +0100
Message-ID: <000f01c7e67b$7d3e4fb0$0201a8c0@ben9xr3up2lv7v>
To: "Philip Taylor" <philip@zaynar.demon.co.uk>
Cc: "HTMLWG" <public-html@w3.org>

Philip Taylor wrote:
>With the data I collected a while ago [1], [...] it seems as important to 
>determine layout vs data for tables that do have <th> as much as for those 
>that don't.

Indeed. And it seems my fears about <td>-only data tables are worse than I 
thought. Of the 33 tables I have [collected] from the web, only 13 used <th> 
at all. It was rare for row headers to use <th>; only 6 did that.

[collected] <http://sitesurgeon.co.uk/tables/>

All the sports tables I looked at on ESPN's website are <td>-only. A sample 
of them:

1. <http://sports.espn.go.com/mlb/stats/aggregate?statType=fielding&group=9>
    Header cells with same colspan="" value overwrite each other.
    The boldness of the heading text is applied from an external stylesheet 
via class="colhead" on the parent <tr>. So a simple heuristic like <td><b> = 
<th> wouldn't work here. At least there is a clear migration path: swap 
<td>s in <tr>s with class="colhead" to <th>s...but why didn't they just 
use <th> to start with?
2. <http://sports.espn.go.com/rpm/results?seriesId=8>
    Borderline layout table.
3. <http://sports.espn.go.com/rpm/schedule?seriesId=1>
    Borderline layout table.
4. <http://sports.espn.go.com/golf/players/profile?playerId=462>
    Some cells seem to have too much information in them...not really a 
"cell" of data when there are several values about different things.
5. <http://sports.espn.go.com/nhl/boxscore?gameId=270519002>
    Mix of layout effects around data cells at the start; bona fide layout 
tables; regular number-heavy data tables with spanned headers where simple 
format sniffing looks like it would work.
6. <http://sports.espn.go.com/golf/statistics?sort=officialAmount>
    Quite regular data table with header overwriting similar to #1. First 
cell spans all columns and would be correct if implied as a <caption>. Basic 
format sniffing looks quite promising except for the "Player" column, where 
it would need to check for presence of markup (name <a href>).

Yeah, you could spend all day every day for months investigating the tables 
on ESPN's site. :-)
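
The migration path mentioned for sample #1 (swap <td> for <th> in rows carrying class="colhead") can be sketched as follows. This is only an illustration, not something ESPN or any user agent does: it assumes the table is available as well-formed markup that xml.etree.ElementTree can parse, which real-world tag soup often is not, and the function name promote_header_rows and the sample rows are made up.

```python
import xml.etree.ElementTree as ET


def promote_header_rows(table_markup, header_class="colhead"):
    """Rename <td> to <th> inside every <tr> that carries header_class.

    Assumes table_markup is a single well-formed <table> element.
    """
    table = ET.fromstring(table_markup)
    for row in table.iter("tr"):
        # class="" may hold several space-separated names.
        if header_class in row.get("class", "").split():
            for cell in row.findall("td"):
                cell.tag = "th"
    return ET.tostring(table, encoding="unicode")


# Made-up sample data in the shape described for sample #1.
sample = (
    '<table>'
    '<tr class="colhead"><td>Player</td><td>Errors</td></tr>'
    '<tr><td>A. Example</td><td>3</td></tr>'
    '</table>'
)
converted = promote_header_rows(sample)
```

Only cells in the flagged rows are renamed; ordinary data rows keep their <td>s, so a <td><b> style of sniffing is not needed when the class is known in advance.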

Eurosport's website became a part of Yahoo! this year:

Tables on Eurosport are a very mixed bag. A sample of them:

1. <http://eurosport.yahoo.com/mo/standing/500/index.html>
    Headers use <th> and are in regular positions but summary="" contains 
   <th> for headers in regular positions.
    <tr> with a single <td> which spans the whole width, splitting the data 
table into groups but only contains &nbsp;. No practical benefit to imply 
<tbody> for things like that?
    A hyphen (-) fills some empty data cells near the bottom. <td>&nbsp; and 
<td>- perhaps mean the same as an empty <td>?
    At the top, there is a layout table which contains a calendar which is 
marked up as a data table with day names using <th>. Being able to determine 
layout versus data table in a nested context would be necessary here.
    The table summary="" makes sense but seems more like a <caption>. 
Regular position for headers marked up as <th>. The immediately preceding 
<h2> ("Table - Full Standing") could be implied as the <caption> for the 
4. <http://eurosport.yahoo.com/cr/sc/13550.html>
    Pairs of tables presented side by side without layout tables (!). 
<caption> is supplied for each table and contains sensible text.
    Column headers use <th scope="col"> and are in regular positions.
    Row headers use <th scope="row"> and are in regular positions.
    Some data legitimately uses <td rowspan> across the entire data area.
    One <ul> is used to supplement each Bowling table even though its data 
seems columnar. (The first number doesn't make much sense without a "Ball" 
    Lots of unexpanded abbreviations.
5. <http://uk.messages.eurosport.yahoo.com/yahoo/Cricket/Teams/index.html>
    Category listing for a message board. The summary="" merely repeats the 
preceding <h2>.
    <th> used for column headers, positioned in regular locations.
    (Message boards are a rich vein of layout tables and borderline data 
tables which could have a whole study to themselves.)
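
Two of the questions raised for the first Eurosport sample (whether <td>&nbsp; and <td>- mean the same as an empty cell, and whether a full-width row holding only &nbsp; carries any data) could be tested with checks like the ones below. This is a sketch under stated assumptions: the function names and the (text, colspan) representation of a cell are hypothetical, and treating a lone hyphen as "no data" is exactly the open question from the list above, not a settled rule.

```python
import html


def is_empty_cell(text):
    """Return True when a cell holds only whitespace, &nbsp;, or a lone hyphen."""
    # html.unescape turns "&nbsp;" into U+00A0, which str.strip() removes.
    content = html.unescape(text).strip()
    return content in ("", "-")


def is_spacer_row(cells, column_count):
    """Return True for a row made of one empty cell spanning every column.

    cells is a list of (text, colspan) pairs, a made-up representation.
    """
    if len(cells) != 1:
        return False
    text, colspan = cells[0]
    return colspan == column_count and is_empty_cell(text)
```

A heuristic built on is_spacer_row could skip such rows, or treat them as group boundaries, instead of reading them out as data.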

One particularly interesting collection is the tables on the official site 
of the Intercontinental Rally Challenge (IRC):

1. <http://www.ircseries.com/html/Standings_Results.asp>
    Column headers use <td align="center">. Perhaps this could be an alias 
for <th>?
    Main column headers use rotated text embedded in images with no alt="" 
text.
    Column sub-headers use flag icons with no alt="" text.
    Note that the main column headers do not span the sub column headers. 
This is rare, in my experience, but evidently it does exist. If a <th> is 
immediately preceded by a <th> of the same colspan="" value, they must be 
added together to support tables like this.
    Tables have actually been nested inside each other and placed 
immediately after each other to produce some of the data tables on this 
page. I have never seen that before now.
    The data for some tables is available as Excel spreadsheets. Maybe TV 
Raman could use those in Emacspeak! :-P
2. <http://www.ircseries.com/html/Calendar.asp>
    Column headers in regular positions using <td><strong>. Perhaps that 
should be an alias for <th>? (If you consider <strong> to be an alias for 
<b> and <td><b> to be an alias for <th>, <td><strong> as an alias for <th> 
follows. It also stands alone thanks to this use case, imho.)
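
The alias idea for the Calendar page (<td><b> and <td><strong> treated like <th>) might look like this in code. It is a rough sketch rather than a proposed algorithm: it assumes each cell has already been parsed into an xml.etree.ElementTree element, the name looks_like_header is invented, and it only accepts cells whose entire content is one bold element, which is a stricter reading than the alias may need.

```python
import xml.etree.ElementTree as ET

# <strong> is treated as an alias for <b>, as suggested in the text.
BOLD_TAGS = {"b", "strong"}


def looks_like_header(cell):
    """Guess whether a cell acts as a header.

    True for a real <th>, or for a <td> whose only content is a single
    <b> or <strong> element with no text outside it.
    """
    if cell.tag == "th":
        return True
    if cell.tag != "td":
        return False
    children = list(cell)
    if len(children) != 1 or children[0].tag not in BOLD_TAGS:
        return False
    # Reject cells with text before or after the bold element.
    loose_text = (cell.text or "").strip() or (children[0].tail or "").strip()
    return not loose_text
```

Requiring the bold element to be the whole cell avoids promoting data cells that merely emphasise one word.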

I am not the first to investigate an aspect of their accessibility. The 
American Foundation for the Blind checked ESPN among others in the 2005/2006 
season. They mentioned problems with data tables on the NFL site:


Sports are very mainstream, with sporting events being some of the biggest 
in the world. For example, the Olympics. Or the 10,000+ spectators in many 
sports stadiums around the world every weekend for either American Football, 
baseball, motor racing (especially NASCAR in the USA and Formula One around 
the world), soccer (especially in Europe), etc.

AFB's website proves vision impaired people are interested in mainstream 
things like sports (and why wouldn't they be?). These tables could be more 
useful to them if they were more accessible.

I wonder how many tables can be made natively accessible? How many will need 
to be retrofitted by authors with <th> or scope="" or headers="" and how 
likely is that? I guess more studying (like Philip and I and others have 
done) and prototyping of implementations (like James Graham might do) and 
testing in screen readers (like Steve Faulkner and others have done) and 
talking with content authors (like I've done and several are here themselves 
in HTMLWG) will help answer those questions over the coming years.

Anyone can do research and testing into this and other things whenever they 
can find the time and motivation [1]. :-)

[1] <http://lists.w3.org/Archives/Public/public-html/2007Aug/0968.html>

Ben 'Cerbera' Millard
Collections of Interesting Data Tables
Received on Friday, 24 August 2007 18:21:19 GMT