Collections of Interesting Data Tables
Genuine data tables found on the web which seem complex or otherwise noteworthy.
This research is often inactive due to professional commitments. It began on 19th May 2007 and was updated on 6th November 2007.
Feedback is welcome.
Numbers
- 96 genuine tables.
- 150 variants to simulate retrofitting techniques.
- 27 more tables being reviewed.
How Authors Indicate Headers
- 15% use no HTML table elements.
- 1% have no header cells but are data tables.
- Using <th>:
- 20% for all headers.
- 18% for some headers.
- 45% for no headers.
- (Equals 99% with the first two groups due to rounding.)
- Using <td><b> or <td><strong>:
- 5% for all headers.
- 11% for some headers.
- 70% for no headers.
- (Equals 102% with the first two groups due to rounding.)
- Using <td>:
- 35% for all headers.
- 20% for some headers.
- 31% for no headers.
- (Equals 102% with the first two groups due to rounding.)
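The rounding notes above can be checked with simple arithmetic. This is only a sketch: the percentages are copied from the lists above, while the names SHARED, MARKUP and total_with_shared are made up for illustration.

```python
# Percentages copied from the lists above; names are illustrative only.
SHARED = {"no HTML table elements": 15, "no header cells": 1}

MARKUP = {
    "<th>": {"all": 20, "some": 18, "none": 45},
    "<td><b> or <td><strong>": {"all": 5, "some": 11, "none": 70},
    "<td>": {"all": 35, "some": 20, "none": 31},
}


def total_with_shared(breakdown):
    """Sum one markup breakdown together with the two shared groups."""
    return sum(breakdown.values()) + sum(SHARED.values())


totals = {markup: total_with_shared(b) for markup, b in MARKUP.items()}
for markup, total in totals.items():
    print(f"{markup}: {total}%")
```

The totals come out at 99%, 102% and 102%, matching the bracketed notes.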
Collections
Simulated Retrofitting
Investigating how tables might be adapted to become more accessible whilst keeping their meaning. Check the Method for Retrofitting Simulations.
astro:
- 6 genuine tables from the U.S. Naval Observatory’s Astronomical Applications Department Data Services.
clark2006:
- 19 genuine tables from Joe Clark’s Table examples for PDF/UA 1 (2006.01.27). (PDF/UA.)
finance:
- 2 genuine tables about money, with notes in the next section.
form:
- 1 genuine table with form controls in it.
odi:
- 7 genuine tables from Office for Disability Issues (ODI) research, New Zealand.
thatcher:
- 2 genuine table examples from the USA government, sent to me by Jim Thatcher.
sports:
- 1 genuine table, with notes in the next section.
tides:
- 1 genuine Gorleston tide table, UK Broads Authority.
My Bookmarks
From browsing the web, including deliberate searches for interesting tables. I biased the searches towards the more popular websites for any given query.
When I say “Make a variant” or “E-mail them” I am inviting anyone to do it. Help spread the workload!
Astronomy
- The Astronomical Almanac from the US Naval Observatory:
-
- Solar system measurements and constants.
- PDF or ASCII. Are HTML tables so hard? E-mail them.
Computing
- APIs Usage in VB6 “FileInfo” Project by Karl E. Peterson:
-
- Col headers use <th align="left">. So we can’t rely on align to tell us if a <th> is being used for data?
- Row headers use <td><b>. So this is used for both types of header.
- <br> instead of rowspan. Make a variant to simulate retrofitting this.
- Uses the frames attribute for controlling borders.
- The Best Gaming Video Cards for the Money: May 2007 from Tom’s Hardware:
-
- Column headers use plain <th>.
- Column headers are styled to look identical to data cells. We can’t use visual appearance to tell when <th> is being used for data?
- Each cell has a list of 0 or more graphics cards:
- Empty cells (0 cards) use <td> </td>.
- Items are separated with a comma.
- Could use <ul> and <li>. Make a variant.
- Could use individual cells. Make a variant.
- Harmonia GUI Framework by Andrew Fedoniouk:
-
- Headers use plain <td>.
- Table has three layers:
- A cell spanning the entire table width indicates the first layer of sections.
- A cell in column 1 indicates the next layer.
- Column 2 indicates layers within those indicated by column 1.
Rowspanning is used in columns 1 and 2 when these inner layers contain more than one row.
- An interesting anomaly is when “Module” and “Class/struct/type declaration” are the same:
- Rather than repeat the same name twice, the module name is spanned into the next column.
- Since it acts as a row header, it should be marked up as a header.
- This might prevent the “smart colspan” algorithm working.
- Make a variant where the name is repeated.
- Make a variant where it uses <th>.
- Make a variant where the first row of column headers uses scope="col".
- Make a variant where the first row of column headers is inside a <thead>, implying scope="col".
- Some cells use <td><strong> but are not headers. We cannot imply <td><strong> is a table header?
- Keryx (X)HTML Elements Best Practice Sheet by Lars Gunther:
-
- Is an XML page:
- Empty cells use <td />. This is the same as <td></td>.
- Column headers use <th>.
- Column headers are in a <thead>.
- No columnar values of scope are used.
- Uses <colgroup> but only for controlling borders.
- Row headers use <th scope="col">.
- Table is split into sections:
- First section uses <tbody> and <th scope="rowspan">. Make a variant where each section does this.
- The other sections use <td colspan> as a header which stretches across the table. Can this be told apart from a wide data row? Make a variant using <th colspan>.
- Some data cells span columns, some span rows, some span rows and columns.
- Abbreviations are expanded with a mix of punctuation and <b>. Make a variant using <abbr title>. Make a variant using <dfn>.
- Layout height attributes on body and html elements by Anne van Kesteren:
-
- Uses <caption> correctly.
- Column headers use <th>.
- Column headers are inside a <thead>.
- 3 levels of column headers with different column spans:
- Top level spans the furthest.
- Next level spans less.
- Final level does not span.
- Column span boundaries have a regular alignment.
- HTML4’s scope cannot express this table because it would need nested <colgroup>? Make a variant.
- Browser names use plain <td>. But you need these as headers to understand the data? Make a variant.
- Shortened terms expanded after the table. Make a variant using <abbr title>.
- Review it against HTML4’s header search algorithm. Ask Leif Halvard Silli to do this?
- Optimize string handling in VB6 - Part II by Tuomas Salste:
-
- Tables as diagrams of memory structures.
- Regular data tables.
- Comparisons.
- Headers usually done with <th>. Sometimes done with <td align="center">.
- Inconsistencies between markup and styling on one page by one author. E-mail them about this.
- Clean, minimal markup on the whole. Maybe authors will be happy to write new tables this simply?
- Linkback by Wikipedia:
-
- Column headers use <th>.
- Row headers use plain <td>.
- The empty header cell uses <th></th>.
- Empty data cells use <td>None</td>.
- At least 2 data cells contain a <ul> in 3 of the 4 columns. Block-level markup does not indicate a layout table.
- The QA Matrix by W3C QA:
-
- Four distinct columns sharing the “Properties” header (or a list of 4 items, depending how you look at it).
- 0 or 1 lists in the final cell of each row.
- Empty cells marked “-”.
- date Parameters from the PHP Manual:
-
- Column headers use <th>.
- Column headers are in a <thead>.
- Row headers use <td><var>.
- Table is split into sections:
- Section headers use <td align="center"><span class><em>.
- Section headers only span the first column. E-mail them about it.
- Other cells in the section header row use <td>---</td>. Can this be considered an empty cell?
- Make a variant with <th>.
- Make a variant with <td colspan>.
- Make a variant with <th colspan>.
- Make a variant with <tbody><th scope="rowgroup">.
- Make a variant with <tbody><th colspan scope="rowgroup">.
- Uses a single <colgroup> for the table with a <col> element for each column. Why? E-mail them about it.
- Supplies a summary which repeats the previous paragraph. Why? E-mail them about it.
- The strangeness seems symptomatic of someone who is trying too hard without fully understanding the markup. E-mail them about it.
Education
- A table of worldwide ages of consent, including US states by Avert:
-
- <th> used for column headers.
- Column headers are in the same <tbody> as all the data. Didn’t use <thead>.
- <td class> used for row headers. Why not <th>? E-mail them.
- The column of row headers has a column header.
- Row headers get 2 layers deep in several places but are never hierarchical.
- Footnotes are numbered in the table and wrapped in <sup> which corresponds to an <ol> later in the page. Perhaps this could be built on to produce a robust footnotes system leveraging existing elements for HTML5?
- A row of averages is placed at the bottom in the same row group as the data. Didn’t use <tfoot>.
- School Teachers’ Review Body Statistical tables as annex to the 2005 written evidence from the DfES by teachernet:
-
- All done as Excel spreadsheets.
- Some have hierarchical row headers. Did they choose Excel because they couldn’t figure out the HTML to do this? E-mail them.
- Most of these tables are dead simple. So why not use HTML? E-mail them.
- Science and engineering departmental population at doctorate-granting institutions, by field: 1987-94 by the National Science Foundation:
-
- ASCII used in a <pre> instead of HTML table elements. E-mail them. Make a variant.
- All their most recent tables are done in Excel and PDF. For example, Graduate Students and Postdoctorates in Science and Engineering: Fall 2005. Is HTML so hard? E-mail them.
- One row of column headers.
- Indented table sections where most rows are 4 levels deep! Are their headers supposed to accumulate? E-mail them.
- If they must accumulate, probably needs the headers+id patch technique.
- Row header text is too long for rowspan to be practical? Make a variant.
- Totals and subtotals appear at the start of each top-level table section.
- No column spans or row spans.
- Footnotes appear immediately after the table. This seems to be a strong convention in print, ASCII, HTML and other formats?
Finance
- FTSE 100 Listings from Money Extra with loads more UK stock tables:
-
- Column headers cover two rows.
- Entire headers block gets repeated after every 20 rows of data.
- Uses scope="col", so the scope has to stop after it runs down some data rows and hits another header with the same scope.
- As scope="col" is used in cells with colspan, to accommodate this table we would need to:
- Maybe it’s too funky to accommodate? Removing the scope attributes would be an easy retrofit. Make a variant.
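One possible reading of "the scope has to stop when it hits another header" can be sketched like this. It is not HTML4's algorithm and not a proposal from the HTMLWG. It models a single column as a top-to-bottom list of cells, and the function name and cell labels are hypothetical.

```python
def headers_for_column(cells):
    """Pair each data cell in one column with the header block above it.

    cells: list of (kind, text) from top to bottom, where kind is
    "th" for a header cell and "td" for a data cell.
    A header that appears after some data starts a fresh block, so the
    earlier block stops applying from that point down.
    """
    result = []
    current = []        # header block currently in effect
    seen_data = False   # True once data has appeared under the current block
    for kind, text in cells:
        if kind == "th":
            if seen_data:
                current = []      # repeated headers end the old scope
                seen_data = False
            current.append(text)
        else:
            seen_data = True
            result.append((text, list(current)))
    return result


# Hypothetical column: a two-row header block repeated after some data.
column = [
    ("th", "Price"), ("th", "Mid"),
    ("td", "row 1"), ("td", "row 2"),
    ("th", "Price"), ("th", "Mid"),
    ("td", "row 21"),
]
pairs = headers_for_column(column)
```

With this reading, "row 21" picks up only the repeated block rather than four accumulated headers.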
- Departmental financial statements from Disability Services Queensland:
- Uses the same headers+id hierarchical row header patching technique as Stephen Ferg in the USA. E-mail them about any influence.
- FTSE ACT 250 by Yahoo! Finance:
-
- <td align="center"> for column headers. Maybe this should be an alias for <th> in certain situations? Such as in a row which only contains <td align="center">?
- Some uses of <td align="center"> and <td> containing <b> for different purposes:
- Row headers use <td><b><a>...</a></b></td>. So <b> is the only child of <td> for these headers.
- Columns 3 and 4 use <td align="center">. Need to be very careful if we allow this as an alias for <th>.
- Column 3 uses <td><b>...</b> ...</td> and column 4 uses <td><img> <b>...</b></td>. So <b> is not the only child of <td> for these data cells.
We must be very careful about when things can be interpreted as <th>?
- Why aren’t they using <th>? E-mail them.
- There are 2 layout tables as ancestors of this data table.
- There are no layout tables as descendants of this data table. A data table can only be at the bottom of nested tables?
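The "row which only contains <td align="center">" idea could be tested with something like the sketch below. It assumes a row has already been parsed into (tag, attributes) pairs; the function name is made up. Tables where data cells are centred too would defeat it, so treat it as a heuristic to experiment with, not a rule.

```python
def row_looks_like_headers(row):
    """Guess whether a row is a header row.

    row: list of (tag, attrs) pairs, attrs being a dict of attributes.
    Returns True only when every cell is a <td> with align="center".
    """
    return bool(row) and all(
        tag == "td" and attrs.get("align", "").lower() == "center"
        for tag, attrs in row
    )


# Hypothetical rows for illustration.
header_row = [("td", {"align": "center"}), ("td", {"align": "center"})]
data_row = [("td", {}), ("td", {"align": "center"}), ("td", {"align": "center"})]

print(row_looks_like_headers(header_row))
print(row_looks_like_headers(data_row))
```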
- University of Wisconsin–Madison Facts: Budget:
-
- Snapshots taken on 29th September 2007, with retrofitting simulations:
- Attempted to use headers+id but got it wrong:
- Bogus reference to acprog in a headers attribute value.
- Empty string for headers values in the “2005-2006 Budget: allocation by program” table from the “Student support” section onwards.
- Tables are captioned with <caption>.
- Purpose of table is summarised in the summary attribute.
- Column headers use <th>.
- Long header text is abbreviated with the abbr attribute.
- Two headers use <td>.
- Table is split into headed sections.
- Section headers use <th> with a colspan which covers the full table width.
- Cell arrangement is regular and doesn’t really need headers+id? Make a variant.
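The two headers+id mistakes noted above (a reference to an id which doesn't exist, and an empty headers value) are easy to detect mechanically. A minimal sketch using Python's standard html.parser follows. The sample markup is invented for illustration and is not the university's actual table; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadersIdChecker(HTMLParser):
    """Collect every id and every headers attribute found on cells."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.references = []  # (tag, raw headers value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag in ("td", "th") and "headers" in attrs:
            self.references.append((tag, attrs["headers"] or ""))


def check_headers(markup):
    """Return a list of problems with headers attributes in the markup."""
    checker = HeadersIdChecker()
    checker.feed(markup)
    checker.close()
    problems = []
    for tag, value in checker.references:
        tokens = value.split()
        if not tokens:
            problems.append(f"<{tag}> has an empty headers attribute")
        for token in tokens:
            if token not in checker.ids:
                problems.append(f"<{tag}> references missing id {token!r}")
    return problems


# Invented sample reproducing both kinds of mistake.
SAMPLE = """
<table>
  <tr><th id="prog">Program</th><th id="amt">Amount</th></tr>
  <tr><td headers="prog">A</td><td headers="amt acprog">1</td></tr>
  <tr><td headers="">B</td><td headers="amt">2</td></tr>
</table>
"""
problems = check_headers(SAMPLE)
for problem in problems:
    print(problem)
```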
Government
- Bolton Museums - Contact Us:
-
- No column headers. Make a variant with column headers.
- Table begins with a <th colspan><i> across the whole width:
- It is the first section header.
- Implying this is a caption would be wrong.
- <i> does not hint at a semantic intention, unlike <b>.
- Subsequent sections also begin with <th colspan> across the whole width.
- Sections end with a row of cells using <td><br /></td>. A cell containing a <br> is an empty cell which should be ignored?
- All sections are in the same <tbody>. Make a variant using one <tbody> for each section.
- Row headers use <td>. Make a variant using <th>.
- Row headers sometimes span more than 1 row.
- The row header spans more than one column when there are no names of people. Make a variant where the person’s name cell is there but is empty.
- People’s names use <td><b>. Implying these are headers would be wrong. Make a variant where this boldness is done via CSS.
- Bureau of Labor Statistics, particularly these areas influenced by Stephen Ferg:
-
- Minimal <th> and <td> are used. Minimal headers+id is added to patch up the HTML4 header search algorithm where needed.
- National Statistics Online (UK)
- It’s all PDF except for commentary and graphs?
- TABLE Z-2 - 1910.1000 TABLE Z-2 from the US Department of Labor, Occupational Safety & Health Administration:
- Try saying that three times quickly.
- Expanded Homicide Data Table 2 from the Federal Bureau of Investigation:
-
- Column headers use <th> with scope="col".
- They expect col to be sensitive to the colspan. I once thought this, too. Probably unaware of the colgroup value, which is also rather strange to set up.
- <th> with scope="row" for row headers, augmented with headers+id for the hierarchical row header.
- headers+id for every cell which has a header applied to it.
- A unified header algorithm needs to drop duplicate associations caused by the overlapping association methods in tables like this.
- Footnotes in a <ul> after the table.
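Dropping the duplicate associations mentioned above might be as simple as an order-preserving merge. This sketch assumes each association method (scope, headers+id and so on) has already produced a list of header identifiers for one cell; the function name and identifiers are hypothetical.

```python
def merge_header_associations(*sources):
    """Merge header lists from several association methods.

    Duplicates are dropped and first-seen order is kept, so a header
    found by both scope and headers+id is only reported once.
    """
    merged = {}
    for source in sources:
        for header_id in source:
            merged.setdefault(header_id, None)
    return list(merged)


# Hypothetical results for one data cell.
from_scope = ["weapons", "total-firearms"]
from_headers_attr = ["total-firearms", "handguns"]

merged = merge_header_associations(from_scope, from_headers_attr)
print(merged)
```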
Interactive
- Dog selector test:
- Faking a table for a form.
- Events - Lions Club of Fleet:
-
- <td><h3> for header cells.
- If you could recognise these as headers, you’d need to be smart about colspan even though the headers are defaulting to colspan=1.
- Endnotes are in the final row of the data section.
- Timetables - Isle of Man Steam Packet Company Ferry Services:
-
- Several PDF documents, each of which contains several pages of colour-coded timetables.
- Why did they use PDF? Are HTML tables so much harder? E-mail them.
Products
- Fitting Bras, Correct Bra Size and Comparisons from Bigger Bras:
-
- 2 levels of column headers:
- Column headers use <td align="center">.
- Table is split into 2 sections, with column headers for each section.
- 3 levels of row headers:
- Row headers use <td align="center">.
- Data cells also use <td align="center">.
- Cannot imply <td align="center"> is a header.
- Some cells legitimately contain two pieces of data.
Sports
Detailed Review
I wrote a detailed review of sports tables which included:
- 6 more from ESPN;
- 5 more from Eurosport;
- and 2 from the Intercontinental Rally Challenge website, including the weirdest approach to coding a table I’ve ever seen.
The AFB reviewed some sports sites in early 2006, finding problems with data tables. Disabled people can be sports fans, if you hadn’t realised. Heard of the Paralympics?
ESPN
None of their tables use <th>. Their column headers use <td> with CSS to make it bold! But at least retrofitting <th> would be easy. E-mail them about it.
Their data tables are usually given a caption by placing a <td colspan> in the first row which spans all columns in that table. I call this an “embedded caption”. Is it so hard to style <caption>? Test it.
- NHL Player Card for Daniel Alfredsson:
-
- Abbreviations for headers described by a glossary, which is the next table in this review.
- Embedded caption.
- Data is mostly numerical and laid out regularly. Pretty tame.
- NHL Statistics Glossary:
-
- Embedded caption.
- Only 2 columns of data. A small number of columns does not indicate a layout table.
- No cells acting as column headers.
- First column kind of acts as row headers. Should authors bother with row headers in 2-column tables? The user probably just heard it and it can be heard again by moving one cell left.
- If it were retrofitted with <th>, would our algorithms work? Make a variant.
- NHL Boxscore:
-
- 14 tables styled to look like data tables.
- 2 of these are layout tables. Each contains 2 of the other 14 data tables.
- Untitled table showing scores per quarter:
- No caption.
- Row headers include some data.
- Final column uses bold styling applied via CSS to indicate importance.
- Top left cell is completely empty.
- Seems indistinguishable from a layout table.
- Make a variant.
- Three Stars:
- 3 column layout table.
- Multiple details per cell.
- There are no column headers, just an embedded caption.
- Probably won’t hurt if this was erroneously identified as a data table?
- Game Information:
- 2 column layout table.
- Multiple details per cell.
- No column headers, just an embedded caption.
- Team Statistical Comparison:
- Layout table.
- Contains 6 tables in one cell.
- Each of these tables is a diagram and not really a data table.
- Need to see the colours and tell them apart to understand the data.
- Make a variant where these are genuine data tables without depending on colour.
- 1st Period Summary:
- Uses a <td> spanning the entire table width using align=center in sections where there is no data to report. Implying that this is a header would break this table.
- Regular data table with one detail per cell.
- Column headers are repeated.
- Columns 3 and 4 start with individual headers but are replaced by a spanned header. “Smart colspan” wouldn’t recognise this because it would fail in other tables, IIRC.
- 2nd Period Summary, 3rd Period Summary and OT Summary are the same as 1st Period Summary.
- Player Summary is a layout table which contains 2 data tables which are the same:
- 2 rows of column headers.
- First column header is actually a caption for the table and shouldn’t be alongside the other two table headers. Make a variant.
- First row headers span several columns.
- Column headers span a single column.
- Column headers use abbreviations which are not expanded. Make a variant. Can the text content of an <abbr> element in a column header be an alias for an abbr attribute value?
- Row headers use <td>.
- Data is very regular with one detail per cell, except player positions which are in the same column as player names. Make a variant.
- Goaltending Summary is a layout table which contains 2 data tables which are the same:
- Column headers use some abbreviations which are not expanded. Make a variant.
- 3 rows in total, 1 row of data. A small number of rows does not indicate a layout table.
- Row header is marked up using <td>.
- Very regular with one detail per cell.
- Shots on Goal:
- Caption is embedded into the row of headers. Make a variant.
- Column headers use abbreviations which are not expanded. Make a variant.
- Row headers use <td>.
- Very regular with one detail per cell.
- MLB Stats 2007:
-
- The tables nested inside the layout table all follow the same pattern:
- First column has a picture of the player.
- Second column has 3 data items lumped in together:
- Player rank for this category.
- Player name, linked to their player card.
- Abbreviated name of the team they play for.
- The first two columns start with the table caption and don’t have a real column header.
- Rightmost column in each is a column of data with a header.
- In practice, these are also layout tables?
- Can these complex mixtures of data tables, layout tables and hybrid tables be told apart?
- How common are situations like this?
- Is retrofitting accessibility to this even possible? Make a variant.
- Sortables:
- Embedded caption.
- Column headers are needed to disambiguate the link in each data cell.
- Using <td colspan="2"> instead of <th colspan="2">. Make a variant. E-mail them about it.
- Two-column layout table:
- First cell in each column uses same markup as genuine table headers elsewhere.
- The key difference is this table contains other tables. That means it cannot be a data table.
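The "contains other tables, so it cannot be a data table" test is simple to automate. Here is a minimal sketch with Python's standard html.parser; the class and function names are made up, and it assumes reasonably well-formed markup with explicit </table> end tags.

```python
from html.parser import HTMLParser


class TableNestingParser(HTMLParser):
    """Record, for each <table>, whether another <table> sits inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []      # indexes of the tables currently open
        self.contains = []   # contains[i] is True if table i holds a table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Every table still open is an ancestor of this new one.
            for index in self.stack:
                self.contains[index] = True
            self.stack.append(len(self.contains))
            self.contains.append(False)

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            self.stack.pop()


def tables_containing_tables(markup):
    """Return one boolean per table, in document order."""
    parser = TableNestingParser()
    parser.feed(markup)
    parser.close()
    return parser.contains


# Invented sample: one outer table wrapping two inner tables.
SAMPLE = (
    "<table><tr>"
    "<td><table><tr><td>1</td></tr></table></td>"
    "<td><table><tr><td>2</td></tr></table></td>"
    "</tr></table>"
)
flags = tables_containing_tables(SAMPLE)
print(flags)
```

Tables flagged True would be treated as layout tables under this rule; the others still need further checks.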
- PGA Tour Statistics:
-
- Column headers are repeated after every 10 data rows.
- Row headers are either the number, the player name, or both.
- Candidates for row headers are marked up as plain <td>.
- Empty cells use <td>--</td>:
- The PHP Manual uses 3 hyphens.
- Other places use 1.
- Yet to find a place where they are significant.
- So maybe a cell which only contains hyphens is always intended as an empty cell?
- Server-side sorting via a hyperlink in the column header.
- Sorted row is styled like a table using CSS but uses <td class> rather than <td><b>.
- Very regular data with one detail per cell.
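The hyphens-only idea above could be tried out with a check like this one. It is a sketch of the heuristic as I read it, treating whitespace (including non-breaking spaces) as empty too; the function name is hypothetical and nothing here comes from an existing proposal.

```python
import re

# Optional whitespace (including non-breaking spaces) around an
# optional run of hyphens, and nothing else.
_PLACEHOLDER = re.compile(r"^[\s\u00a0]*-*[\s\u00a0]*$")


def is_effectively_empty(cell_text):
    """True when a cell holds only hyphens and/or whitespace."""
    return _PLACEHOLDER.match(cell_text) is not None


for sample in ["--", "---", "-", "", "\u00a0", "-5", "None"]:
    print(repr(sample), is_effectively_empty(sample))
```

A negative number such as "-5" is not treated as empty because it contains something other than hyphens.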
- Tiger Woods - Player Card:
-
- PGA Season Overview - 2007:
- Row headers use plain <td>.
- 4 rows in total and only 2 are for data. A small number of rows does not indicate a layout table.
- Very regular data with one detail per cell.
- PGA Tour Stat Ranks - 2007:
- First column header spans two columns even though they contain different details. Make a variant which gives the second column a “value” header.
- Row headers use plain <td>.
- Very regular data with one detail per cell.
- 2007 Tournaments:
- Most useful row headers are probably the event names, in column 2:
- Make a variant where these use <th>.
- Make a variant where column 2 is swapped with column 1.
- Regular data but each cell in column 2 and column 4 contains multiple details.
- Table ends with a full-width row which contains an endnote.
- Indy Racing League Race Schedule:
-
- Borderline layout table.
- Column 2 uses <td><b> but the <b> does not contain everything. It is not intended as a header cell. This trend is consistent with other tables.
- Regular data but column 2 and column 3 have several details in each cell.
- Column 2 has track name and location because they are closely related.
- NHRA Results:
-
- Regular data but cell 3 has loads of details dumped into it:
- Borderline layout table because of this.
- Contains 3 rows of data, each consists of:
- Vehicle class, which should be a header.
- Winning driver’s name, inside <b>, which should be a data cell.
- Winning top speed, which should be a data cell.
- Winning time, which should be a data cell.
- Make a variant.
- Is the data packed in this way so it fits in the website’s layout? E-mail them.
- Probably doesn’t need row headers as there are only 3 columns.
Eurosport
- Overall Team Standings: Stage 20:
-
- scope for column headers which are using <th>. Amazing!
- scope for row headers in middle of row. Seems they think this applies it leftward as well as rightward. I used to think that, too.
- The tabs above the table are links to 5 other tables built the same way.
- Zebra rows using <tr class> with a value alternating between row and alt.
Soccer
- League Table - Premier League Soccer (UK):
-
- I made a snapshot of the Premier League table on 16th September 2007.
- Entire table is written to the page using Javascript, specifically .innerHTML rather than document.write.
- Javascript constructs the table markup from an XMLHttpRequest.
- There is no table if Javascript is unavailable. You get alert boxes if features are unsupported or an error occurs.
- It starts at about line 800, all embedded into the page.
- <td> only with CSS to make the headers bold, just like ESPN.
- Probably the most widely recognised data table in the UK.
Elsewhere
Collections I’ve seen but not worked on:
If you send in a collection I shall add it to this list but I might not work on it.
About this Research
I am Ben ‘Cerbera’ Millard. My aims in doing this are:
- To inform discussion about tables in the HTMLWG.
- To gauge the quality of and trends in existing data tables.
- To provide realistic test cases for the various proposals about table accessibility, such as “table walking” algorithms.
Feedback
Corrections (no matter how small), better translations of the non-English tables to English, links to other collections and so on are welcome. In order of preference:
- Participate in the Data Table Collections (Research) thread of W3C’s public-html mailing list. (Recommended.)
- Add to the Accessify Forum topic. (This keeps the work public.)
- Send to my e-mail account, cerbera@projectcerbera.com.
Please include both “Table” and “Collection” in any e-mail subject lines to help me track feedback. They can be in any order, with words between them. Plurals are fine.
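The subject line rule above amounts to a small check. This sketch assumes matching is case-insensitive and on whole words, neither of which is stated above; the function name is made up.

```python
import re

# Whole-word matches, plural allowed. Case-insensitivity is an assumption.
_REQUIRED = (r"\btables?\b", r"\bcollections?\b")


def subject_is_trackable(subject):
    """True when the subject mentions both 'table' and 'collection'."""
    return all(re.search(word, subject, re.IGNORECASE) for word in _REQUIRED)


print(subject_is_trackable("Collection of tables: a correction"))
print(subject_is_trackable("Table feedback"))
```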
Method for Retrofitting Simulations
For each table found on the web:
- If it is part of an existing collection:
- Create a subdirectory for this table.
- Otherwise:
- Create a new directory for this new collection.
- Create a subdirectory for this table.
- Create an original.html file with the table markup from the original page.
- Create some variants of it, usually these:
minimal.html:
- Strip the original to the simplest markup without changing cell arrangements. Add border=1 to make structure visible.
scope.html:
- Add scope attributes to the minimal.html example, with grouping elements as necessary.
scope-abbr.html:
- Add abbr attributes to the scope.html example where appropriate.
- Special variants:
-
- Simpler header arrangements.
- Adjacent empty cells as spanned empty cells.
- Translate to English.
- Add <abbr title>.
- Non-conformant markup where conformant markup is inadequate.
- Etc.
- Get a feel for conformance and sanity using:
- Upload to the web (duh).
- Update this page if a new collection was created.
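The directory and file steps above could be scripted along these lines. The file names come from the method; the function name, the example collection and table names, and the choice to create empty placeholder files are assumptions for illustration. Filling in the markup is still manual work.

```python
from pathlib import Path

# The usual files from the method above.
VARIANTS = ("original.html", "minimal.html", "scope.html", "scope-abbr.html")


def scaffold(root, collection, table):
    """Create the collection and table directories plus empty variant files.

    Returns the files created; files which already exist are left alone.
    """
    table_dir = Path(root) / collection / table
    # parents=True also creates the collection directory when it is new.
    table_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in VARIANTS:
        path = table_dir / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created


if __name__ == "__main__":
    # Hypothetical collection and table names.
    for path in scaffold(".", "sports", "example-table"):
        print("created", path)
```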
Future?
No more original.html files; they are too big a bottleneck. Dumping links with a summary is more useful for categorising the use cases. It also helps other Participants find things to do.