How to make complex data tables more accessible to screen-reader users

(Murray asked me to start a new thread about this today, outlining my 
thoughts. Hopefully this will help.)

HTML has a feature that allows multidimensional data to be marked up and 
presented in a primarily two-dimensional fashion, namely the <table> 
element. This element also has a few associated features to express more 
complex data, such as <th> vs <td>, headers="", scope="", 
<thead>/<tbody>/<tfoot>, and colspan=""/rowspan="".
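
As a rough sketch of how these features combine (the data, ids and 
header names below are invented purely for illustration), a small table 
with a two-level column header might be marked up like this:

     <!-- Illustrative data only; all ids and values are made up. -->
     <table>
      <thead>
       <tr>
        <th rowspan="2" id="region">Region</th>
        <th colspan="2" id="y2008">2008</th>
       </tr>
       <tr>
        <th id="q1" headers="y2008">Q1</th>
        <th id="q2" headers="y2008">Q2</th>
       </tr>
      </thead>
      <tbody>
       <tr>
        <th id="north" scope="row">North</th>
        <td headers="north y2008 q1">12</td>
        <td headers="north y2008 q2">15</td>
       </tr>
      </tbody>
     </table>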

Users of screen readers are able to navigate straightforward 
two-dimensional tables reasonably easily; screen readers have developed a 
set of navigation features that allows users to quickly skim cells 
horizontally and vertically and also enables users to easily determine 
their current position. A simple table with a series of data cells with 
the top row and left column containing headers can therefore be read 
relatively simply by screen-reader users, by skimming the first row to get 
an idea of the fields in the data, skimming the first column to get an 
idea of the various options that the table covers, and then walking 
through to the relevant cells to get whatever information is desired, 
potentially walking a series of cells in a row or column to get 
information relating to the range of the data.
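
For concreteness, the kind of simple table I mean here looks something 
like the following sketch (the plan names and values are made up): 
headers along the top row and down the left column, and nothing else.

     <!-- Illustrative data only. -->
     <table>
      <tr>
       <td></td>
       <th scope="col">Price</th>
       <th scope="col">Storage</th>
      </tr>
      <tr>
       <th scope="row">Basic</th>
       <td>5</td>
       <td>10 GB</td>
      </tr>
      <tr>
       <th scope="row">Pro</th>
       <td>12</td>
       <td>100 GB</td>
      </tr>
     </table>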

Users of visual user agents [1] interact with such tables in a remarkably 
similar way, first reading the headers in the first row of the table, then 
reading the headers of the rows, and then using this information to pin 
down the cell or series of cells in which they are interested. However, it 
is typically a much more instinctive behaviour than the more belaboured 
and interactive experience of a screen-reader user.

([1] For the purpose of this discussion, I shall consider screen-reader/ 
browser combinations as being non-visual user agents, even though they 
are, strictly speaking, visual user agents too.)

In addition, screen readers would be most helpful to their users if they 
could summarise table structures automatically. Indeed, 
many do report basic table information such as the number of rows and 
columns; going forward, it seems likely that this can and should be 
improved to describe basic table types, so that even simpler tables or 
tables that might lack necessary descriptive text can be explained.

However, things get more difficult with complicated tables such as some of 
the ones studied by Ben a few years ago. [2][3]

[2] http://projectcerbera.com/web/study/2007/tables
[3] http://projectcerbera.com/web/study/2008/tables

For these, users -- both users of visual user agents and users of screen 
readers -- would benefit greatly from some human-written explanatory or 
introductory text. Screen reader users are especially in need of such 
text, since they cannot see the patterns that visual users might see.

Explanatory text could be put in several places:

 - Before the table in the prose:

     <p>...</p>
     <table>...</table>

 - After the table in the prose:

     <table>...</table>
     <p>...</p>

In the two cases above, ARIA attributes could be used to more tightly 
couple the two to enable screen readers to provide a link between them.
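
One way this might look, as a sketch only: WAI-ARIA's aria-describedby 
attribute can point from the table to the paragraph. The id 
"sales-help" and the wording of the paragraph are made up here, and 
whether any particular screen reader announces the link for tables is 
an assumption that would need testing.

     <!-- The id "sales-help" is hypothetical. -->
     <p id="sales-help">Rows are regions; columns are quarters,
     grouped by year.</p>
     <table aria-describedby="sales-help">...</table>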

 - As part of a <figure> with the table:

     <figure>
      <p>...</p>
      <table>...</table>
     </figure>

 - As part of a caption:

     <table>
      <caption>
       ...
       <p>...</p>
      </caption>
      ...
     </table>

All of the examples above are about equivalent; different authors might 
prefer different options in different cases. (The spec encourages the 
fourth, with the caption, because it links the explanatory text to the 
table in a clear way for screen readers, has the preferred behaviour in 
existing screen readers, and doesn't require the use of a separate 
<figure> element, which is not always desirable.)
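
Filled in with some made-up text, the caption option might look like 
the sketch below. The title and the explanation are invented for 
illustration; the point is only that the explanation is ordinary, 
visible, marked-up content inside the <caption>.

     <table>
      <caption>
       Sales by region
       <p>Each row is a region. Columns are grouped by year, with
       one column per quarter.</p>
      </caption>
      ...
     </table>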

 - Introducing a new element inside <table>, e.g.:

     <table>
      <summary> ... </summary>
      ...
     </table>

Unfortunately there are parsing issues with this.

 - Introducing a new element inside <caption>, e.g.:

     <table>
      <caption>
       ...
       <summary>...</summary>
      </caption>
      ...
     </table>

 - Introducing a new element inside <figure>, e.g.:

     <figure>
      <summary>...</summary>
      <table>...</table>
     </figure>

This would make sense if the summary content was rendered very differently 
than other content in specific media, but in practice in ATs the summary 
content is just read out like caption content, so it wouldn't add much 
here, and in other UAs the author would be able to just style it using 
CSS. (Media queries can also be used to hide content specifically from 
particular media, e.g. having text not appear on screen.)

 - Reusing <details>:

     <table>
      <caption>
       ...
       <details>
        <legend> Help... </legend>
        ...
       </details>
      </caption>
      ...
     </table>

This is rather complicated, and thus unlikely to be widely used by 
authors (let alone used correctly) if we were to suggest it as the 
primary mechanism. It is still reasonable, though, and the spec does 
allow it, so it could be used if desired.

 - Using the summary="" attribute from HTML4:

     <table summary="...">
      ...
     </table>

This last option has a number of drawbacks. It only allows simple, 
un-marked-up text; it isn't visible to non-screen-reader users in legacy 
user agents; and visual media browsers would not want to show this content 
inline in legacy content because it would cause legacy content to change 
rendering in a non-backwards-compatible manner. I'm skeptical that this 
is an effective way to actually solve the problem.

Naturally, supporting legacy content that already uses the summary="" 
attribute should not be prevented; to this end, HTML5 in fact encourages 
user agents (such as screen readers) to expose the contents of summary="" 
attributes, even though the attribute isn't part of the language.


US government advice on how to include explanatory text suggests using the 
<caption> or putting content adjacent to the table, as in the first four 
solutions above:

| [...] web developers who are interested in summarizing their tables 
| should consider placing their descriptions either adjacent to their 
| tables or in the body of the table, using such tags as the CAPTION tag.
 -- http://www.access-board.gov/sec508/guide/1194.22.htm#(g)


Some have argued that the summary="" attribute is a better solution to the 
problem described above than the other solutions suggested above.

Here is some empirical data that suggests otherwise.

   http://www.youtube.com/watch?v=xMGBX8gAM6g#t=0m30s

   Usability study. A blind user, using JAWS, upon being introduced to
   a sample table with the summary="" attribute, says, unprompted:
   "Now it gave a little summary information there. And I'm wondering,
   how necessary is that. [...] I'm thinking it's too much. [...] I
   think you'll find that information yourself anyway by just
   exploring the table." He then goes on to say that other people
   might disagree, but adds "but for me, they're annoying". He also
   notes that he believes he has the feature disabled in his
   installation, though this contradicts statements by Steven saying
   that summaries aren't disablable in JAWS. [4]

   [4] http://lists.w3.org/Archives/Public/public-html/2009Jun/0282.html


   http://www.paciellogroup.com/blog/misc/summary.html

   A manual crawl of government pages with a summary="". I went
   through this in detail in a contemporary e-mail [5], and
   controversially concluded that "summary="" hurts users who don't
   have access to it, hiding information that they could use, hurts
   users who DO have access to it, encouraging people to consider
   layout tables acceptable; and hurts the authors writing these
   tables, wasting their time writing summaries when their time would
   be better spent making pages accessible to _everyone_". Leif
   questioned some of my comments [6], but I believe my conclusion
   stands up to his close scrutiny.

   [5] http://lists.w3.org/Archives/Public/public-html/2009Feb/0601.html
   [6] http://lists.w3.org/Archives/Public/public-html/2009Jun/0285.html


   http://canvex.lazyilluminati.com/misc/summary.html
   http://canvex.lazyilluminati.com/misc/summary-20090226.html
   http://philip.html5.org/data/table-summary-values-dotbot.html

   Automated crawls through two different corpuses. These show actual
   values of summary="", unfiltered for layout tables. Simon went
   through the last (and biggest) list one at a time, and reported
   finding only one page (out of 425,000) with a summary="" value that
   actually fit the recommended guidelines, and pointed out that for
   that table, the summary was in fact redundant and didn't help
   accessibility. [7]

   Of the other values, almost all are outright bogus ("pid991460"),
   but some have values that appear to be well-meaning but of
   questionable practical use, such as "Calendar".

   [7] http://lists.w3.org/Archives/Public/public-html/2009Jun/0698.html


I've previously gone through this data in more detail, e.g. in:

   http://lists.w3.org/Archives/Public/public-html/2008Dec/0175.html
   http://lists.w3.org/Archives/Public/public-html/2009Feb/0601.html
   http://lists.w3.org/Archives/Public/public-html/2009Feb/0690.html
   http://lists.w3.org/Archives/Public/public-html/2009Feb/0735.html
   http://lists.w3.org/Archives/Public/public-html/2009Jun/0173.html

Overall I think the data pretty clearly speaks to the problems that 
summary="" has today. After ten years of evangelisation and education 
efforts, authors *who intend to help users with accessibility needs* still 
do not use the attribute in a useful manner. That these well-meaning 
authors so fundamentally don't understand how to make table explanations 
useful IMHO is an indication that we need to change how we are going about 
the problem. This is why I suggest telling them to include explanatory 
text in an immediately visible manner. This would force them to see the 
text even if they do only the most primitive of QA (as apparently many 
do). If the authors see the text, then they are more likely to make it 
sensible. This would then help the users they want to help, and the users 
for which we want to make the Web a better place.


I think that if we are to find a new solution (other than those listed 
above), or if we are to decide to use summary="" despite the flaws 
described above, we need more information.

Specifically, to support summary="" I think the following would be useful:

 * Data showing whether screen reader users actually use summary="" 
   attributes in their day-to-day life. Usability studies are the most 
   reliable and effective way to find this out. (Note that asking users is 
   not a good way to find this kind of information out. Users are 
   notoriously incapable of accurately describing their behaviour.)

 * Data showing whether the values that are seen by users are actually 
   useful or not on the aggregate (it has been argued that this is 
   different than the values that are seen on the Web e.g. as in the 
   data cited above, because ATs apparently filter that data). A 
   random crawl that applies the same filter as the ATs is probably the 
   method that would get us the most data for this, but it may be 
   impractical depending on what filter the ATs use. Examining a small set 
   of URLs manually with an AT based on a previous crawl to find potential 
   candidate pages randomly may be more practical.

 * If the values that appear in the data collected for the previous bullet 
   point include some of the more questionable values, rather than only 
   unambiguously good values, then an explanation of why such values are 
   useful, or even better, data showing that such values are indeed 
   useful, e.g. from a usability study looking at such pages specifically.

To support <summary>, the following would be useful:

 * Data showing that certain tools, user agents, authors, or users treat 
   explanatory text about tables in a substantially different way than 
   caption text or surrounding prose.

In the absence of this data, I don't think we have enough grounds to 
continue supporting summary="" or to introduce a new element. Clearly, 
others disagree.


I feel I must point out that we have used the exact same data-driven 
process for every single feature in HTML5. In some cases, we don't have 
much data to go on; in others, we have a lot. But we have used the same 
methodology for every feature in the language. This is no exception.


I would welcome input from the chairs regarding how to resolve this issue. 
Personally I don't think this is a difficult issue; it seems that there is 
a clearly technically inferior solution being proposed (summary="") that 
has been demonstrated to not actually solve the problem described at the 
top of this e-mail. So to me, it seems that if we are basing HTML5's 
development on purely technical grounds and arguments, rather than on 
the volume of the discourse, the way forward is clear: we should 
adopt one or more of the solutions proposed that do not suffer from the 
same design problems as the summary="" attribute.

If the chairs disagree, and believe that this is a non-technical issue, or 
believe that technical issues should be resolved by vote, then I would 
recommend having something like the following options:

 ( ) I support the design of the HTML4 working group.
     (Including the summary="" attribute on tables.)

 ( ) I support the design currently in Ian's HTML5 proposal.
     (Suggesting that tables should be described in captions.)

 ( ) I support the design currently in Rob's HTML5 proposal.
     (Allowing summary="", but saying it doesn't work.)

 ( ) I have another proposal. Describe it below.


Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Sunday, 5 July 2009 11:14:59 UTC