Fwd: HTML 4.0 Cell Header Recognition Algorithm Challenges

>
>Date: Sun, 14 Nov 1999 01:08:45 -0500
>To: w3c-wai-ua@w3.org
>From: Harvey Bingham <hbingham@ACM.org>
>Subject: Review of Table Techniques

The following material came about from my review of the Augmented WAI
User Agent Techniques for Tables:

     http://www.w3.org/WAI/UA/WAI-USERAGENT-TECHS

I have both embedded comments, and identified a list of limitations on
the existing HTML 4.0 Table Header Recognition Algorithm that deals with
simple data tables. These may be problematic when encountering some large
or more complex tables. At the request of Ian Jacobs and Jon Gunderson,
I forward those comments to Ian, the www-html-editor, for logging into
the review base for the next version of HTML/?XHTML? .

The basis is the Working Draft User Agent Techniques 1999-10-29, and
extract from the HTML 4.0 documentation where I copy that algorithm
for comparison.

Meta notation:

_insert_
XdeleteX
HB: ... new material

3.3 Table techniques

Tables were designed to structure relationships among data. In graphical 
media, tables are often rendered on a two-dimensional grid, but this is 
just one possible interpretation of the data. On the Web, the HTML TABLE 
element has been used more often than not to achieve a formatting effect 
("layout tables") rather than as a way to structure true tabular data 
("data tables").
Layout tables cause problems for some screen readers and when rendered, 
confuse users. Even data tables can be difficult to understand for users 
that browse in essentially one dimension, i.e. for whom tables are rendered 
serially. The content of any table cell that visually wraps onto more than 
one line can be a problem. If only one cell has content that wraps, there 
is less problem if it is in the last column. Large tables pose particular 
problems since remembering cell position and header information becomes 
more difficult as the table grows.

User agents facilitate browsing by providing access to specific table cells 
and their associated header information. How headers are associated with 
table cells is markup language-dependent.

Tabular navigation is required by people with visual impairments and some 
types of learning disabilities to determine the content of a particular 
cell and spatial relationships between cells (which may convey 
information). If table navigation is not available users with some types of 
visual impairments and learning disabilities may not be able to understand 
the purpose of a table or table cell.


3.3.1 Table rendering

A linear view of tables -- cells presented row by row or column by column 
-- can be useful, but generally only for simple tables. Where more complex 
structures are designed, allowing for the reading of a whole column from 
header downward is important as is carrying the ability to perceive which 
header belongs to which column or group of columns if more than one is 
spanned by that header. It is important for whole cells to be made 
available as chunks of data in a logical form. It might be that a header 
spans several cells so the header associated with that cell is part of the 
document chunk for that and each of the other cells spanned by that header. 
Inside the cell, order is important. It must be possible to understand what 
the relationships of the items in a cell are to each other.

Properly constructed data tables generally have distinct TH head cells and 
TD data cells. The TD cell content gains implicit identification from TH 
cells in the same column and/or row.

For layout tables, a user agent can assist the reader by indicating that no 
relationships among cells should be expected. Authors should not use TH 
cells just for their formatting purpose in layout tables, as those TH cells 
imply that some TD cells should gain meaning from the TH cell content.

When a table is "read" from the screen, the contents of multi-line cells 
may become intermingled. For example, consider the following table:

This is the top left cell    This is the top right cell
of the table.                of the table.
This is the bottom left      This is the bottom right
cell of the table.           cell of the table.

If read directly from the screen, this table might be rendered as "This is 
the top left cell This is the top right cell", which would be confusing to 
the user.

A user agent should provide a means of determining the contents of cells as 
discrete from neighboring cells, regardless of the size and formatting of 
the cells. This information is made available through the DOM [DOM1]).

HB: _Note that the table representation in the DOM is hierarchic: rows 
contain cells. Information about a TD cell needs to find corresponding 
information not only by row, but by column, so special means to find 
corresponding positions in different rows is important._


3.3.2 Cell rendering

Non-graphical rendering of information by a browser or an assistive 
technology working through a browser will generally not render more than a 
single cell, or a few adjacent cells at a time. Because of this, the 
location of a cell of interest within a large table may be difficult to 
determine for the users of non-graphical rendering.

In order to provide equivalent access to these users, compliant browsers 
should provide a means of determining the row and column coordinates of the 
cell having the selection via keyboard commands. Additionally, to allow the 
user of a non-graphical rendering technology to return to a cell, the 
browser should allow a means of moving the selection to a cell based on its 
row and column coordinates.

At the time the user enters a table, or while the selection is located 
within a table, the user agent should allow an assistive technology to 
provide information to the user regarding the dimensions (in rows and 
columns) of the table. This information, in combination with the summary, 
title, and caption, can allow the user with a disability to quickly decide 
whether to explore the table of skip over it.

Dimensions is an appropriate term, though dimensions needn't be constants. 
For example a table description could read: "4 columns for 4 rows with 2 
header rows. In those 2 header rows the first two columns have "colspan=2". 
The last two columns have a common header and two subheads. The first 
column, after the first two rows, contains the row headers.

Some parts of a table may have 2 dimensions, others three, others four, 
etc. Dimensionality higher than 2 are projected onto 2 in a table 
presentation.
The contents of a cell in a data table are generally only comprehensible in 
context (i.e., with associated header information, row/column position, 
neighboring cell information etc.). User agents provide users with header 
information and other contextual information. Techniques for rendering 
cells include:

Provide this information through an API.

Render cells as blocks. This may assist some screen readers. Using this 
strategy, the user agent might render individual cells with the relevant 
top and side headers attached.

Allow navigation and querying of cell/header information. When the 
selection is on an individual cell, the user would be able to use a 
keyboard command to receive the top and left header information for that 
cell. The user agent should appropriately account for headers that span 
multiple cells.

Allow users to read one table column or row at a time, which may help them 
identify headers.

Ignore table markup entirely. This may assist some screen readers. However, 
for anything more than simple tables, this technique may lead to confusion.


3.3.3 Cell header algorithm

User agents should use the algorithm to calculate header information 
provided in the HTML 4.0 specification ([HTML40], section 11.4.3. (This 
algorithm assumes left-to-right writing direction). Here it is slightly 
amplified,
shown in _..._ :

·       First, search left from the _TD _cell's position to find row header 
cells. Then search upwards _from the TD cell's position_ to find column 
header cells. The search in a given direction stops when the edge of the 
table is reached or when a data cell is found after a header cell.

·       Row headers are inserted into the list in the order they appear in 
the table. For left-to-right tables, headers are inserted from left to 
right, _including values implicitly resulting from TH cells in prior rows 
with ROWSPAN="R", sufficient to extend into the current row_.

·       Column headers are inserted after row headers, in the order they 
appear in the table, from top to bottom_, including values implicitly 
resulting from TH cells in other columns with COLSPAN="C", sufficient to 
extend into the current column containing the TD cell_.

·       If a header cell has the headers attribute set, then the headers 
referenced by this attribute are inserted into the list and the search 
stops for the current direction.

·       Cells that set the axis attribute are also treated as header cells.

HB: Note several cases where that algorithm isn't explicit:

·       Propagating the TH identifying information to the subsequent R-1 
rows below a row where some TH cells have rowspan="R" where R > 1.

·       Propagating the TH identifying information to the subsequent C-1 
columns for TH cells that have colspan="C" where C > 1.

·       It fails to recognize columns at the right, or rows at the bottom 
containing TH cells.

·       It doesn't say to use any value of the abbr="abbreviation" 
attribute on TH cells in place of that cell content itself.

·       It doesn't address what implicit information from TH applies to a 
TD that itself contains ROWSPAN="R" or COLSPAN="C" for R > 1 or C > 1.

·       It ignores cases where the effective number of TH cells difers 
among the TD cells, particularly one where either ROWSPAN or COLSPAN values 
are greater than one.

·       It ignores the direction (row or column) that an embedded TH in the 
middle of TD is to stop search for other TH.

·       It ignores the table attribute dir="rtl" whose effect is to reverse 
the order of columns in a row.

HB: end incomplete algorithm issues

HB: In a table with leading row and column of TH cells, the interpretation 
of the top-left cell as an empty TD or TH should not contribute to the set 
of headings that are used to identify the implicit information for TD cells 
in a table.

Since not all tables are designed with the header information, user agents 
should provide, as an option, a "best guess" of the header information for 
a cell. Note that data tables may be organized top-to-bottom, 
bottom-to-top, right-to-left, and left-to-right, so user agents should 
consider all edge rows when seeking header information.

Some repair strategies for finding header information include:

·       Consider that the top or bottom row Xto Xcontains header information.

·       Consider that the leftmost or rightmost column _or both_ in a 
column group contains header information.

·       If cells in an edge row or column span more than one row or column, 
consider the following row or column to contain header information as well.

HB: The user may choose the form and amount of this information, possibly 
announcing the row heads only once and then the column head or its 
abbreviation ("abbr") to announce the cell content, when reading across
rows.

HB: The user may wish to read down a column. Then that announcement form
should be reversed (TH content or abbr value(s) from COL(s), and then those
in row). With colspan value>1, several rows of TH can apply to fully describe
the TH information. In addition, the COLGROUP of COL(s) may provide 
additional structuring. There A user may wish to be alerted of that 
grouping, or ignore it.

HB: A COLGROUP may be used to group a set of COLs. If the author includes
any COLGROUP, presumably the author believes that (at least one) of the
COLGROUP grouping of COLs makes a useful distinction.

HB: If any COLGROUP occurs in the HTML 4.0 Table Model, all COL must be in
a COLGROUP:

     <!ELEMENT TABLE (CAPTION?, (COLGROUP*|COL*), THEAD?, TFOOT?, TBODY+)>

HB: I consider that is a bug: I'd rewrite that to allow an intermixture:

     <!ELEMENT TABLE (CAPTION?, (COLGROUP|COL)*, THEAD?, TFOOT?, TBODY+)>

HB: There is no explicit text content of a COLGROUP to give reason for
such a container for COL(s). So it is difficult for the author's intention
to make distinct the COLGROUP content. One way is to use a different visual
presentation of the enclosing group (such as wider inter-column rulings),
or possibly a user preference to have read or displayed the COLGROUP
attribute title="...", possibly including the number of columns involved,
or alternatively announce each COL in order as subsidiary to the COLGROUP.

HB: For non-visual presentation, a different announcer may be used for
entry and exit of a COLGROUP. The current model presumes that every
end COLGROUP except the last is followed by another COLGROUP. So a combined 
announcer could be used: one to start a COLGROUP, a second for the combined
close/open, and a different one to end the last.

HB: In the proposed model, distinct COLGROUP entry and exit announcers
would be required, as not all COL would need to be within a COLGROUP.

HB: With my suggested change to the content model, when some groups are so 
distinguished, a COL not in such a group would not necessarily
be evident.

HB: The user likely needs to know the title information for all COL which 
comprises that COLGROUP. [COLGROUP has only element content (COL)*, it
has no explicit text content.] An audible distinction when entering/leaving a
COLGROUP may help.

Regards/Harvey Bingham
WAI-UA working group

Received on Wednesday, 17 November 1999 16:26:00 UTC