Re: [whatwg] Sortable Tables

I'm the author of http://www.kryogenix.org/code/browser/sorttable/, a
moderately popular JavaScript table sorting script. As such, I have about
nine years' worth of anecdata about how authors want their HTML tables to be
sorted, the sorts of things they request, and issues that may be worth
taking into consideration. These are not in any particular order; they're
just things that I think are relevant.

Sorttable.js, my script, has the guiding principle of not needing
configuration in most cases. Therefore, it attempts to guess the type of a
table column: if a column looks like it contains numbers, sorttable will
use numeric sort (1 before 2 before 100) rather than alphanumeric sort (1
before 100 before 2); if a column looks like it contains date information,
then sorttable will sort by date (for formats DD/MM/YYYY and MM/DD/YYYY).
The algorithm used for this guessing is pretty naive (check the first cell
in a column; if it's blank, check the next one; etc.). I think that this, by
itself, accounts for sorttable's popularity, because in most cases, it
Just Works; you add a <script> element pointing to the script, and
class="sortable" to the <table>, and do *nothing else*, and your table is
sortable without any configuration.
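The guessing described above could be sketched roughly like this. It is not sorttable.js's actual code: cells are plain strings rather than DOM nodes, the function names (guessColumnType, guessDayFirst, dateKey, keyFunctionFor, sortColumn) are hypothetical, and the rule for telling DD/MM/YYYY from MM/DD/YYYY (a date part above 12 decides it, and an undecidable column defaults to DD/MM) is an assumption.

```javascript
// Sketch only: zero-config column type guessing over plain cell strings.
const DATE_RE = /^(\d\d?)[\/.-](\d\d?)[\/.-](\d{4})$/;

// Number("") is 0, so blanks are rejected explicitly.
// Number() also accepts exponent notation such as "1.5e6".
function looksNumeric(text) {
  return text !== "" && Number.isFinite(Number(text));
}

// Naive guess: classify the first non-blank cell and trust it for the column.
function guessColumnType(cells) {
  for (const raw of cells) {
    const text = raw.trim();
    if (text === "") continue; // blank: check the next cell
    if (DATE_RE.test(text)) return "date";
    if (looksNumeric(text)) return "numeric";
    return "alpha";
  }
  return "alpha"; // every cell blank: plain text sort
}

// A first part above 12 can only be a day (DD/MM); a second part above 12
// means MM/DD. Assumption: an undecidable column defaults to DD/MM.
function guessDayFirst(cells) {
  for (const raw of cells) {
    const m = raw.trim().match(DATE_RE);
    if (!m) continue;
    if (Number(m[1]) > 12) return true;
    if (Number(m[2]) > 12) return false;
  }
  return true;
}

// YYYYMMDD as a number; cells that are not dates sort first.
function dateKey(text, dayFirst) {
  const m = text.trim().match(DATE_RE);
  if (!m) return -Infinity;
  const [day, month] = dayFirst ? [m[1], m[2]] : [m[2], m[1]];
  return Number(m[3]) * 10000 + Number(month) * 100 + Number(day);
}

// Pick one key function for the whole column from the guessed type.
function keyFunctionFor(cells) {
  const type = guessColumnType(cells);
  if (type === "numeric") {
    return (t) => (looksNumeric(t.trim()) ? Number(t) : -Infinity);
  }
  if (type === "date") {
    const dayFirst = guessDayFirst(cells);
    return (t) => dateKey(t, dayFirst);
  }
  return (t) => t;
}

function sortColumn(cells) {
  const key = keyFunctionFor(cells);
  return [...cells].sort((a, b) => {
    const ka = key(a), kb = key(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}
```

Because only the first non-blank cell is inspected, a numeric column whose first cell reads "n/a" gets sorted alphabetically; that's the price of needing no configuration.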

Everything else below here is configuration-based: something you'd have to
do explicitly as an author. The above point is the critical one: guessing
column types makes table sorting zero-config. Some alternative scripts
require you to explicitly tag date or numeric columns, and I think that
authors see that as annoying. Anecdata, of course.

Sorttable also allows authors to specify "alternate content" for a cell.
That is (ignore the invalid HTML attribute here; I didn't know any better,
and we didn't have data-* attributes when I wrote this stuff):

<td sorttable_customkey="11">eleven</td>

This is basically useful when you have table data which has a definite
order that can't be autoguessed, or (more usefully still) when it could
be autoguessed but doing so would be hard. The canonical example of this is
dates: given
<td>Wed 7th November, 10.00am GMT</td>
it would be exceedingly annoying to have to parse that cell content in
JavaScript to turn it back into a
Date() so it can be placed in sort order with other dates. The sorttable.js
solution is to specify a "custom key", which sorttable pretends was the
cell content for the purposes of sorting, so
<td sorttable_customkey="20121107-100000">Wed 7th November, 10.00am GMT</td>
and then the script can sort it. This feature is basically the get-out
clause, an author hook for saying "I know what I want, but your fancy
sorting thing can't handle it; how do I override that?" Authors can specify
custom keys for all their TDs and then sorting will work fine. (Obviously,
dates are less of a problem in theory today with <time> elements, but...
how does the script know to use the datetime attribute of the <time> in
<td><time>...</time></td>?)
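A minimal model of the custom-key override might look like this. It assumes cells are plain objects whose optional customKey field stands in for the sorttable_customkey attribute (in a browser the value would come from something like td.getAttribute("sorttable_customkey"), or a data-* attribute); sortKey and compareCells are hypothetical names.

```javascript
// Sketch only: cells are { text, customKey? } objects; customKey plays the
// role of the sorttable_customkey attribute.
function sortKey(cell) {
  // When present, the custom key is treated as if it were the cell content.
  return cell.customKey !== undefined ? cell.customKey : cell.text;
}

function compareCells(a, b) {
  const ka = sortKey(a), kb = sortKey(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

const cells = [
  { text: "Wed 7th November, 10.00am GMT", customKey: "20121107-100000" },
  { text: "Tue 6th November, 3.00pm GMT", customKey: "20121106-150000" },
  { text: "Thu 8th November, 9.00am GMT", customKey: "20121108-090000" },
];

// Sorting by the keys gives chronological order without parsing the text.
const ordered = [...cells].sort(compareCells).map((c) => c.text);
```

The keys are zero-padded YYYYMMDD-HHMMSS strings, so plain string comparison already puts them in chronological order.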

In roughly descending order of popularity, here are the things I've been
asked about over the last decade or so:

1. Sorting tables inserted after page load (that is, tables created with JS
rather than present in the base HTML). This is obviously not a hard problem,
and sorttable should handle it without explicit action from the author to
"mark" a table as sortable, but it doesn't, out of laziness on my part. I
include it for completeness because sorttable not handling it generates
probably a third of all the sorttable complaint email I receive; a properly
specced sortable tables implementation in browsers would obviously handle
this, and it wouldn't even need to be specified.
2. Sorting a table on page load. That is: a table in HTML containing
unsorted data should be sorted by the browser when the page loads, without
user action. Sorttable doesn't do this because I think it's wrong (if you
want sorted data when the page loads, serve it as sorted in the HTML), but
lots of people ask for it.
3. Multiple header rows. Many authors have two or more <tr>s in the
<thead>, one of which contains <th>s with colspan (and often rowspan) to
group columns together. If this happens, which <th>s are clickable to sort
the table? Which are not? This is hard to autodiagnose (and indeed
sorttable punts on it and uses the first <tr>, which is almost certainly
wrong; even naively picking the last <tr> inside <thead> would be better,
but still imperfect).
4. Handling colspans and rowspans in the table. Sorttable.js basically
punts on this, because what's expected to happen when you sort a column
which contains only half a cell (because the other half is in the adjacent
column, with colspan=2) is wildly author-specific. But a properly specced
solution doesn't get to punt and say "unsupported". This will need some
thought.
5. Numeric sorts handling numbers in exponent notation such as 1.5e6
(which do not match a naive "is this a number" regexp such as /^[0-9]+$/).
6. Specifying how to display that a column is sorted. This would likely be
done in this specification by leaving it to CSS, with something like
th:sorted-forward::after { content: "v"; } (I have no policy suggestions
here), but authors want to be able to specify this, along with different
styles for a sorted column. This is mildly more awkward because there's no
real concept of a column in the DOM of an HTML table, but perhaps all the
TDs could match a :sorted-forward pseudo-class or something (handwaving
here like mad, obviously).
7. Case sensitivity in alphanumeric sorting. Some people like it, some
people don't; it's good to have some sort of author-controllable switch.
(Obviously solvable with <td
sorttable_customkey="INSENSITIVE">Insensitive</td> in the limit case, and
this, like many other things on this list, suggests that some sort of "here
is the JavaScript function I want you to use to produce sort keys for table
cells in this column" hook is a useful idea. Sorttable allows this, and
people use it a lot.)
8. Mark a column as not sortable. Note: this does not mean that clicking on
that column's header doesn't sort the table; it means that the column does
not get sorted *even when the rest of the table does*. This gets requested
for a sort of
"left-hand header" concept, where the first column contains numbers, 1, 2,
3, 4 etc, one per row, to show which is row 1, row 2, row 3 etc of the
table. Obviously this column should not be sorted when the rest of the
table is. I'm not sure there's any good markup for this in HTML (<ol>s do
it, but there's no <ol> concept for <tr>s).
9. IP addresses are a commonly requested type of data to sort
automatically. (I solve this by forwarding people the email explaining how
to add a new sort type function to sorttable, because I've never got around
to adding it to the script.)
10. Zebra-striped tables are a problem. Well, they're not a problem if
you're striping with CSS
(#mytable tr:nth-child(2n) td { background: #eee; }), but an awful lot of
people bake the stripes into their HTML (<tr class="even">), and this gets
screwed up if you sort the table. The solution here might be to nudge
authors to do presentational stuff with CSS instead, at which point their
problems go away, but *lots* of people complain about this.
11. Authors like the idea of having script callbacks before and after a
user action to sort, so they can do things to the table, show progress or
an hourglass, etc. This would presumably be neatly handled by firing a
"sort" event on the table or similar.
12. Stable sort: I recommend that the sort be specified as a stable sort,
because people who care really want it and write me annoyed emails that
it's not there, and no-one explicitly wants an unstable sort. :)
13. What happens if a table has multiple <tbody> elements? Do they sort as
independent units, or mingle together? Sorttable just sorts the first one
and ignores the rest, because multiple tbodies are uncommon, but that's not
really acceptable ;-)
14. Fixed-position rows. Many authors have a "totals" row at the bottom of
their table which should remain at the bottom of the table even after
sorting, which is easily handled (that's what <tfoot> is for), but some
authors also have rows midway through the table which are "headers": this
especially shows up in long tables, where the column headers from <thead>
are repeated midway down the table and should remain in position even when
the table is sorted. In general this means that they should remain the same
number of rows away from <thead>. This case is odd, and sorttable.js
doesn't handle it, but lots of people ask for it.
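For item 5, a slightly less naive numeric test might look like the following. NUMBER_RE, numericKey and compareNumeric are hypothetical names. The regexp covers an optional sign, a decimal mantissa and an optional exponent, and sorting non-numeric cells after all the numbers is an assumed policy.

```javascript
// The naive check from item 5: rejects "1.5e6", "-4" and "2.5".
const NAIVE_NUMBER_RE = /^[0-9]+$/;
// Optional sign, integer or decimal mantissa, optional exponent.
const NUMBER_RE = /^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$/;

function numericKey(text) {
  const t = text.trim();
  return NUMBER_RE.test(t) ? parseFloat(t) : NaN;
}

// NaN never compares consistently, so non-numeric cells are sent last explicitly.
function compareNumeric(a, b) {
  const x = numericKey(a), y = numericKey(b);
  if (Number.isNaN(x)) return Number.isNaN(y) ? 0 : 1;
  if (Number.isNaN(y)) return -1;
  return x - y;
}
```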
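Item 7's case switch and the per-column "function that produces sort keys" hook could be combined in one comparator factory. makeComparator and its caseSensitive and keyFn options are hypothetical and are not sorttable's API.

```javascript
// Sketch: keyFn turns cell text into a sort key; caseSensitive toggles case folding.
function makeComparator({ caseSensitive = true, keyFn = (text) => text } = {}) {
  return (a, b) => {
    let ka = keyFn(a), kb = keyFn(b);
    if (!caseSensitive) {
      ka = String(ka).toLowerCase();
      kb = String(kb).toLowerCase();
    }
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  };
}

const fruit = ["apple", "Banana", "cherry"];
// Code-unit order puts every uppercase letter before every lowercase one.
const sensitive = [...fruit].sort(makeComparator());
const insensitive = [...fruit].sort(makeComparator({ caseSensitive: false }));
```

For text outside plain ASCII, String.prototype.localeCompare with a sensitivity option would probably be a better basis than toLowerCase plus code-unit comparison.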
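One way to model item 8 is to sort whole rows and then write the fixed column's original values back into their original row positions. Rows are arrays of cell strings here, and sortWithFixedColumn is a hypothetical name.

```javascript
// Sketch: rows are arrays of cell strings; compare orders two cell strings.
function sortWithFixedColumn(rows, sortCol, fixedCol, compare) {
  // Remember the fixed column's values by row position before sorting.
  const fixed = rows.map((row) => row[fixedCol]);
  const sorted = [...rows].sort((a, b) => compare(a[sortCol], b[sortCol]));
  // Put the remembered values back, position by position.
  return sorted.map((row, i) => {
    const copy = [...row];
    copy[fixedCol] = fixed[i];
    return copy;
  });
}

const byText = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
const table = [
  ["1", "pear"],
  ["2", "apple"],
  ["3", "fig"],
];
const result = sortWithFixedColumn(table, 1, 0, byText);
```

In a browser, the same visual effect can be had without a stored numbering column at all by generating the numbers with CSS counters (counter-increment on each row and a ::before pseudo-element), since counters follow document order and so renumber after every sort.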
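For item 9, an IPv4 sort type mostly comes down to reading the four octets as digits of a base-256 number. ipv4Key and compareIPv4 are hypothetical names, and the sketch assumes every cell holds a valid dotted-quad address (IPv6 and CIDR suffixes are not handled).

```javascript
// Sketch: IPv4 dotted-quad only. Invalid addresses yield NaN and would need
// separate handling in a real sort.
function ipv4Key(text) {
  const parts = text.trim().split(".");
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) return NaN;
  // Treat the four octets as digits of a base-256 number.
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

function compareIPv4(a, b) {
  return ipv4Key(a) - ipv4Key(b);
}
```

A plain alphanumeric sort would put "10.0.0.10" before "10.0.0.2" and both before "9.255.255.255"; the numeric key gives the order people expect.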
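On item 12: ECMAScript 2019 and later require Array.prototype.sort to be stable, but stability can also be made explicit by using each row's original position as the final tiebreaker. stableSortBy is a hypothetical helper; it also computes each row's key once rather than once per comparison.

```javascript
// Sketch: decorate each item with its key and original index, sort, undecorate.
function stableSortBy(items, key, compare) {
  return items
    .map((item, index) => ({ item, index, k: key(item) }))
    // Equal keys fall back to original order, which is what makes it stable.
    .sort((a, b) => compare(a.k, b.k) || a.index - b.index)
    .map((d) => d.item);
}

const rows = [
  { name: "a", group: 2 },
  { name: "b", group: 1 },
  { name: "c", group: 2 },
  { name: "d", group: 1 },
];
const byGroup = stableSortBy(rows, (r) => r.group, (x, y) => x - y);
```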
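The two readings of item 13 can be put side by side. sortTbodies and its "independent" and "merged" modes are hypothetical, tbodies are modelled as arrays of rows, and refilling each <tbody> with its original number of rows in the merged case is just one possible assumption.

```javascript
// Sketch: tbodies is an array of row arrays; compare orders two rows.
function sortTbodies(tbodies, compare, mode) {
  if (mode === "independent") {
    // Each tbody is its own unit; rows never cross tbody boundaries.
    return tbodies.map((rows) => [...rows].sort(compare));
  }
  // "merged": pool every row, sort once, then refill the tbodies in order,
  // each keeping its original row count (an assumption, not a requirement).
  const all = tbodies.flat().sort(compare);
  let start = 0;
  return tbodies.map((rows) => {
    const chunk = all.slice(start, start + rows.length);
    start += rows.length;
    return chunk;
  });
}

const groups = [[3, 1], [2, 0]];
const numeric = (a, b) => a - b;
```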
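Item 14's pinned rows could be handled by sorting only the movable rows and pouring them back into the gaps, so every pinned row keeps its index (the same number of rows below <thead>). sortWithFixedRows and the isFixed predicate are hypothetical, rows are strings for brevity, and how an author would actually mark such rows is left open.

```javascript
// Sketch: isFixed(row) says whether a row must keep its position.
function sortWithFixedRows(rows, isFixed, compare) {
  const movable = rows.filter((row) => !isFixed(row)).sort(compare);
  let next = 0;
  // Pinned rows stay put; every other slot takes the next sorted movable row.
  return rows.map((row) => (isFixed(row) ? row : movable[next++]));
}

const body = ["c", "HDR", "a", "b", "HDR2", "d"];
const out = sortWithFixedRows(
  body,
  (row) => row.startsWith("HDR"),
  (a, b) => (a < b ? -1 : a > b ? 1 : 0)
);
```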

I hope that's useful. Happy to answer questions.

sil

-- 
New Year's Day --
everything is in blossom!
I feel about average.
   -- Kobayashi Issa

Received on Wednesday, 7 November 2012 10:35:48 UTC