Re: [whatwg] Sortable Tables

I've added a feature to HTML to enable users (and authors) to sort tables.

The basic design of the feature is that if a column's <th> has a sorted="" 
attribute, the UA will sort the table every time the mutation observers 
would fire (before they fire). A table can have a sortable="" attribute, 
which lets the user tell the user agent to add sorted="" attributes to 
columns to sort them.


On Tue, 6 Nov 2012, Ojan Vafai wrote:
> On Tue, Nov 6, 2012 at 11:25 AM, Ian Hickson <ian@hixie.ch> wrote:
> > On Thu, 1 Jul 2010, Christoph Päper wrote:
> > >
> > > For starters, only rows inside tbodys shall be reordered. For now 
> > > columns don't have to be reordered, ie. only vertical, no horizontal 
> > > sorting.

Done.


> > > Nevertheless the design should make it possible to add the other 
> > > direction later.

Well I guess nothing would stop us supporting sorted="" on <th>s at the 
front of a row, but boy, that would be a lot more complicated to do. You'd 
have to be moving cells around all over the place.


> > > Not every table has content that makes sense to be sorted in a 
> > > different order. So sortable tables should be marked as such. Note 
> > > that col and colgroup elements are hardly supported.

<table sortable>.


> > > Not every column has content that makes sense to be sorted in a 
> > > different order. So non-sortable columns inside sortable tables 
> > > should be marked as such.

Any column with a <th> is sortable, for now. We can add a "nosort" column 
or something later if this becomes a problem.


> > > There are different ways to sort, eg. numeric, temporal or 
> > > alphabetic and ascending or descending. Therefore columns should 
> > > bear information how they should be sorted, ie. what kind of content 
> > > their cells have.

Ascending/descending is supported (sorted="reversed").

Any temporal syntax supported by <time> can be used by putting <time> as 
the only child of the cells to sort.

I intend to spec some sort of algorithm for doing numeric/string 
comparison, but haven't yet come up with a good solution. If you have any 
suggestions, this is the bug tracking this issue:

   https://www.w3.org/Bugs/Public/show_bug.cgi?id=20524


> > > Several columns may be used for sorting by some kind of priority.

You can set sorted="" on multiple columns' headers, and give a sort key 
cardinality in each, as in sorted="1", sorted="2", etc.
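
As a rough illustration only (not spec text), multi-column sorting with 
an optional reversed direction could be modelled as below. The row and 
column shapes ({ index, priority, reversed }) and the compareRows name 
are assumptions made for this sketch.

```javascript
// Hypothetical model: each row is an array of already-extracted string keys,
// and each sort column is described as { index, priority, reversed }.
function compareRows(a, b, sortColumns) {
  // A lower priority number is the more significant key
  // (sorted="1" is consulted before sorted="2").
  const ordered = [...sortColumns].sort((x, y) => x.priority - y.priority);
  for (const col of ordered) {
    const ka = a[col.index];
    const kb = b[col.index];
    if (ka === kb) continue; // tie on this column, try the next key
    const result = ka < kb ? -1 : 1;
    return col.reversed ? -result : result;
  }
  return 0; // equal on every sort column
}

const rows = [
  ["Smith", "2012"],
  ["Jones", "2011"],
  ["Smith", "2010"],
];
const sortColumns = [
  { index: 1, priority: 2, reversed: true },  // secondary key, descending
  { index: 0, priority: 1, reversed: false }, // primary key, ascending
];
const sortedRows = [...rows].sort((a, b) => compareRows(a, b, sortColumns));
```

Here the name column is the primary key and the year column breaks ties 
in descending order.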


> > > The original order must be restorable.

This I have not supported. I don't see how to support it sanely.


> > > Cell content may not consist of the string that should be used 
> > > verbatim for sorting purposes, eg. leading articles or similar 
> > > numbers with different units (g, kg, t). Cells should have 
> > > an optional attribute indicating their sort key. The time element 
> > > already provides the necessary metadata features for temporal 
> > > sorting; maybe there should be more of such elements instead.

I've used <data> for this, alongside <time>.


> > > There may be columns that shall remain stable, eg. rank numbers.

I haven't supported this. I've no idea how to do this sanely, especially 
given cells with column and row spans.


> 1. Would sorting actually reorder the DOM nodes or just change their 
> visual order? It's not clear to me which one is better. I think the 
> former is what you'd want most of the time.

I've gone with reordering the DOM nodes. Things like :nth-child styling 
become nigh on impossible without doing it at the DOM level, not to 
mention the confusion that would reign from having such a dramatic 
disconnect between rendering and DOM (e.g. with abs pos, etc).


> 2. What values should the sort property allow. One idea is that it takes 
> a JS function similar to what JavaScript's sort function takes. If you 
> leave it out then it just does alphanumeric sort.

I was going to have a comparator function, but I couldn't see a sane way 
to make it work in the face of hostile functions that mutate the DOM, so 
I dropped it. You can do custom sort orders by giving a key in the <data> 
element's value="" attribute, though.


> 3. What elements does it go on? I don't see what it would do on a td. I 
> could see putting it on a th though. Also, it's not clear to me what 
> would get sorted. For example, in some tables, you would group trs 
> inside tbodys and want to sort those.

sorted="" goes on a column-heading <th>, ideally in a <thead> but you can 
also put it on the first row of your <tbody> if you don't have a <thead>. 
Rows are sorted on a per-group basis. Rows that span each other are 
treated as one row for sorting.


On Tue, 6 Nov 2012, Boris Zbarsky wrote:
> 
> Another obvious question: how does (or should) sorting interact with 
> rowspans?

The sort algorithm groups rows that span each other together and treats 
them as one (using the data in their top row for sorting).
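
A minimal sketch of that grouping step, assuming rows are plain objects 
of the form { key, cells: [{ rowspan }] } (a hypothetical stand-in for 
real <tr> elements, not the spec's data model):

```javascript
// Groups consecutive rows that are tied together by rowspans.
function groupSpanningRows(rows) {
  const groups = [];
  let i = 0;
  while (i < rows.length) {
    let end = i; // index of the last row reached by any cell in this group
    const group = [];
    for (let j = i; j <= end && j < rows.length; j++) {
      group.push(rows[j]);
      for (const cell of rows[j].cells) {
        end = Math.max(end, j + (cell.rowspan || 1) - 1);
      }
    }
    groups.push(group);
    i += group.length;
  }
  return groups;
}

const tableRows = [
  { key: "b", cells: [{ rowspan: 2 }, {}] },
  { key: "z", cells: [{}] }, // covered by the rowspan above
  { key: "a", cells: [{}] },
];
// Sort whole groups by the key of their top row, then flatten.
const sortedKeys = groupSpanningRows(tableRows)
  .sort((g, h) => (g[0].key < h[0].key ? -1 : g[0].key > h[0].key ? 1 : 0))
  .flat()
  .map((row) => row.key);
```

The "z" row travels with the "b" row because the first row's cell spans 
into it, so only the top row's key takes part in the comparison.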


On Wed, 7 Nov 2012, Silvia Pfeiffer wrote:
> 
> http://tympanus.net/codrops/2009/10/03/33-javascript-solutions-for-sorting-tables/

Interesting, thanks.


> Also, a sortable table's header needed some indication of the sortability,
> so some default CSS like this:
>     th.sortable {
>       &:after { content: " ▲▼"}
>       &.current{
>         &[data-direction="asc"]:after { content: " ▼"}
>         &[data-direction="desc"]:after { content: " ▲"}
>       }
>     }

I haven't defined the styling in detail, pending both user agent 
implementation experience and the addition of :sorted to CSS.


On Wed, 7 Nov 2012, Silvia Pfeiffer wrote:
> On Wed, Nov 7, 2012 at 8:37 PM, Jirka Kosek <jirka@kosek.cz> wrote:
> >
> > It would be very difficult to support sorting on dates and numbers as 
> > in HTML they are usually present formatted using specific locale. So 
> > there should be additional attribute added to td/th which can hold 
> > sort key which will override cell contents, something like
> >
> > <td sortas="2012-11-07">11. listopadu 2012</td>

   <td><time datetime="2012-11-07">11. listopadu 2012</time>


On Wed, 7 Nov 2012, Stuart Langridge wrote:
>
> I'm the author of http://www.kryogenix.org/code/browser/sorttable/, a 
> moderately popular JavaScript table sorting script. As such, I have 
> about nine years worth of anecdata about how authors want their HTML 
> tables to be sorted, the sorts of things they request, and issues that 
> may be worth taking into consideration. These are not particularly in 
> order; they're just things that I think are relevant.

Thank you very much for your input, it was invaluable.


> Sorttable.js, my script, has the guiding principle of not needing 
> configuration in most cases. Therefore, it attempts to guess the type of 
> a table column: if a column looks like it contains numbers, sorttable 
> will use numeric sort (1 before 2 before 100) rather than alphanumeric 
> sort (1 before 100 before 2); if a column looks like it contains date 
> information, then sorttable will sort by date (for formats DD/MM/YYYY 
> and MM/DD/YYYY). The algorithm used for this guessing is pretty naive 
> (check the first cell in a column; if it's blank, check the next one; 
> etc). I think that this, by itself, has accounted for sorttable's 
> popularity, because in most cases, it Just Works; you add a <script> 
> element pointing to the script, and class="sortable" to the <table>, and 
> do *nothing else*, and your table is sortable without any configuration.

I intend to do something along those lines for HTML's sorting algorithm 
also, though that is still up in the air (see above).
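
For illustration, the naive guessing described in the quoted text (look 
at the first non-blank cell) might look like the sketch below. This is 
neither sorttable's actual code nor the eventual spec algorithm; the 
guessColumnType name and the regular expressions are assumptions.

```javascript
// Guess a column type from the first non-blank cell text.
function guessColumnType(cellTexts) {
  for (const text of cellTexts) {
    const t = text.trim();
    if (t === "") continue; // blank cell: check the next one
    if (/^[+-]?\d+(\.\d+)?$/.test(t)) return "numeric";
    if (/^\d{1,2}\/\d{1,2}\/\d{4}$/.test(t)) return "date"; // DD/MM/YYYY or MM/DD/YYYY
    return "text";
  }
  return "text"; // every cell was blank
}

const guessed = guessColumnType(["", "  ", "42", "not a number"]);
```

Because only one cell is inspected, a column whose first value happens 
to look numeric is treated as numeric throughout, which is the weakness 
of this approach.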


> Everything else below here is configuration-based: something you'd have 
> to do explicitly as an author. The above point is the critical one; 
> guessing column types to make table sorting be zero-config. Some 
> alternative scripts require you to explicitly tag date or numeric 
> columns, and I think that authors see that as annoying. Anecdata, of 
> course.
> 
> Sorttable also allows authors to specify "alternate content" for a cell. 
> That is (ignore the invalid HTML attribute here; I didn't know any 
> better, and we didn't have data-* attributes when I wrote this stuff)
> 
> <td sorttable_customkey="11">eleven</td>

<td><data value="11">eleven</data></td>


> This is basically useful for when you have table data which has a 
> definite order but it can't be autoguessed, or (more usefully still) 
> when it could be autoguessed but that would be hard. The canonical 
> example of this is dates: it would be exceedingly annoying, given 
> <td>Wed 7th November, 10.00am GMT</td> to have to parse that cell 
> content in JavaScript to turn it back into a Date() so it can be placed 
> in sort order with other dates. The sorttable.js solution is to specify 
> a "custom key", which sorttable pretends was the cell content for the 
> purposes of sorting, so <td sorttable_customkey="20121107-100000">Wed 
> 7th November, 10.00am GMT</td> and then the script can sort it.

<td><time datetime="2012-11-07T10:00Z">Wed 7th November, 10.00am GMT</time></td>


> This feature is basically the get-out clause, an author hook for saying 
> "I know what I want, but your fancy sorting thing can't handle it; how 
> do I override that?" They can specify custom keys for all their TDs and 
> then sorting will work fine. (Obviously, dates are less of a problem in 
> theory today with <date> elements, but... how does the script know to 
> use the datetime attribute of the <date> in <td><date>...</date></td>?)

In the case of the spec, if the <td> element's only child is a <time> or a 
<data>, it knows to use the datetime="" or value="" attributes respectively.
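
A sketch of that key-extraction rule, using a hypothetical plain-object 
stand-in for a cell ({ text, children: [{ tag, attrs }] }) rather than 
real DOM nodes. The fallback to the cell text when the attribute is 
missing is a simplification for this sketch:

```javascript
// Returns the string used to sort a cell.
function sortKeyForCell(cell) {
  if (cell.children.length === 1) {
    const only = cell.children[0];
    if (only.tag === "time" && "datetime" in only.attrs) {
      return only.attrs.datetime;
    }
    if (only.tag === "data" && "value" in only.attrs) {
      return only.attrs.value;
    }
  }
  return cell.text; // no single <time>/<data> child: use the visible text
}

const dateCell = {
  text: "Wed 7th November, 10.00am GMT",
  children: [{ tag: "time", attrs: { datetime: "2012-11-07T10:00Z" } }],
};
const dateKey = sortKeyForCell(dateCell);
```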


> In roughly descending order of popularity, here is what I've been asked 
> questions about, over the last decade or so:
> 
> 1. Sorting tables inserted after page load. This is obviously not a 
> problem (sorting a table created with JS rather than in the base HTML), 
> and sorttable should handle it without explicit action from the author 
> to "mark" a table as sortable, but it doesn't because of laziness from 
> me. I include it for completeness because sorttable not handling it 
> generates probably a third of all the sorttable complaint email I 
> receive; a properly specced sortable tables implementation in browsers 
> would obviously handle this and wouldn't need to even have it specified.

Supported.


> 2. Sorting a table on page load. That is: a table in HTML containing 
> unsorted data should be sorted by the browser when the page loads, 
> without user action. Sorttable doesn't do this because I think it's 
> wrong (if you want sorted data when the page loads, serve it as sorted 
> in the HTML), but lots of people ask for it.

Supported, though I'm not sure how good an idea this will end up being.


> 3. Multiple header rows. Many authors have two or more <tr>s in the 
> <thead>, one of which contains rowspanned <th>s, to group columns 
> together. If this happens, which <th>s are clickable to sort the table? 
> Which are not? This is hard to autodiagnose (and indeed sorttable punts 
> on it and picks the first one, which is almost certainly wrong; even 
> naively picking the last <tr> inside <thead> would be better, but still 
> imperfect).

The spec picks the highest non-spanning <th> in a column, if there's a 
<thead>. (If there's not, it uses the top row's <th>, if it doesn't span 
columns.)


> 4. Handling colspans and rowspans in the table. Sorttable.js basically 
> punts on this, because what's expected to happen when you sort a column 
> which contains only half a cell (because the other half's in another 
> column, with rowspan=2) is wildly author-specific. But a properly 
> specced solution doesn't get to punt and say "unsupported". This will 
> need some thought.

For column spanning, the spec's model basically just acts as if the cell 
isn't spanning, but is in each column it spans.

So e.g. <td colspan=2>X</td> is treated as <td>X</td><td>X</td>, for the 
purposes of sorting.
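
In code form, that expansion might be sketched as follows (cells are 
hypothetical { text, colspan } objects, not real DOM nodes):

```javascript
// Expands a row's cells into one slot per column, repeating spanning cells.
function expandColspans(cells) {
  const slots = [];
  for (const cell of cells) {
    const span = cell.colspan || 1;
    for (let k = 0; k < span; k++) {
      slots.push(cell.text);
    }
  }
  return slots;
}

const slots = expandColspans([{ text: "X", colspan: 2 }, { text: "Y" }]);
// slots[1] is the value this row contributes when sorting by the second column.
```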


> 5. Numeric sort handling exponented numbers such as 1.5e6 (which do not 
> match a naive "is this a number" regexp such as /^[0-9]+$/ )

I'd like to support this as part of the algorithm mentioned before:

   https://www.w3.org/Bugs/Public/show_bug.cgi?id=20524
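
Since the algorithm isn't specced yet, the following is only a sketch of 
one way to recognise exponent notation; the pattern and the 
parseNumericKey name are assumptions, not what the spec will say.

```javascript
// Accepts forms such as "100", "-3.5", ".5" and "1.5e6".
const NUMERIC_PATTERN = /^[+-]?(\d+(\.\d*)?|\.\d+)(e[+-]?\d+)?$/i;

// Returns a number for numeric-looking text, or null otherwise.
function parseNumericKey(text) {
  const t = text.trim();
  return NUMERIC_PATTERN.test(t) ? Number(t) : null;
}

const numericOrder = ["1.5e6", "100", "2e3"].sort(
  (a, b) => parseNumericKey(a) - parseNumericKey(b)
);
```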


> 6. Specifying how to display that a column is sorted. This would likely 
> be done in this specification by leaving it to CSS and 
> th::sorted-forward { after: content("v"); } or some such thing (I have 
> no policy suggestions here), but authors want to be able to specify 
> this, along with different styles for a sorted column. This is mildly 
> more awkward because there's no real concept of a column in the DOM of 
> an HTML table, but perhaps all the TDs could grow a pseudo 
> ::sorted-forward or something (handwaving here like mad, obviously).

I haven't specced this yet but once CSS has the :sorted pseudo (bug 20522) 
I expect we'll be able to do something like:

   th:sorted(ascending)::after { content: "v"; }


> 7. Case sensitivity in alphanumeric sorting. Some people like it, some 
> people don't; it's good to have some sort of author-controllable switch. 
> (Obviously solvable with <td 
> sorttable_customkey="INSENSITIVE">Insensitive</td> in the limit case,

I intend to only support insensitive comparisons initially, but if that's 
a problem we can definitely revisit it somehow. (It can't be worked around 
easily, unlike the other way around.)


> and this, like many other things on this list, suggests that some sort 
> of "here is the JavaScript function I want you to use to produce sort 
> keys for table cells in this column" function is a useful idea. 
> Sorttable allows this, and people use it a lot.)

I tried to do this but couldn't figure out a sane way to do it. A 
comparator can totally destroy the table we're sorting, and I don't know 
what to do if that happens.


> 8. Mark a column as not sortable. Note: this does not mean that clicking 
> on that column doesn't sort it; it means that that column does not get 
> sorted *even when the rest of the table does*. This gets requested for a 
> sort of "left-hand header" concept, where the first column contains 
> numbers, 1, 2, 3, 4 etc, one per row, to show which is row 1, row 2, row 
> 3 etc of the table. Obviously this column should not be sorted when the 
> rest of the table is. I'm not sure there's any good markup for this in 
> HTML (<ol>s do it, but there's no <ol> concept for <tr>s).

I haven't supported this. To some extent, it's presentational, and thus 
can be done using something like:

   tr::before { display: table-cell; content: counter(row); }

...or some such.


> 9. A commonly requested type of things to know how to automatically sort 
> is IP addresses. (I solve this by forwarding people the email explaining 
> how to add a new sort type function to sorttable, because I've never got 
> around to adding it to the script.)

This is something that should end up supported by the sorting algorithm 
automatically.
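
How the algorithm will handle this is still undecided. Purely as an 
illustration, a dotted-quad IPv4 address can be turned into a single 
numeric key like this (ipv4SortKey is a hypothetical helper, and IPv6 is 
not covered):

```javascript
// Converts "a.b.c.d" to a number, or returns null if it is not valid IPv4.
function ipv4SortKey(text) {
  const parts = text.trim().split(".");
  if (parts.length !== 4) return null;
  let key = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    key = key * 256 + n; // stays well within double precision
  }
  return key;
}

const addresses = ["10.0.0.10", "10.0.0.2", "9.255.255.255"];
const byAddress = [...addresses].sort((a, b) => ipv4SortKey(a) - ipv4SortKey(b));
```

A plain string sort would put "10.0.0.10" before "10.0.0.2"; the numeric 
key avoids that.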


> 10. Zebra-striped tables are a problem. Well, they're not a problem if 
> you're striping with CSS (#mytable tr:nth-child(2n) td { background: 
> #eee; }) but an awful lot of people bake the stripes into their HTML 
> (<tr class="even">), and this gets screwed up if you sort the table. The 
> solution here obviously might be to poke authors to do presentational 
> stuff with CSS instead and then their problems go away, but *lots* of 
> people complain about this.

:nth-child() is more widely supported than this feature, so I think it 
makes sense to rely on the former if you're relying on the latter.


> 11. Authors like the idea of having script callbacks before and after a 
> user action to sort, so they can do things to the table, show progress 
> or an hourglass, etc. This would presumably be neatly handled by firing 
> a "sort" event on the table or similar.

I've made 'sort' get fired at the table before the sort starts. Nothing is 
fired after currently.


> 12. Stable sort: I recommend that the sort that's implemented be 
> specified as being a stable sort, because people who care really want it 
> and write me annoyed emails that it's not there, and no-one explicitly 
> wants unstable sort. :)

Done.
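
To make the stability requirement concrete, here is a sketch that breaks 
ties by original position. (Array.prototype.sort has itself been 
required to be stable since ES2019, so the explicit index is only there 
to show the idea.) The stableSortBy name is an assumption.

```javascript
// Sorts items by keyOf(item); items with equal keys keep their original order.
function stableSortBy(items, keyOf) {
  return items
    .map((item, index) => ({ item, index, key: keyOf(item) }))
    .sort((a, b) => {
      if (a.key < b.key) return -1;
      if (a.key > b.key) return 1;
      return a.index - b.index; // tie: earlier row stays first
    })
    .map((entry) => entry.item);
}

const people = [
  { name: "b", score: 1 },
  { name: "a", score: 1 },
  { name: "c", score: 0 },
];
const byScore = stableSortBy(people, (p) => p.score).map((p) => p.name);
```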


> 13. What happens if a table has multiple <tbody> elements? Do they sort 
> as independent units, or mingle together? Sorttable just sorts the first 
> one and ignores the rest, because multiple tbodies are uncommon, but 
> that's not really acceptable ;-)

Independent.
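
In other words, something like this sketch, where each inner array 
stands in for the rows of one <tbody> (a hypothetical model):

```javascript
// Sorts every row group on its own; rows never move between groups.
function sortBodiesIndependently(bodies, compare) {
  return bodies.map((rows) => [...rows].sort(compare));
}

const sortedBodies = sortBodiesIndependently(
  [[3, 1], [2, 0]],
  (a, b) => a - b
);
```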


> 14. Fixed-position rows. Many authors have a "totals" row at the bottom 
> of their table which should remain at the bottom of the table even after 
> sorting, which is easily handled (that's what <tfoot> is for), but some 
> authors also have rows midway through the table which are "headers": 
> this especially shows up in long tables, where the column headers from 
> <thead> are repeated midway down the table and should remain in position 
> even when the table is sorted. In general this means that they should 
> remain the same number of rows away from <thead>. This case is odd, and 
> sorttable.js doesn't handle it, but lots of people ask for it.

<tfoot> is supported as suggested. Haven't done it for the mid-rows. Not 
sure how to make that work while sorting around them. I mean, you'd have 
to count the number of rows before each one so that you put back the right 
number of rows or something...
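
To illustrate the kind of bookkeeping that would be needed (this is not 
in the spec, and it ignores row spans entirely), a sketch that keeps 
fixed rows at their original indices might look like:

```javascript
// Sorts the movable rows and leaves rows matching isFixed where they were.
function sortAroundFixedRows(rows, isFixed, compare) {
  const movable = rows.filter((row) => !isFixed(row)).sort(compare);
  let next = 0;
  return rows.map((row) => (isFixed(row) ? row : movable[next++]));
}

const withMidHeader = sortAroundFixedRows(
  ["c", "HEADER", "a", "b"],
  (row) => row === "HEADER",
  (x, y) => (x < y ? -1 : x > y ? 1 : 0)
);
```

The repeated header stays one row below the top, as requested, while the 
other rows are sorted around it.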


On Thu, 8 Nov 2012, Cameron Jones wrote:
> > <time> exists, and <data> exists for non-time machine-readable data; 
> > maybe they can be utilized in some way?
>
> I have done some investigation in this area too and having concrete 
> datatypes would make this more utilizable, ie from the proposal for 
> <data type="" value=""/>
> 
> http://www.w3.org/wiki/User:Cjones/ISSUE-184
> 
> The other area of integration would be with BCP-47 language tags and the 
> CLDR which include i18n collation information, for example british 
> numeric collation:
> 
> en-GB-*u-kn-true*
> 
> The significant benefit with this is that this standard is already 
> universal across server\client and is of course fully internationalized.
> 
> The other aspect of this is that there is a distinction between server 
> pagination including sort ordering defining the content of a page and 
> the client-based sorting which would be more of a presentational 
> customization and outside the scope of pagination. As such, it may be 
> better for the HTML to markup the structure of the content with sorting 
> and collation but for this to be configurable through CSS without the 
> structural DOM changes.
> 
> This could also apply to HTML lists: <ul> <ol>, <dl>.

I haven't added this. I'm curious as to the use cases and how much 
implementation interest there is (I guess this would primarily be for 
validators?).


On Thu, 8 Nov 2012, Alex Russell wrote:
>
> I'm much more inclined to solve this from the data axis. Asking the 
> table itself to do the sorting is weird. Instead, you most often want to 
> have some data source return you rows in sorted order (or indicate row 
> order). If you do something like MDV, sorting the table is applying a 
> sort to the template that stamped out the view. That works with 
> DOM-table backed tables as well as server or JS-backed tables.

I'm happy to strip out the current text in the spec and add in something 
more like this model if there's implementation and author interest, but I 
don't really understand what you are proposing. Can you elaborate?


On Wed, 7 Nov 2012, Christoph Päper wrote:
> 
> >> Note that ‘col’ and ‘colgroup’ elements are hardly supported.
> 
> But they’re essential for assigning sort properties.
> 
>   <col key=…>
>   <colgroup key=…>

I ended up using <th> for this instead.


> To support this, cells must be splittable!
> 
>   td     {color: green;}
>   #split {color: red;}
> 
>   <tr><td>3 <td id=split colspan=2> red
>   <tr><td>1
>   <tr><td>2 <td> green
> 
> after sorting by the first column should look like
> 
>   <tr><td>1 <td id=split> red
>   <tr><td>2 <td> green
>   <tr><td>3 <td id=split> red
> 
> would if duplicate IDs were legal. The DOM tree, however, would not 
> change! The value of the cell at position (1,1), i.e. second row and 
> column since we count from zero, is always undefined, but the value of 
> the slot at (1,1) changes from “red” to “green”.

That's an interesting idea, but I don't think it's the right approach. 
Some elements are not elements you want to clone (e.g. <audio>, <embed>, 
<input>). And it's not clear how you remerge them.


On Fri, 9 Nov 2012, Pierre Dubois wrote:
> 
> My opinion is that it depends on the real scope of the "th" element.
> 
> If the "th" is an empty cell or used for "layout", the sorting
> functionality would not be available.
> If the "th" is a "group header", the sorting functionality would be
> applied to the header cell along with their data fixed. Where the
> header cell is a subgroup header and/or a header that represents one
> or more rows or columns.
> If the "th" is a "header", the sorting functionality could be applied
> to the associated data cells and by default the sorting action would be
> extended to the other axis [row|col].

That's an interesting idea. I'm dubious about overloading the logic like 
this, though, lest it make authors set invalid scope values just to get 
sorting enabled/disabled.

I'd rather just add an attribute that says "this can't be a sort column", 
if that's really a need.

When is it a need, though? I'd love to study a table that has a column 
that it doesn't make sense to sort by.


> Use case: A data table that have row headers and column headers.
> Row and column that is in the scope of an rowspans and colspans data
> cell (td) would be fixed.

Not sure what you mean, but for what it's worth, the spec as written will 
skip over any rows at the top of <tbody>s that consist of only <th>s.


> Use case: A data table that only have row headers.
> Row that is in the scope of an rowspans data cell (td) would be fixed.

If a data table only has row headers, I'm not sure how to sort it.


> Use case: A data table that only have column headers.
> Column that is in the scope of a colspans data cell (td) would be fixed.

Not sure what this means.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 28 December 2012 02:04:50 UTC