[whatwg] asynchronous data providers from Alex Russell on 2008-12-31 (public-whatwg-archive@w3.org from December 2008)

From: Alex Russell <slightlyoff@google.com>
Date: Wed, 31 Dec 2008 01:57:47 -0800
Message-ID: <6fc58d0d0812310157l41bd606eje08e59c3408e231e@mail.gmail.com>

Hello,

As per a discussion with Ian on IRC, several issues jumped out at me
when looking over the proposed data provider APIs for the <datagrid>
tag (DataGridDataProvider):

  * most of the APIs for providing data are synchronous, implying
either that the entire data set is local or that systems that want to
do something smarter must attempt to block (e.g., with synchronous
XHR). For some forms of network request, blocking may not even be
possible (e.g., JSON-P requests for x-domain data). Either assumption
(local data or blocking network I/O) poses a challenge to efficiently
handling very large data sets.
  * rows are not requested from the data provider as a block.
Instead, an individual rowspec is passed to each call of getCellData.
This makes it difficult for smart providers to bundle requests for
data in a particular range (assuming network I/O).
  * functions seem to be called to provide the results of editing for
a particular data item (editCell(...)), but no event is fired on the
grid that would allow custom value editors to be implemented, and it's
not clear how to inform the grid that editing has finished.
  * the data provider API expects a "real" answer about how many
children a row may have (getRowCount(row)), but in the case of a
deeply nested tree and a lazy-loading data provider, this information
isn't likely to be available up-front.
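
To make the batching concern concrete, here is a rough sketch of what a naive network-backed provider ends up doing when it is only ever handed one rowspec at a time. Every name in it (RemoteTable, PerCellProvider, paintPerCell, fetchRange) is hypothetical and only loosely modelled on the per-cell style of call described above; it is not the proposed DataGridDataProvider interface, and a real provider could of course add caching.

```typescript
// Hypothetical names throughout; this is not the proposed DataGridDataProvider API.
type RowSpec = number[]; // path of row indices, e.g. [3] for the fourth top-level row

// Stand-in for a remote data source; every fetchRange call counts as one round trip.
class RemoteTable {
  requests = 0;

  fetchRange(start: number, count: number): string[][] {
    this.requests++;
    const rows: string[][] = [];
    for (let i = start; i < start + count; i++) {
      rows.push([`name-${i}`, `value-${i}`]);
    }
    return rows;
  }
}

// A naive provider in the per-cell style: it sees one rowspec at a time,
// so it cannot tell that the grid is about to ask for the neighbouring rows too.
class PerCellProvider {
  remote: RemoteTable;

  constructor(remote: RemoteTable) {
    this.remote = remote;
  }

  getCellData(row: RowSpec, column: number): string {
    const [top = 0] = row;
    const [cells = []] = this.remote.fetchRange(top, 1);
    return cells[column] ?? "";
  }
}

// What a grid would do to paint `count` rows of `columns` cells, one cell per call.
function paintPerCell(
  provider: PerCellProvider,
  start: number,
  count: number,
  columns: number
): string[][] {
  const out: string[][] = [];
  for (let r = start; r < start + count; r++) {
    const cells: string[] = [];
    for (let c = 0; c < columns; c++) {
      cells.push(provider.getCellData([r], c));
    }
    out.push(cells);
  }
  return out;
}

// Paint 20 rows of 2 columns through the per-cell provider.
const perCellRemote = new RemoteTable();
paintPerCell(new PerCellProvider(perCellRemote), 100, 20, 2);

// Fetch the same 20 rows as a single range.
const rangeRemote = new RemoteTable();
rangeRemote.fetchRange(100, 20);

console.log(perCellRemote.requests, rangeRemote.requests);
```

In this sketch, painting 20 rows of 2 columns costs 40 round trips through the per-cell provider, against 1 when the same rows are requested as a range.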

These concerns stem from real-world experience with the Dojo Grid
component and the abstract data store system (dojo.data) that backs it
and allows it to handle tens of thousands of rows efficiently.

The design of that system was adapted to these needs by stipulating that:

 * data providers must always inform grids of how many rows they will
show *in total* for a particular query, even if they only return a
fraction of those rows at a time.
 * access to rows is in the form of ranges (start offset and count)
within the number of items that could be returned at any level.
 * to make programming to the system sane, property access (cell value
fetching from a particular row) is synchronous.
 * all other operations are asynchronous, based on the Deferred class
found in Twisted Python, MochiKit, and Dojo. Such a promise to return
data later makes programming to asynchronous systems somewhat easier.
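
As a minimal sketch of how those four stipulations fit together: the names below (InMemoryStore, RangeResult, GridItem, makeRows) are made up for illustration and are not the dojo.data API, and a standard Promise is swapped in for the Deferred class. The store answers a range request asynchronously, always reports the total row count alongside the window it returns, and keeps property access on an already-fetched item synchronous.

```typescript
// Hypothetical names; Promise stands in for the Deferred class mentioned above.
interface GridItem {
  [attribute: string]: string;
}

interface RangeResult {
  total: number;     // rows matching the query overall, not just this window
  start: number;     // offset of the first returned row
  items: GridItem[]; // only the requested window
}

class InMemoryStore {
  rows: GridItem[];

  constructor(rows: GridItem[]) {
    this.rows = rows;
  }

  // Asynchronous range access: resolves later with one window plus the total.
  fetch(start: number, count: number): Promise<RangeResult> {
    const items = this.rows.slice(start, start + count);
    return Promise.resolve({ total: this.rows.length, start, items });
  }

  // Synchronous property access: the item is already local by the time
  // anyone holds a reference to it.
  getValue(item: GridItem, attribute: string): string | undefined {
    return item[attribute];
  }
}

// Build some placeholder rows for the demonstration.
function makeRows(n: number): GridItem[] {
  const rows: GridItem[] = [];
  for (let i = 0; i < n; i++) {
    rows.push({ id: String(i), label: `row ${i}` });
  }
  return rows;
}

const store = new InMemoryStore(makeRows(50000));
store.fetch(100, 25).then((result) => {
  const labels = result.items.map((item) => store.getValue(item, "label"));
  console.log(`showing ${labels.length} of ${result.total} rows from offset ${result.start}`);
});
```

Because the total arrives with every window, a grid could size its scrollbar for the whole result set while only holding the rows currently in view.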

Regards

Received on Wednesday, 31 December 2008 01:57:47 UTC