
Re: [IndexDB] Proposal for async API changes

From: Jeremy Orlow <jorlow@chromium.org>
Date: Tue, 18 May 2010 21:23:51 +0100
Message-ID: <AANLkTikM4JooqYTinn_NRY-7K0NG9iTwKPlC0josGLkS@mail.gmail.com>
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>
On Tue, May 18, 2010 at 8:18 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Tue, May 18, 2010 at 7:20 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> >> 10. You are allowed to have multiple transactions per database
> >> connection. However if they use overlapping tables, only the first one
> >> will receive events until it is finished (with the usual exceptions of
> >> allowing multiple readers of the same table).
> >
> > Can you please clarify what you mean here?  This seems like simply an
> > implementation detail to me, so maybe I'm missing something?
>
> The spec currently does explicitly forbid having multiple transactions
> per database connection. The syntax doesn't even support it since
> there is a .currentTransaction property on IDBDatabase. I.e. the
> following code seems forbidden (though I'm not sure if forbidden means
> that an exception will be thrown somewhere, or if it means that the
> code will just silently fail to work).
>
> request = indexedDB.openDatabase("myDB", "...");
> request.onsuccess = function() {
>  db = request.result;
>  r1 = db.openTransaction(["foo"]);
>  r1.onsuccess = function() { ... };
>  r2 = db.openTransaction(["bar"]);
>  r2.onsuccess = function() { ... };
> };
>
> The spec says that the above is forbidden. In our new proposal the
> following would be allowed:
>
> request = indexedDB.openDatabase("myDB", "...");
> request.onsuccess = function() {
>  db = request.result;
>  t1 = db.transaction(["foo"], READ_WRITE);
>  t2 = db.transaction(["bar"], READ_WRITE);
> };
>
> And would allow the two transactions to run concurrently.
>

Oh I see.  You threw me off because you were only talking in terms of static
transactions.  That said, I think things are still fine in the context of
dynamic transactions.  We'll just run multiple concurrently and abort them
if they end up being unserializable.
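A very rough sketch of that optimistic strategy: let dynamic transactions run concurrently, track their write sets, and abort on conflict at commit time. This only checks write-write overlap, not full serializability, and every name here is invented for illustration.

```javascript
// Sketch only: optimistic dynamic transactions with write-set tracking.
class OptimisticTxn {
  constructor() { this.writes = new Set(); this.aborted = false; }
  put(key) { this.writes.add(key); }
}

function commit(txn, committedWrites) {
  // Conflict: something this transaction wrote was already committed
  // by a concurrent transaction, so this one cannot be serialized.
  for (const key of txn.writes) {
    if (committedWrites.has(key)) { txn.aborted = true; return false; }
  }
  for (const key of txn.writes) committedWrites.add(key);
  return true;
}

const committedWrites = new Set();
const t1 = new OptimisticTxn(); t1.put("foo");
const t2 = new OptimisticTxn(); t2.put("foo");  // overlaps with t1

const ok1 = commit(t1, committedWrites);
const ok2 = commit(t2, committedWrites);  // conflicts, must abort
console.log(ok1, ok2);  // true false
```

The aborted transaction would then be retried by the UA or surfaced to the page as an abort event.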

> > 4) IDBTransactionEvent includes a db parameter that would probably be
> > useful for all the events.  I'd suggest moving it to IDBEvent and simply
> > making it be null for the case of IndexedDatabaseRequest.open() calling
> > the onerror handler.
>
> I don't see such a parameter in IDBTransactionEvent?


Oops, I meant to say *attribute* on *IDBTransactionRequest*.


> I'm not sure what
> it would be useful for since likely everything you want to do is
> available on the transaction object.
>

Now that I look at it, the only methods that use IDBSuccessEvent (vs.
IDBTransactionEvent) are IndexedDatabaseRequest.open and
IDBDatabaseRequest.removeObjectStore.  The former definitely makes sense.
And the latter seems reasonable, since needing to support removal of object
stores within transactions seems esoteric and possibly difficult to
implement.  So there really is no need to move it.

> > 8) We can't leave deciding whether a cursor is pre-loaded up to UAs
> > since people will code for their favorite UA and then access
> > IDBCursorPreloadedRequest.count when some other UA does it as a
> > non-preloaded request.  Even in the same UA this will cause problems when
> > users have different datasets than developers are testing with.
>
> As Shawn said, the type returned is decided based on the 'sync'
> argument to openCursor. So no UA dependent behavior here.
>
> > Overall, I like the direction you're taking IDBCursor, but I'd like to
> > offer a counter-proposal:
> > interface IDBCursorRequest : IDBCursor {
> >   readonly attribute any key;
> >
> >   readonly attribute any value;
> >   readonly attribute unsigned long long estimatedCount;
> >
> >   // Returns null if the cursor was updated synchronously.  Otherwise
> >   // will return an IDBRequest object whose result will be set to this
> >   // cursor on success.  Until onsuccess is called, any access of key
> >   // or value will raise an exception.  The first request MUST cause
> >   // an asynchronous request but the behavior of subsequent calls is
> >   // up to the UA.
> >   IDBRequest continue(in optional any key /* null */);
> >   // Success fires IDBTransactionEvent, result == key
> >   IDBRequest update(in any value);
> >
> >   // Success fires IDBTransactionEvent, result == null
> >   IDBRequest remove();
> > };
> > I'm not super happy with this interface, but I think it's a lot better
> > for a few reasons:
> > 1) It's not all or nothing.  Even if a result can't be 100% loaded into
> > memory, it doesn't mean that we'll have to deal with the overhead of
> > every fetch causing an asynchronous firing of an event.
>
> I'm concerned that this will be a race to the bottom where all cursors
> will have to be synchronous for website compatibility. If a browser
> happens to always load 50 records synchronously and a website always
> happens to get results with less than 50 records, then it is likely
> that it will come to depend on all results being available
> synchronously.
>

*sigh* you're probably right...


> The implementation could still transfer results to the thread the page
> runs in in batches.
>

Sure, but if the background thread can't keep up with the page's thread,
you'll start blocking.


> An alternative would be to have the ability to specify a batch-size
> parameter. Results would be made available synchronously in batches of
> this size.
>

Maybe this is what we want to do.  I wanted to avoid letting the web
developer decide batch sizes (to avoid them just picking huge batch sizes),
but I guess there's only so much we can do...?
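A sketch of how a batch-size parameter might behave. The batchSize option and every name here are invented for illustration, not from the draft; a real UA would fire onsuccess from its event loop, so a manual queue stands in for it to keep the example deterministic.

```javascript
// Sketch only: records inside the current batch are readable synchronously;
// crossing a batch boundary requires waiting for an async continue().
const pending = [];
function flushPending() { while (pending.length) pending.shift()(); }

function makeBatchedCursor(records, batchSize) {
  let pos = 0;
  return {
    get value() { return records[pos].value; },
    get done() { return pos >= records.length; },
    // Returns null when the next record is already in the current batch,
    // otherwise an IDBRequest-like object whose onsuccess fires later.
    continue() {
      pos++;
      if (pos % batchSize !== 0) return null;  // still inside the batch
      const request = {};
      pending.push(() => request.onsuccess && request.onsuccess());
      return request;
    }
  };
}

const data = [1, 2, 3, 4, 5].map(n => ({ key: n, value: n * 10 }));
const cursor = makeBatchedCursor(data, 2);
const seen = [];
(function step() {
  while (!cursor.done) {
    seen.push(cursor.value);
    const request = cursor.continue();
    if (request) { request.onsuccess = step; return; }  // batch boundary
  }
})();
flushPending();
console.log(seen.join(","));  // "10,20,30,40,50"
```

Since the boundary is fixed by the batchSize argument rather than by the UA's record count, a site tested against small result sets fails loudly instead of silently depending on implementation-specific preloading.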


> > 2) There's an estimated count even if it's not pre-loaded (which has been
> > requested in other threads).
>
> It was very intentional to remove this. My concern is again that this
> will be a race to the bottom where we're forced to provide an exact
> count. It seems likely that some implementations will provide an exact
> count as long as the count is less than X, and that websites will come
> to depend on this. That would force everyone to provide an estimate of
> equal quality.
>
> It will likely also lead to sites having bugs if they only occasionally
> get results with more than X hits. For example, a webmail
> implementation which lets the user search for emails. This could very
> easily end up working well as long as the estimated count is accurate,
> but fail once enough results are returned that the estimate is no
> longer correct.
>

How about we spec it as being int(log10(count))?  :-)


> >> We have a few open issues:
> >> 1. What should happen when IDBRequest.abort() is called on a write
> >> request, such as modify()? The data might have already been written to
> >> the database. And additional data might have been written on top of it
> >> using a different request. A simple solution is to make abort() on
> >> write requests throw.
> >
> > This seems reasonable.  The other alternative is to make each write call
> > a closed nested transaction that isn't committed until the onsuccess
> > callback returns, and simply roll it back if abort is called before then,
> > but we already decided against adding closed nested transaction support
> > for now.
>
> And it would mean that calling abort() on one request also aborts a
> bunch of other requests (i.e. any requests started after the aborted
> request). I think this would be surprising.
>

Well, any other request might have been aborted anyway for other reasons,
so hopefully developers will handle this.  And if they don't... well,
there's only so much we can do.

That said, I think the simple answer is good enough here.

> >> 2. Do we need to add support for temporary objectStores. I.e. stores
> >> with a lifetime as long as a transaction that are only used to
> >> implement a complex query. If so, we can add a createObjectStore
> >> function on IDBTransactionRequest which synchronously returns a
> >> nameless newly created objectStore.
> >
> > The only use case I can think of for this is if you might want to spill
> > things to disk.  Otherwise you might as well just keep them in memory.
> > In future versions, I expect there'll be reasons to add this (for
> > example, if we add joins), but until then I don't think it's worth the
> > extra API surface area.
>
> In my past life I wrote a lot of SQL queries and stared at their
> execution strategy (SQLServer had awesome visualizations for this). I
> know I saw a lot of temporary tables being created, and I remember
> thinking that this made sense. But I can no longer remember under
> which conditions this happened.


Ugh, now that I think about it, my entire argument was based on the
assumption that a purely JS tree is faster than storing something in an
objectStore, and I'm not sure there are any terribly fast JS tree
implementations.  A lot of joins could be done with hash tables, but I
suspect there would still be plenty of use cases where a tree-based
"temporary table" is required.

Before this realization, I was not trying to argue that temporary tables are
in general useless, but rather that there'd be no advantage to storing your
"temporary table" in an IndexedDB over storing it purely in JS.  This is
because IndexedDB doesn't offer any features (yet) beyond simply storing
data in a tree (and transactions, but we're assuming temporary tables are
only part of one transaction).  That's why I suggested that when we add
joins it might make sense because then there would probably be advantages to
keeping data in the engine rather than just JS.  But, of course, my entire
argument was based on there being fast JS tree implementations, and now I'm
not sure whether or not that's true.
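For the hash-table case at least, the join can be done entirely in JS with no IndexedDB involvement. A minimal sketch (the table contents and field names are made up):

```javascript
// Classic two-phase hash join, purely in-memory.
function hashJoin(left, right, leftKey, rightKey) {
  // Build phase: index one table by its join key.
  const index = new Map();
  for (const row of left) {
    const k = row[leftKey];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(row);
  }
  // Probe phase: stream the other table against the index.
  const out = [];
  for (const row of right) {
    for (const match of index.get(row[rightKey]) || []) {
      out.push({ ...match, ...row });
    }
  }
  return out;
}

const people = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }];
const mails  = [{ personId: 2, subject: "hi" }];
const joined = hashJoin(people, mails, "id", "personId");
console.log(joined);  // [{ id: 2, name: "bob", personId: 2, subject: "hi" }]
```

It's the sort-merge and index-nested-loop strategies, which want ordered access, where a tree-backed temporary store would pull its weight.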


> Would love to get input from more database experienced people than me.
>
> In any event, I think the important part is that the new proposal
> clearly supports this functionality if we decide to add it down the
> line. We don't really need to decide one way or another in this
> thread.
>

Agreed that we need to at very least keep this door open to us.


> >> 3. Should an error in a read or write always result in the full
> >> transaction getting rolled back? Or should we simply fire an error
> >> event on the failed request? Or something in between, such as firing an
> >> error event and making the default action to roll back the transaction
> >> (i.e. if the page doesn't want rollback to happen it has to call
> >> event.preventDefault()).
> >
> > Good question.  The default action of rollback seems like the most sane,
> > but I'm far from sure that's the right behavior.
>
> I agree that I like that as default behavior. And DOM Events has a
> fairly nice system for this already.
>
> Another question is what should happen if a success event handler
> throws an exception. Should this result in a rollback? Would that be
> compatible with the DOM Events spec?
>

Good question.  If it is compatible, we probably should.  Otherwise
everything will need to be wrapped in a try/catch block to avoid committing
garbage data.
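The rollback-as-default-action idea could look roughly like this mock. dispatchError and the transaction object are invented stand-ins; a real implementation would go through DOM Events dispatch.

```javascript
// Sketch only: rollback happens unless the page calls preventDefault().
function dispatchError(transaction, handler) {
  const event = {
    defaultPrevented: false,
    preventDefault() { this.defaultPrevented = true; }
  };
  if (handler) handler(event);
  // Default action runs only if the page didn't opt out.
  if (!event.defaultPrevented) transaction.rollback();
  return event.defaultPrevented;
}

const tx = { rolledBack: false, rollback() { this.rolledBack = true; } };

// A page that can tolerate the failed request keeps the transaction alive:
dispatchError(tx, event => event.preventDefault());
const survivedHandledError = !tx.rolledBack;  // true

// With no handler (or one that doesn't preventDefault), rollback happens:
dispatchError(tx, null);
console.log(survivedHandledError, tx.rolledBack);  // true true
```

The nice property is that the safe behavior (rollback) is what pages get when they do nothing, which matches how cancelable default actions work elsewhere in the platform.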



On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>
> > Like I said, I'm not super happy with what I proposed, but I think some
> > hybrid async/sync interface is really what we need.  Have you guys spent
> > any time thinking about something like this?  How dead-set are you on
> > synchronous cursors?
>
> The idea is that synchronous cursors load all the required data into
> memory, yes. I think it would help authors a lot to be able to load
> small chunks of data into memory and read and write to it
> synchronously. Dealing with asynchronous operations constantly is
> certainly possible, but a bit of a pain for authors.
>

Any ideas on how best to add this to your spec proposal?


> I don't think we should obsess too much about not keeping things in
> memory; we already have things like canvas and the DOM, which add up
> to non-trivial amounts of memory.
>
> Just because data is loaded from a database doesn't mean it's huge.
>
> I do note that you're not as concerned about getAll()


I was trying to pick my battles.  :-)


> which actually
> has worse memory characteristics than synchronous cursors, since you
> need to create the full JS object graph in memory.
>

Good point.

OK, I guess I'm fine with synchronous cursors.

J
Received on Tuesday, 18 May 2010 20:25:09 GMT
