Re: [IndexDB] Proposal for async API changes

On Tue, May 18, 2010 at 7:20 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> Overall, I'm pretty happy with these changes.  I support making these
> changes to the spec.  Additional comments inline...
> On Tue, May 18, 2010 at 2:15 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> Hi All,
>>
>> I, together with Ben Turner and Shawn Wilsher, have been looking at the
>> asynchronous API defined in the IndexedDB specification and have a set
>> of changes to propose. The main goal of these changes is to simplify
>> the API that we expose to authors, making it easier for them to work
>> with. Another goal has been to reduce the risk that authors misuse the
>> API and use long-running transactions. Finally, it has been a goal
>> to reduce the risk of race conditions.
>>
>> It has explicitly not been a goal to simplify the implementation. In
>> some cases it is definitely harder to implement the proposed API.
>> However, we believe that the extra complexity in implementation is
>> outweighed by simplicity for users of the API.
>>
>> The main changes are:
>>
>> 1. Once a database has been opened (a database connection has been
>> established) read access to metadata, such as objectStore and index
>> names, is synchronous. Changes to such metadata, such as creating
>> objectStores and indexes, are still asynchronous.
>
> I believe this is already how it's specced.  The IDBDatabase interface
> already gives you synchronous access to all of this.

The big difference is that the current spec makes openObjectStore()
and openIndex() asynchronous. Our proposal makes openObjectStore() and
openIndex() (renamed objectStore() and index()) synchronous. So
opening an objectStore, or even starting a transaction, only
synchronously accesses metadata. But any requests you make on the
transaction will be held until the transaction has managed to grab the
requested tables.

So when you, in our proposal, call:

db.objectStore("foo", READ_WRITE).put(...);

the objectStore function synchronously creates a transaction object
representing a transaction which only contains the "foo" objectStore.
The implementation then fires off an asynchronous request to lock the
"foo" objectStore with a write lock. It then returns the synchronously
created transaction object.

When the put() function is called, the implementation notices that the
lock is not yet acquired. So it simply records what information should
be written to the object store.

Later, when the write lock is successfully acquired, the
implementation executes the recorded operations and once they finish
we call their callbacks.
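To make that queueing behavior concrete, here is a minimal plain-JS simulation of the idea, not the proposed API itself; `SketchTransaction`, its `put` signature, and the timer standing in for lock acquisition are all invented for illustration.

```javascript
// Hypothetical simulation, not the proposed API: a transaction object that
// records operations until its (asynchronous) lock acquisition completes.
class SketchTransaction {
  constructor(storeName) {
    this.storeName = storeName;
    this.locked = false;
    this.pending = [];                     // operations recorded before the lock
    // Stand-in for the implementation's asynchronous write-lock request.
    setTimeout(() => {
      this.locked = true;
      this.pending.forEach((op) => op()); // replay the recorded operations
      this.pending = [];
    }, 0);
  }
  put(value, onsuccess) {
    const op = () => onsuccess(value);    // stand-in for the actual write
    if (this.locked) op();
    else this.pending.push(op);           // lock not held yet: just record it
  }
}
```

The caller sees `put()` return immediately; its callback only fires once the lock has been granted and the recorded operation replayed.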

>> 9. IDBKeyRanges are created using functions on IndexedDatabaseRequest.
>> We couldn't figure out how the old API allowed you to create a range
>> object without first having a range object.
>
> In the spec, I see the following in examples:
> var range = new IDBKeyRange.bound(2, 4);
> and
> var range = IDBKeyRange.leftBound(key);
> I'm not particularly happy with hanging functions off of
> IndexedDatabaseRequest for this.  Can it work something like what I listed
> above?  If not, maybe we can find a better place to put them?  Or just
> create multiple openCursor functions for each case?

Mostly we were just confused as to what syntax was actually proposed.
You are listing two syntaxes (with and without 'new'), neither of
which match the WebIDL in the spec. I personally think that most
proposed syntaxes are ok and don't care much which one we choose, as
long as it's clearly defined.
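For illustration only, the static-factory shape (one of the syntaxes floated above) might look like this plain-object sketch; `KeyRangeFactory` and the returned shape are invented, since the thread hasn't settled on names:

```javascript
// Invented sketch of the static-factory style: ranges are created without
// first having a range instance, and without 'new'.
const KeyRangeFactory = {
  bound: (lower, upper) => ({ lower, upper }),
  leftBound: (lower) => ({ lower, upper: undefined }),
};

const range = KeyRangeFactory.bound(2, 4);   // keys 2 through 4
const open = KeyRangeFactory.leftBound(10);  // keys >= 10, no upper bound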

>> 10. You are allowed to have multiple transactions per database
>> connection. However if they use overlapping tables, only the first one
>> will receive events until it is finished (with the usual exceptions of
>> allowing multiple readers of the same table).
>
> Can you please clarify what you mean here?  This seems like simply an
> implementation detail to me, so maybe I'm missing something?

The spec currently does explicitly forbid having multiple transactions
per database connection. The syntax doesn't even support it since
there is a .currentTransaction property on IDBDatabase. I.e. the
following code seems forbidden (though I'm not sure if forbidden means
that an exception will be thrown somewhere, or if it means that the
code will just silently fail to work).

request = indexedDB.openDatabase("myDB", "...");
request.onsuccess = function() {
  db = request.result;
  r1 = db.openTransaction(["foo"]);
  r1.onsuccess = function() { ... };
  r2 = db.openTransaction(["bar"]);
  r2.onsuccess = function() { ... };
};

The spec says that the above is forbidden. In our new proposal the
following would be allowed:

request = indexedDB.openDatabase("myDB", "...");
request.onsuccess = function() {
  db = request.result;
  t1 = db.transaction(["foo"], READ_WRITE);
  t2 = db.transaction(["bar"], READ_WRITE);
};

And would allow the two transactions to run concurrently.
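One way to picture that scheduling is an invented lock-table sketch (ignoring the shared-reader exception for brevity): transactions over disjoint objectStores are granted immediately, while overlapping ones must wait.

```javascript
// Invented sketch: exclusive per-objectStore locks. Transactions over
// disjoint stores run concurrently; overlapping ones wait their turn.
class LockTable {
  constructor() {
    this.held = new Map();             // objectStore name -> transaction id
  }
  // Grant every requested store to txId, or grant nothing and report failure.
  tryAcquire(txId, stores) {
    const blocked = stores.some(
      (s) => this.held.has(s) && this.held.get(s) !== txId
    );
    if (blocked) return false;         // overlap: this transaction waits
    stores.forEach((s) => this.held.set(s, txId));
    return true;
  }
  release(txId) {
    for (const [store, id] of this.held) {
      if (id === txId) this.held.delete(store);
    }
  }
}
```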

>> A draft of the proposed API is here:
>>
>> http://docs.google.com/View?id=dfs2skx2_4g3s5f857
>
> Comments:
> 1)  IDBRequest.abort() in IDBRequest needs to be able to raise.

I think in general we haven't indicated which functions can raise
exceptions. But yes, I agree abort() should be able to raise.

> 2)  I see that IndexedDatabaseRequest.open() can no longer raise, but that's
> probably fine.  (Having some errors raise and some call onerror seems
> confusing).

This wasn't an intentional omission (see above re not having marked
what can raise), but I agree that IndexedDatabaseRequest.open should
never raise.

> 3)  If we go with your recommendation for setVersion in your other email, it
> should be firing a IDBTransactionEvent.

Fixed. I also made the result of the request be the database connection.

> 4)  IDBTransactionEvent includes a db parameter that would probably be
> useful for all the events.  I'd suggest moving it to IDBEvent and simply
> making it be null for the case of IndexedDatabaseRequest.open() calling the
> onerror handler.

I don't see such a parameter in IDBTransactionEvent? I'm not sure what
it would be useful for since likely everything you want to do is
available on the transaction object.

> 5)  You have two IDBTransactionRequest.onaborts.  I think one is supposed to
> be an ontimeout.

Fixed.

> 6)  What is the default limit for the getAll functions?  Should we make it 0
> (and define any <=0 amount to mean infinity)?

The idea was that infinity would be the default. Whether we do that by
making the default be 0 and having 0 mean infinity, or by saying that
there is no limit when the parameter isn't specified, I don't really
care.
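Either convention reduces to the same slicing rule; a throwaway sketch (the helper name is invented):

```javascript
// Invented helper: treat a missing or non-positive limit as "no limit"
// when collecting results for getAll().
function applyLimit(results, limit) {
  if (limit === undefined || limit <= 0) return results.slice(); // infinity
  return results.slice(0, limit);
}
```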

> 7)  I expect "add or modify" to be more used than the add or the modify
> methods.  As such, I wonder if it makes sense to optimize the naming for
> that.  For example, addOrModify=>set, add=>add/insert,
> modify=>modify/update/replace maybe?

Yeah, the name addOrModify sort of sucks. We pondered a bunch of
alternatives, such as set/put/write/update. I'd like to avoid
bikeshedding on the issue here, but would be happy to do so in a
separate thread.

> 8)   We can't leave deciding whether a cursor is pre-loaded up to UAs
> since people will code for their favorite UA and then access
> IDBCursorPreloadedRequest.count when some other UA does it as a
> non-preloaded request.  Even in the same UA this will cause problems when
> users have different datasets than developers are testing with.

As Shawn said, the type returned is decided based on the 'sync'
argument to openCursor. So no UA dependent behavior here.

> Overall, I like the direction you're taking IDBCursor, but I'd like to offer
> a counter-proposal:
> interface IDBCursorRequest : IDBCursor {
>   readonly attribute any key;
>
>   readonly attribute any value;
>   readonly attribute unsigned long long estimatedCount;
>
>   // Returns null if the cursor was updated synchronously.  Otherwise
>   // will return an IDBRequest object whose result will be set to this
>   // cursor on success.  Until onsuccess is called, any access of key
>   // or value will raise an exception.  The first request MUST cause
>   // an asynchronous request but the behavior of subsequent calls is
>   // up to the UA.
>   IDBRequest continue(in optional any key /* null */);
>   // Success fires IDBTransactionEvent, result == key
>   IDBRequest update(in any value);
>
>   // Success fires IDBTransactionEvent, result == null
>   IDBRequest remove();
> };
> I'm not super happy with this interface, but I think it's a lot better for a
> few reasons:
> 1) It's not all or nothing.  Even if a result can't be 100% loaded into
> memory, it doesn't mean that we'll have to deal with the overhead of every
> fetch causing an asynchronous firing of an event.

I'm concerned that this will be a race to the bottom where all cursors
will have to be synchronous for website compatibility. If a browser
happens to always load 50 records synchronously and a website always
happens to get results with less than 50 records, then it is likely
that it will come to depend on all results being available
synchronously.

The implementation could still transfer results in batches to the
thread the page runs in.

An alternative would be to have the ability to specify a batch-size
parameter. Results would be made available synchronously in batches of
this size.
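A plain-JS simulation of that batch-size idea (all names invented): rows inside an already-fetched batch are available synchronously, and crossing a batch boundary always costs an asynchronous round trip.

```javascript
// Invented simulation: results inside an already-fetched batch are
// available synchronously; crossing a batch boundary is asynchronous.
class BatchCursor {
  constructor(rows, batchSize) {
    this.rows = rows;
    this.batchSize = batchSize;
    this.loaded = 0;   // how many rows have been fetched so far
    this.pos = -1;     // index of the current row
  }
  // Advance synchronously if the next row is already in the current batch.
  advanceSync() {
    if (this.pos + 1 < this.loaded) { this.pos += 1; return true; }
    return false;
  }
  // Otherwise fetch the next batch asynchronously, then advance.
  advanceAsync(onsuccess) {
    setTimeout(() => {
      this.loaded = Math.min(this.loaded + this.batchSize, this.rows.length);
      if (this.advanceSync()) onsuccess(this.rows[this.pos]);
      else onsuccess(null);              // no more rows
    }, 0);
  }
  get value() { return this.rows[this.pos]; }
}
```

Because the very first advance is always asynchronous, code cannot accidentally come to depend on all results being there synchronously, regardless of how small the result set is.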

> 2) There's an estimated count even if it's not pre-loaded (which has been
> requested in other threads).

It was very intentional to remove this. My concern is again that this
will be a race to the bottom where we're forced to provide an exact
count. It seems likely that some implementations will provide an exact
count as long as the count is less than X, and that websites will come
to depend on this. That would force everyone to provide an estimate of
equal quality.

It will likely also lead to sites having bugs that only show up when a
result occasionally has more than X hits. Take, for example, a webmail
implementation that lets the user search for emails. It could very
easily end up working well as long as the estimated count is accurate,
but fail once enough results are returned that the estimate is no
longer correct.

> 3) Because the first continue call will always be asynchronous, people using
> this interface cannot get away with assuming results are always returned
> synchronously, so the problems I mentioned in (8) are pretty much gone.
> I'm sure the design can be improved on or that there might be some other
> ways to solve the same problems.  I'm open to suggestions.

I'm not sure why it would be less likely that people would write code like:

db.objectStore("foo").openCursor(range).onsuccess = function(e) {
  cursor = e.target.result;
  setTableSize(cursor.estimatedCount);
  var i = 0;
  do {
    setTableDataAt(i, cursor.value.x, cursor.value.y);
  } while (!cursor.continue());
}

Which would work fine until the estimate fails and data starts being
returned in multiple batches.

>> We have a few open issues:
>> 1. What should happen when IDBRequest.abort() is called on a write
>> request, such as modify()? The data might have already been written to
>> the database. And additional data might have been written on top of it
>> using a different request. A simple solution is to make abort() on
>> write requests throw.
>
> This seems reasonable.  The other alternative is to make each write call a
> closed nested transaction that isn't committed until the onsuccess callback
> returns and simply roll it back if abort is called before then, but we
> already decided against adding closed nested transaction support for now.

And it would mean that calling abort() on one request also aborts a
bunch of other requests (i.e. any requests started after the aborted
request). I think this would be surprising.

>> 2. Do we need to add support for temporary objectStores. I.e. stores
>> with a lifetime as long as a transaction that are only used to
>> implement a complex query. If so, we can add a createObjectStore
>> function on IDBTransactionRequest which synchronously returns a
>> nameless newly created objectStore.
>
> The only use case I can think of for this is if you might want to spill
> things to disk.  Otherwise you might as well just keep them in memory.
> In future versions, I expect there'll be reasons to add this (for example,
> if we add joins), but until then I don't think it's worth the extra API
> surface area.

In my past life I wrote a lot of SQL queries and stared at their
execution strategy (SQLServer had awesome visualizations for this). I
know I saw a lot of temporary tables being created, and I remember
thinking that this made sense. But I can no longer remember under
which conditions this happened.

Would love to get input from people with more database experience than me.

In any event, I think the important part is that the new proposal
clearly supports this functionality if we decide to add it down the
line. We don't really need to decide one way or another in this
thread.

>> 3. Should an error in a read or write always result in the full
>> transaction getting rolled back? Or should we simply fire an error
>> event on the failed request? Or something in between, such as firing
>> an error event and making the default action roll back the
>> transaction (i.e. if the page doesn't want the rollback to happen, it
>> has to call event.preventDefault).
>
> Good question.  The default action of rollback seems like the most sane, but
> I'm far from sure that's the right behavior.

I agree that I like that as default behavior. And DOM Events has a
fairly nice system for this already.

Another question is what should happen if a success event handler
throws an exception. Should this result in a rollback? Would that be
compatible with the DOM Events spec?
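The preventDefault pattern can be simulated in plain JS (the dispatch helper is invented; in the real API, DOM Events machinery would supply the event object and run the default action):

```javascript
// Invented simulation of a cancelable error event: the default action is
// to roll back the transaction unless the handler calls preventDefault().
function dispatchError(onerror) {
  let prevented = false;
  const event = { preventDefault: () => { prevented = true; } };
  onerror(event);
  return prevented ? "transaction continues" : "transaction rolled back";
}

// A page that wants to handle the failure itself opts out of the rollback:
dispatchError((e) => e.preventDefault());  // → "transaction continues"
```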

/ Jonas

Received on Tuesday, 18 May 2010 19:19:27 UTC