Re: [IndexedDB] Current editor's draft

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 14 Jul 2010 00:06:55 -0700
Message-ID: <AANLkTilX9SdNTaTdKpu2j-O0EDdIacqD53I09KCT4_mS@mail.gmail.com>
To: Pablo Castro <Pablo.Castro@microsoft.com>
Cc: Andrei Popescu <andreip@google.com>, Nikunj Mehta <nikunj@o-micron.com>, public-webapps <public-webapps@w3.org>

Hi Pablo,

First off, thanks for your comments! (Probably too many) details below.

On Tue, Jul 13, 2010 at 7:52 PM, Pablo Castro
<Pablo.Castro@microsoft.com> wrote:
>
> From: public-webapps-request@w3.org [mailto:public-webapps-request@w3.org] On Behalf Of Andrei Popescu
> Sent: Monday, July 12, 2010 5:23 AM
>
> Sorry I disappeared for a while. Catching up with this discussion was an interesting exercise...there is no particular message in this thread I can respond to, so I thought I'd just reply to the last one. Overall I think the new proposal is shaping up well and is being effective in simplifying scenarios. I do have a few suggestions and questions for things I'm not sure I see all the way.
>
> READ_ONLY vs READ_WRITE as defaults for transactions:
> To be perfectly honest, I think this discussion went really deep over an issue that won't be a huge deal for most people. My perspective, trying to avoid performance or usage frequency speculation, is around what's easier to detect. Concurrency issues are hard to see. On the other hand, whenever we can throw an exception and give explicit guidance, that unblocks people right away. For this case I suspect it's best to default to READ_ONLY, because if someone doesn't read or think about it and just uses the stuff and tries to change something, they'll get a clear error message saying "if you want to change stuff, use READ_WRITE please". The error is not data- or context-dependent, so it'll fail on first try at most once per developer, and once they fix it they'll know for all future cases.
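
The fail-fast behavior being argued for can be sketched with a toy model (plain JavaScript, not the IndexedDB API; all names here are illustrative): a write inside a READ_ONLY transaction throws immediately with a message that points at the fix.

```javascript
// Toy model of the READ_ONLY-by-default proposal: writes in the default
// mode fail loudly and deterministically, rather than causing subtle
// concurrency surprises later.
class Transaction {
  constructor(mode = "READ_ONLY") {
    this.mode = mode;
    this.data = new Map();
  }
  get(key) {
    return this.data.get(key);
  }
  put(key, value) {
    if (this.mode !== "READ_WRITE") {
      // The error is not data- or context-dependent: it fires on first try.
      throw new Error("if you want to change stuff, use READ_WRITE please");
    }
    this.data.set(key, value);
  }
}

const ro = new Transaction(); // defaults to READ_ONLY
let message = null;
try {
  ro.put("id", 1);
} catch (e) {
  message = e.message;
}
console.log(message); // clear, immediate guidance

const rw = new Transaction("READ_WRITE");
rw.put("id", 1);
console.log(rw.get("id")); // 1
```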

Yup, this was exactly my thinking.

> Dynamic transactions:
> I see that most folks would like to see these going away. While I like the predictability and simplifications that we're able to make by using static scopes for transactions, I worry that we'll close the door for two scenarios: background tasks and query processors. Background tasks such as synchronization and post-processing of content would seem to be almost impossible with the static scope approach, mostly due to the granularity of the scope specification (whole stores). Are we okay with saying that you can't for example sync something in the background (e.g. in a worker) while your app is still working? Am I missing something that would enable this class of scenarios? Query processors are also tricky because you usually take the query specification in some form after the transaction started (especially if you want to execute multiple queries with later queries depending on the outcome of the previous ones). The background tasks issue in particular looks pretty painful to me if we don't have a way to achieve it without freezing the application while it happens.

I don't understand enough of the details here to be able to make a
decision. The use cases you are bringing up I definitely agree are
important, but I would love to look at even a rough draft of what code
you are expecting people will need to write.

What I suggest is that we keep dynamic transactions in the spec for
now, but separate the API from static transactions, start a separate
thread and try to hammer out the details and see what we arrive at. I
do want to clarify that I don't think dynamic transactions are
particularly hard to implement, I just suspect they are hard to use
correctly.

> Implicit commit:
> Does this really work? I need to play with sample app code more; it may just be that I'm old-fashioned. For example, if I'm downloading a bunch of data from somewhere and pushing rows into the store, wouldn't it be reasonable to do the whole thing in a transaction? In that case I'm likely to have to unwind while I wait for the next callback from XmlHttpRequest with the next chunk of data.

You definitely want to do it in a transaction. In our proposal there
is no way to even call .get or .put if you aren't inside a
transaction. For the case you are describing, you'd download the data
using XMLHttpRequest first. Once the data has been downloaded you
start a transaction, parse the data, and make the desired
modifications. Once that is done the transaction is automatically
committed.
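
The download-first, write-second pattern described above can be sketched like this (stand-in objects only; `download`, `transaction`, and the store shape are illustrative, not part of any draft API):

```javascript
// Sketch: fetch all the data outside any transaction, then open a
// short-lived transaction just for the writes; it "commits" when the
// callback returns, so it is never held open across network waits.
const store = new Map();

function download() {
  // Stands in for XMLHttpRequest; resolves with the parsed rows.
  return Promise.resolve([["a", 1], ["b", 2]]);
}

function transaction(work) {
  // Collect the writes, then commit atomically when `work` returns.
  const pending = [];
  work({ put: (key, value) => pending.push([key, value]) });
  for (const [key, value] of pending) store.set(key, value); // "commit"
}

async function sync() {
  const rows = await download(); // no transaction held open here
  transaction((tx) => {
    for (const [key, value] of rows) tx.put(key, value);
  });
}

sync().then(() => console.log(store.get("a"), store.get("b"))); // 1 2
```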

The idea here is to avoid keeping transactions open for long periods
of time, while at the same time making the API easier to work with.
I'm very concerned that with any API that requires people to write:

startOperation();
... do lots of stuff here ...
endOperation();

people will forget the endOperation() call. This is especially
true if the startOperation/endOperation calls are spread out over
multiple asynchronously called functions, which seems to be the use
case you're concerned about above. One very easy way to "forget" to
call endOperation is if something in between the two function calls
throws an exception.
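
A minimal illustration of that failure mode: an exception between the two calls means endOperation() never runs, and the operation silently stays open (the function names mirror the sketch above and are purely illustrative).

```javascript
// An exception thrown between startOperation() and endOperation()
// skips the latter, leaking an open operation.
let open = 0;
function startOperation() { open++; }
function endOperation() { open--; }

function buggy() {
  startOperation();
  JSON.parse("not json"); // throws SyntaxError
  endOperation();         // never reached
}

try {
  buggy();
} catch (e) {
  // swallowed, as often happens in event-driven code
}
console.log(open); // 1 — the operation leaked
```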

This will likely be extra bad for transactions where no write
operations are done. In that case, failure to call a 'commit()'
function won't result in any visibly broken behavior. The transaction
will just sit open for a long time and eventually be rolled back;
since no changes were made, the rollback is transparent, and the only
noticeable effect is that the application stalls while the
transaction waits to time out.

I should add that WebSQLDatabase uses automatically committing
transactions very similar to what we're proposing, and that seems to
have worked fine there.

> I understand that avoiding it results in nicer patterns (e.g. db.objectStores("foo").get(123).onsuccess = ...), but in practice I'm not sure if that will hold given that you still need error callbacks and such.

For what it's worth, this wasn't at all part of our consideration. In
fact, I suspect that this part would look exactly the same even if we
didn't have implicitly committing transactions.

And as you say, you still usually need error callbacks. In fact,
while writing examples against our implementation, we have found that
you almost always want to add a generic error handler. It's very easy
to make a mistake, and if you don't add error handlers, errors just go
by silently, offering no help as to why your program isn't working.
Though possibly a better implementation could put information in the
developer console if it detected that an error event was fired but
no one was listening.
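
The "generic error handler" idea can be sketched with a toy event model (not the real IndexedDB event plumbing; all names are illustrative): per-request handlers are optional, and one handler on the transaction catches whatever was left unhandled.

```javascript
// Toy dispatch: an error goes to the request's own handler if present,
// otherwise falls back to the transaction-level handler, otherwise
// vanishes silently — the failure mode described above.
function makeTransaction() {
  const tx = { onerror: null };
  tx.request = function () {
    const req = { onerror: null };
    req.fail = (err) => {
      if (req.onerror) req.onerror(err);    // specific handler first
      else if (tx.onerror) tx.onerror(err); // generic fallback
      // else: silent — no help as to why the program isn't working
    };
    return req;
  };
  return tx;
}

const tx = makeTransaction();
const seen = [];
tx.onerror = (err) => seen.push(err); // one generic handler for everything
tx.request().fail("ConstraintError"); // request has no handler of its own
console.log(seen.length, seen[0]);    // 1 ConstraintError
```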

> Nested transactions:
> Not sure why we're considering this an advanced scenario. To be clear about what the feature means to me: make it legal to start a transaction when one is already in progress, and the nested one is effectively a no-op, just refcounts the transaction, so you need equal amounts of commit()'s, implicit or explicit, and an abort() cancels all nested transactions. The purpose of this is to allow composition, where a piece of code that needs a transaction can start one locally, independently of whether the caller had already one going.

Ah. I generally thought of nested transactions as the ability to roll
back just an "inner" transaction while keeping the changes made by an
outer one. Your version of nested transactions would be a lot easier
to implement, I suspect.

I take it the reason one would want to start a nested transaction,
rather than simply create a new one, is to ensure that all locks are
being kept held, and that no other changes from another transaction
can slip in between?

If so, wouldn't the API require that the outer transaction is somehow
referenced (through a function argument or otherwise)? And if so,
couldn't the inner code simply use the outer transaction? This seems
to be the case in the current drafts.
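
The refcounting semantics Pablo describes can be sketched as follows (purely illustrative, not a draft API): begin() on an already-open transaction just bumps a counter, each commit() decrements it, the real commit happens only when the count returns to zero, and abort() cancels everything.

```javascript
// Refcounted "nested" transactions: inner begin/commit pairs are no-ops
// until the outermost commit, so library code can compose with a caller's
// transaction without knowing whether one is already in progress.
class RefCountedTx {
  constructor() {
    this.depth = 0;
    this.aborted = false;
    this.committed = false;
  }
  begin() { this.depth++; }
  commit() {
    if (this.depth === 0) throw new Error("unbalanced commit");
    this.depth--;
    if (this.depth === 0 && !this.aborted) this.committed = true; // real commit
  }
  abort() {
    this.aborted = true; // cancels all nested levels at once
    this.depth = 0;
  }
}

const tx = new RefCountedTx();
tx.begin();                // outer caller opens a transaction
tx.begin();                // library code "starts one locally"
tx.commit();               // inner commit: just a refcount decrement
console.log(tx.committed); // false — nothing committed yet
tx.commit();               // outer commit: now it is real
console.log(tx.committed); // true
```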

> Schema versioning:
> It's unfortunate that we need to have explicit elements in the page for the versioning protocol to work, but the fact that we can have a reliable mechanism for pages to coordinate a version bump is really nice. For folks that don't know about this the first time they build it, an explicit error message on the schema change timeout can explain where to start. I do think that there may be a need for non-breaking changes to the schema to happen without a "version dance". For example, query processors regularly create temporary tables during sorts and such. Those shouldn't require any coordination (maybe we allow non-versioned additions, or we just introduce temporary, unnamed tables that evaporate on commit() or database close()...).

If we do need support for temporary objectStores, I think we should
simply add an API like:

interface IDBTransaction {
  ...
  IDBObjectStore createTemporaryObjectStore();
  ...
};

I.e. do the unnamed evaporating objectStores that go away on commit.

That way the application doesn't have to worry about name collisions
if two transactions run at the same time, or about forgetting to
call removeObjectStore at the end of the transaction. We could also
allow temporary objectStores to be written to, even if the transaction
was opened in READ_ONLY mode.

In fact, since a temporary objectStore generally should be removed at
the end of a transaction, if we used the normal createObjectStore,
wouldn't we have to require a "breaking" version-change transaction,
since removeObjectStore would be called at the end of the transaction?
I.e. temporary object stores can't be implemented using add-only
non-versioned transactions.
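
A toy sketch of the proposed behavior (names and structure are illustrative only, not the suggested API): each transaction hands out anonymous temporary stores that evaporate when it commits, so concurrent transactions can never collide on a name and nothing needs manual cleanup.

```javascript
// Transaction-scoped temporary stores: anonymous (no user-visible name,
// so no collisions) and dropped automatically at commit.
let nextId = 0;
class Tx {
  constructor() { this.temps = []; }
  createTemporaryObjectStore() {
    const store = { id: nextId++, data: new Map() }; // internal id, no name
    this.temps.push(store);
    return store;
  }
  commit() {
    this.temps.length = 0; // temporary stores evaporate here
  }
}

const tx = new Tx();
const scratch = tx.createTemporaryObjectStore();
scratch.data.set("row", 42);  // e.g. a sort spill during query evaluation
console.log(tx.temps.length); // 1
tx.commit();
console.log(tx.temps.length); // 0 — nothing left behind
```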

However I'd love to hear about when temporary objectStores are used. I
know I've seen them in evaluation strategies created by SQL databases,
but I wasn't able to come up with an example off the top of my head
for a use case that would require them.

/ Jonas
Received on Wednesday, 14 July 2010 07:07:51 GMT
