Re: [IndexedDB] Current editor's draft from Jeremy Orlow on 2010-07-14 (public-webapps@w3.org from July to September 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Wed, 14 Jul 2010 08:10:11 +0100
To: Pablo Castro <Pablo.Castro@microsoft.com>
Cc: Andrei Popescu <andreip@google.com>, Nikunj Mehta <nikunj@o-micron.com>, Jonas Sicking <jonas@sicking.cc>, public-webapps <public-webapps@w3.org>
Message-ID: <AANLkTikkvhztRtWo5aNv2g-9nKyzebkN5SV3T4lqdcML@mail.gmail.com>
On Wed, Jul 14, 2010 at 3:52 AM, Pablo Castro <Pablo.Castro@microsoft.com> wrote:

>
> From: public-webapps-request@w3.org [mailto:public-webapps-request@w3.org]
> On Behalf Of Andrei Popescu
> Sent: Monday, July 12, 2010 5:23 AM
>
> Sorry I disappeared for a while. Catching up with this discussion was an
> interesting exercise...


Yes, indeed.  :-)


> there is no particular message in this thread I can respond to, so I
> thought I'd just reply to the last one.


Probably a good idea.  I was trying to respond hixie style--which is harder
than it looks on stuff like this.


> Overall I think the new proposal is shaping up well and is being effective
> in simplifying scenarios. I do have a few suggestions and questions for
> things I'm not sure I see all the way.
>
> READ_ONLY vs READ_WRITE as defaults for transactions:
> To be perfectly honest, I think this discussion went really deep over an
> issue that won't be a huge deal for most people. My perspective, trying to
> avoid performance or usage frequency speculation, is around what's easier to
> detect. Concurrency issues are hard to see. On the other hand, whenever we
> can throw an exception and give explicit guidance, that unblocks people right
> away. For this case I suspect it's best to default to READ_ONLY, because if
> someone doesn't read or think about it and just uses the stuff and tries to
> change something they'll get a clear error message saying "if you want to
> change stuff, use READ_WRITE please". The error is not data- or
> context-dependent, so it'll fail on first try at most once per developer and
> once they fix it they'll know for all future cases.
>

Couldn't have said it better myself.
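
A minimal sketch of why the READ_ONLY default is easy to detect. This is a toy
model, not the IndexedDB API: the ToyTransaction class and the Map standing in
for an object store are assumptions made for illustration. Only the mode names
and the error text come from the thread.

```javascript
// Toy model only: not the IndexedDB API. Mode names follow the thread.
const READ_ONLY = "READ_ONLY";
const READ_WRITE = "READ_WRITE";

class ToyTransaction {
  // `data` is a Map standing in for an object store (hypothetical).
  constructor(data, mode = READ_ONLY) {
    this.data = data;
    this.mode = mode;
  }

  get(key) {
    return this.data.get(key);
  }

  put(key, value) {
    if (this.mode !== READ_WRITE) {
      // Fails on the first write attempt, regardless of data or timing.
      throw new Error("if you want to change stuff, use READ_WRITE please");
    }
    this.data.set(key, value);
  }
}

const store = new Map([[123, "foo"]]);
const reader = new ToyTransaction(store);             // defaults to READ_ONLY
const writer = new ToyTransaction(store, READ_WRITE); // explicit opt-in
writer.put(123, "bar");
```

Calling reader.put(...) here throws immediately, which is the "fails on first
try" behaviour described above. A READ_WRITE default would instead fail only
under contention, which is much harder to notice.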


> Dynamic transactions:
> I see that most folks would like to see these going away. While I like the
> predictability and simplifications that we're able to make by using static
> scopes for transactions, I worry that we'll close the door for two
> scenarios: background tasks and query processors. Background tasks such as
> synchronization and post-processing of content would seem to be almost
> impossible with the static scope approach, mostly due to the granularity of
> the scope specification (whole stores). Are we okay with saying that you
> can't for example sync something in the background (e.g. in a worker) while
> your app is still working? Am I missing something that would enable this
> class of scenarios? Query processors are also tricky because you usually
> take the query specification in some form after the transaction started
> (especially if you want to execute multiple queries with later queries
> depending on the outcome of the previous ones). The background tasks issue
> in particular looks pretty painful to me if we don't have a way to achieve
> it without freezing the application while it happens.
>

Well, the application should never freeze in terms of the UI locking up, but
in what you described I could see it taking a while for data to show up on
the screen.  This is something that can be fixed by doing smaller updates on
the background thread, sending a message to the background thread that it
should abort for now, doing all database access on the background thread,
etc.
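
As a rough illustration of the "smaller updates" and "abort for now" ideas,
here is a sketch of a background sync loop that writes in small batches. The
function names, the writeBatch callback (assumed to open one short transaction
per batch) and the shouldPause callback (standing in for a message from the
page) are all hypothetical.

```javascript
// Hypothetical helper: split a large batch into small slices so that each
// slice can be written in its own short transaction.
function* chunks(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// `writeBatch` is an assumed callback that opens one transaction, writes the
// slice, and lets the transaction finish. `shouldPause` models the
// "abort for now" message from the page.
function syncInSmallSteps(items, size, writeBatch, shouldPause = () => false) {
  let written = 0;
  for (const batch of chunks(items, size)) {
    if (shouldPause()) break; // stop early; the caller can resume from `written`
    writeBatch(batch);
    written += batch.length;
  }
  return written; // number of items written so far
}
```

Because each batch holds its locks only briefly, the page gets a chance to
read between batches instead of waiting for the whole sync to finish.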

One point that I never saw made in the thread that I think is really
important is that dynamic transactions can make concurrency worse in some
cases.  For example, with dynamic transactions you can get into live-lock
situations.  Also, using Pablo's example, you could easily get into a
situation where the long running transaction on the worker keeps hitting
serialization issues and thus it's never able to make progress.

I do see that there are use cases where having dynamic transactions would be
much nicer, but the amount of non-determinism they add (including to
performance) has me pretty worried.  I pretty firmly believe we should look
into adding them in v2 and remove them for now.  If we do leave them in, it
should definitely be in its own method to make it quite clear that the
semantics are more complex.


> Implicit commit:
> Does this really work? I need to play with sample app code more, it may
> just be that I'm old-fashioned. For example, if I'm downloading a bunch of
> data from somewhere and pushing rows into the store within a transaction,
> wouldn't it be reasonable to do the whole thing in a transaction? In that
> case I'm likely to have to unwind while I wait for the next callback from
> XmlHttpRequest with the next chunk of data. I understand that avoiding it
> results in nicer patterns (e.g. db.objectStores("foo").get(123).onsuccess =
> ...), but in practice I'm not sure if that will hold given that you still
> need error callbacks and such.
>

I believe your example of doing XHRs in the middle of a transaction is
something we were explicitly trying to avoid making possible.  In this case,
you should do all of your XHRs first and then do your transaction.  If you
need to read from the ObjectStore, do an XHR, and then write to the
ObjectStore, you can implement it with 2 transactions and have the second
one verify the data has not changed before doing the actual work.

Allowing things like XHRs in the middle of an operation will encourage
really long running transactions that will be really bad for concurrency and
make the transaction system much less elegant than it currently is.
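
The two-transaction pattern can be sketched as follows. This is a toy
in-memory model, not the IndexedDB API: the Map-based store, the per-record
version counter used for the "has not changed" check, and both helper names
are assumptions made for illustration.

```javascript
// Toy in-memory store: each record carries a version counter that is bumped
// on every write. The store shape and helper names are assumptions.
function readSnapshot(store, key) {
  // Transaction 1: read only, then finish before any network request.
  const record = store.get(key);
  return record
    ? { value: record.value, version: record.version }
    : { value: undefined, version: 0 };
}

function writeIfUnchanged(store, key, expectedVersion, newValue) {
  // Transaction 2: verify the record is still what transaction 1 saw.
  const record = store.get(key);
  const currentVersion = record ? record.version : 0;
  if (currentVersion !== expectedVersion) {
    return false; // someone else wrote in between; caller re-reads and retries
  }
  store.set(key, { value: newValue, version: currentVersion + 1 });
  return true;
}

const docs = new Map([["doc", { value: "v1", version: 1 }]]);
const snap = readSnapshot(docs, "doc");
// ... the XHR would run here, outside of any transaction ...
const ok = writeIfUnchanged(docs, "doc", snap.version, "v2");
```

Neither transaction stays open across the network request, so other readers
and writers are never blocked while the XHR is in flight.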



> Nested transactions:
> Not sure why we're considering this an advanced scenario. To be clear about
> what the feature means to me: make it legal to start a transaction when one
> is already in progress, and the nested one is effectively a no-op, just
> refcounts the transaction, so you need equal amounts of commit()'s, implicit
> or explicit, and an abort() cancels all nested transactions. The purpose of
> this is to allow composition, where a piece of code that needs a transaction
> can start one locally, independently of whether the caller had already one
> going.
>

I believe it's actually a bit more tricky than what you said.  For example,
if we only support static transactions, will we require that any nested
transaction only request a subset of the locks the outer one took?  What if
we try to start a dynamic transaction inside of a static one?  Etc.  But I
agree it's not _that_ tricky and I'm also not convinced it's an "advanced"
feature.

I'd suggest we take it out for now and look at re-adding it when the basics
of the async API are more solidified.  I hope we can get it into v1, but we
have too much in the air right now as is.
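
For reference, the refcounting semantics Pablo describes could be sketched as
below. This is a toy model with hypothetical names, not a proposed API. The
subset check in begin() is just one possible answer to the locking question
raised above, not something the thread has settled.

```javascript
// Toy model of refcounted nested transactions (hypothetical names).
class ToyNestedTransaction {
  constructor(scope) {
    this.scope = new Set(scope); // store names locked by the outermost transaction
    this.depth = 1;
    this.state = "active"; // "active" | "committed" | "aborted"
  }

  // Starting a nested transaction only bumps the count. Requiring a subset of
  // the outer scope is an assumption, not a decision made in the thread.
  begin(scope) {
    if (this.state !== "active") throw new Error("transaction is " + this.state);
    for (const name of scope) {
      if (!this.scope.has(name)) throw new Error("store not in outer scope: " + name);
    }
    this.depth += 1;
    return this;
  }

  // Only the commit that balances the outermost begin really commits.
  commit() {
    if (this.state !== "active") throw new Error("transaction is " + this.state);
    this.depth -= 1;
    if (this.depth === 0) this.state = "committed";
  }

  // An abort at any depth cancels every nesting level at once.
  abort() {
    if (this.state === "committed") throw new Error("transaction is committed");
    this.depth = 0;
    this.state = "aborted";
  }
}
```

A library function can then call begin() and commit() around its own work
without knowing whether its caller already holds a transaction.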


> Schema versioning:
> It's unfortunate that we need to have explicit elements in the page for the
> versioning protocol to work, but the fact that we can have a reliable
> mechanism for pages to coordinate a version bump is really nice. For folks
> that don't know about this the first time they build it, an explicit error
> message on the schema change timeout can explain where to start. I do think
> that there may be a need for non-breaking changes to the schema to happen
> without a "version dance". For example, query processors regularly create
> temporary tables during sorts and such. Those shouldn't require any
> coordination (maybe we allow non-versioned additions, or we just introduce
> temporary, unnamed tables that evaporate on commit() or database
> close()...).
>

I agree we should have a way to do non-breaking changes to the schema at some
point, but I believe it can wait till v2 at this point.  Temporary
objectStores seem to be the leading reason why people want this now, so
maybe we should consider adding them to the spec now.  That said, I'm still
not convinced that there are many use cases where one needs them.
Everything you can do with a temporary objectStore you should be able to do
in memory as well.  And thus the only reason to add them is if we're handling
enough data that some will spill to disk.  And I'm not convinced this will
be a very mainstream scenario.  Especially since one should be able to do
merge joins in many cases.
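
To illustrate the merge join point, here is a minimal sketch over two arrays
that are already sorted by the join key, as records read through two cursors
in key order would be. The function name and record shapes are hypothetical,
and the sketch assumes each key appears at most once on each side.

```javascript
// Merge join over two arrays sorted ascending by `key`.
// Assumes keys are unique within each input (a simplification).
function mergeJoin(left, right, key) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    const a = left[i][key];
    const b = right[j][key];
    if (a < b) {
      i++; // left is behind, advance it
    } else if (a > b) {
      j++; // right is behind, advance it
    } else {
      out.push({ ...left[i], ...right[j] }); // keys match, emit joined row
      i++;
      j++;
    }
  }
  return out;
}

const people = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }, { id: 4, name: "dee" }];
const orders = [{ id: 2, total: 10 }, { id: 3, total: 7 }, { id: 4, total: 5 }];
const joined = mergeJoin(people, orders, "id");
```

Apart from the output, the join only needs the current row from each side, so
no temporary store is required as long as both inputs arrive in key order.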

I feel strongly that what Jonas has proposed is what we should do for v1.  I
think he's explained the reasoning behind the API pretty well in the thread.


Other points:

*_NO_DUPLICATES:
I'm still not convinced we need this in v1.  It will help performance in
some cases, but it adds more API surface area than immediately meets the
eye.  If we do decide to have it in v1, we need to resolve the issues Jonas
brought up.  Ideally we would do this on the thread Jonas started
("[IndexedDB] .value of no-duplicate cursors").

Pre-loaded cursors + getAll:
I'm glad we've decided to take these out for the time being.

J
Received on Wednesday, 14 July 2010 07:11:02 UTC