Re: [IndexedDB] Current editor's draft

On Thu, Jul 15, 2010 at 2:37 AM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Wed, Jul 14, 2010 at 6:05 PM, Pablo Castro
> <Pablo.Castro@microsoft.com> wrote:
> >
> > From: Jonas Sicking [mailto:jonas@sicking.cc]
> > Sent: Wednesday, July 14, 2010 5:43 PM
> >
> > On Wed, Jul 14, 2010 at 5:03 PM, Pablo Castro
> > <Pablo.Castro@microsoft.com> wrote:
> >>
> >> From: Jonas Sicking [mailto:jonas@sicking.cc]
> >> Sent: Wednesday, July 14, 2010 12:07 AM
> >>
> >
> >>> I think what I'm struggling with is how dynamic transactions will help
> >>> since they are still doing whole-objectStore locking. I'm also curious
> >>> how you envision people dealing with deadlock hazards. Nikunj's
> >>> examples in the beginning of this thread simply throw up their hands
> >>> and report an error if there was a deadlock. That is obviously not
> >>> good enough for an actual application.
> >>>
> >>> So in short, looking forward to an example :)
> >
> > I'll try to come up with one, although I doubt the code itself will be
> > very interesting in this particular case. Not sure what you mean by
> > "they are still doing whole-objectStore locking". The point of dynamic
> > transactions is that they *don't* lock the whole store, but instead
> > have the freedom to choose the granularity (e.g. you could do
> > row-level locking).
>
> My understanding is that the currently specced dynamic transactions
> are still whole-objectStore.


My understanding matches Pablo's.  I'm not aware of anything in the spec
that'd limit you to objectStore-wide locks.  What's more, if this were true
then I'd be _very_ against adding dynamic transactions in v1, since they'd
offer us very little in return for a lot of complexity.

This misunderstanding would definitely explain a lot of confusion within our
discussions though.  :-)
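
To make sure we're all arguing about the same thing, here's the rough shape
I believe Pablo means.  This is only a sketch: openTransaction() and
openObjectStore() follow the draft loosely, and the record-level locking in
the comments is our reading of the proposal, not explicit spec text.

  // Sketch only: names and locking behavior are our reading of the
  // dynamic-transactions proposal, not settled spec text.
  var txnReq = db.openTransaction();        // dynamic: no scope up front
  txnReq.onsuccess = function () {
    var txn = txnReq.result;
    var storeReq = txn.openObjectStore("inventory");
    storeReq.onsuccess = function () {
      var store = storeReq.result;          // does NOT lock all of "inventory"
      var getReq = store.get("widget-17");  // locks just the "widget-17" record
      getReq.onsuccess = function () {
        var widget = getReq.result;
        widget.stock -= 1;
        store.put(widget, "widget-17");     // still only that one record held
        txn.commit();
      };
    };
  };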


> Once you call openObjectStore and
> successfully receive the objectStore through the 'success' event, a
> lock is held on the whole objectStore until the transaction is
> committed. No other transaction, dynamic or static, can open the
> objectStore in the meantime.
>
> I base this on the sentence: "There MAY not be any overlap among the
> scopes of all open connections to a given database" from the spec.
>
> But I might be misunderstanding things entirely.
>
> Nikunj, could you clarify how locking works for the dynamic
> transactions proposal that is in the spec draft right now?
>

I'd definitely like to hear what Nikunj originally intended here.

> > As for deadlocks, whenever you're doing an operation you need to be
> > ready to handle errors (out of disk, timeout, etc.). I'm not sure why
> > deadlocks are different. If the underlying implementation has deadlock
> > detection then you may get a specific error, otherwise you'll just get
> > a timeout.
>
> Well, I agree that while you have to handle errors to prevent
> dataloss, I suspect that most authors won't. Thus the more error
> conditions we introduce, the more opportunities for silent dataloss
> we create.
>
> I think the difference is that deadlocks will happen often enough that
> they are a real concern. Running out of disk space makes most desktop
> applications freak out badly enough that they generally cause dataloss,
> which is why OSs tend to warn when you're running low on disk space.
>
> As for timeouts, I think we should make the defaults not be to have a
> timeout. Only if authors specifically specify a timeout parameter
> should we use one.
>
> My main line of thinking is that authors are going to generally be
> very bad at even looking for errors. Even less so at successfully
> handling those errors in a way that is satisfactory for the user. So I
> think the default behavior is that any time an error occurs, we'll end
> up rolling back the transaction and there will be dataloss.
>
> We should absolutely still provide good error handling opportunities
> so that authors can at least try to deal with it. However I'm not too
> optimistic that people will actually use them correctly.
>

I agree with all of this.

I'd also like to note that, as far as I can tell, without dynamic
transactions there's no possible way to deadlock.  And even with dynamic
transactions, it should be possible to implement things in a way that
static transactions always "win", which I think would be a good idea to
ensure that anyone using only static transactions gets the simplest
possible API.
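
To spell out the reasoning: a static transaction declares its entire scope
before it starts, so the implementation can take every lock atomically, or
in one canonical order, before running any callbacks.  With no incremental
lock acquisition there's no way to form a cycle.  A sketch, with the static
API roughly as it's been proposed:

  // Both transactions name their full scope up front; the implementation
  // can sort the names and lock in that canonical order, so no cycle forms.
  var txnA = db.transaction(["accounts", "auditLog"], IDBTransaction.READ_WRITE);
  var txnB = db.transaction(["auditLog", "accounts"], IDBTransaction.READ_WRITE);
  // Both reduce to locking "accounts" then "auditLog"; whichever starts
  // second simply waits for the other to commit, rather than each holding
  // one store while waiting on the other (the classic deadlock).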


> >>> >>> This will likely be extra bad for transactions where no write
> >>> >>> operations are done. In this case failure to call a 'commit()'
> >>> >>> function won't result in any broken behavior. The transaction
> >>> >>> will just sit open for a long time and eventually be "rolled
> >>> >>> back", though since no changes were done, the rollback is
> >>> >>> transparent, and the only noticeable effect is that the
> >>> >>> application halts for a while while the transaction waits to
> >>> >>> time out.
> >>> >>>
> >>> >>> I should add that WebSQLDatabase uses automatically committing
> >>> >>> transactions very similar to what we're proposing, and it seems
> >>> >>> to have worked fine there.
> >>> >
> >>> > I find this a bit scary, although it could be that I'm permanently
> >>> > tainted with traditional database stuff. Typical databases follow
> >>> > a presumed abort protocol, where if your code is interrupted by an
> >>> > exception, a process crash or whatever, you can always assume
> >>> > transactions will be rolled back if you didn't reach an explicit
> >>> > call to commit. The implicit commit here takes that away, and I'm
> >>> > not sure how safe that is.
> >>> >
> >>> > For example, if I don't have proper exception handling in place,
> >>> > an illegal call to some other non-IndexedDB-related API may throw
> >>> > an exception causing the whole thing to unwind, at which point
> >>> > nothing will be pending to do in the database and thus the
> >>> > currently active transaction will be committed.
> >>> >
> >>> > Using the same line of thought we used for READ_ONLY, forgetting
> >>> > to call commit() is easy to detect the first time you try out your
> >>> > code. Your changes will simply not stick. It's not as clear as the
> >>> > READ_ONLY example because there is no opportunity to throw an
> >>> > explicit exception with an explanation, but the data not being
> >>> > around will certainly prompt developers to look for the issue :)
> >
> >>> Ah, I see where we are differing in thinking. My main concern has
> >>> been that of rollbacks, and associated dataloss, in the non-error
> >>> case. For example, people forget to call commit() in some branch of
> >>> their code, thus causing dataloss when the transaction is rolled
> >>> back.
> >>>
> >>> Your concern seems to be the lack of rollback in the error case,
> >>> for example when an exception is thrown and not caught somewhere in
> >>> the code. In this case you'd want to have the transaction rolled
> >>> back.
> >>>
> >>> One way to handle this is to try to detect unhandled errors and
> >>> implicitly roll back the transaction. Two situations where we could
> >>> do this are:
> >>> 1. When an 'error' event is fired, but .preventDefault() is not
> >>> called by any handler. The result is that if an error is ever
> >>> fired, but no one explicitly handles it, we roll back the
> >>> transaction. See also below.
> >>> 2. When a success handler is called, but the handler throws an
> >>> exception.
> >>>
> >>> The second is a bit of a problem from a spec point of view. I'm not
> >>> sure it is allowed by the DOM Events spec, or by all existing DOM
> >>> Events implementations. I do still think we can pull it off,
> >>> though. This is something I've been thinking about raising for a
> >>> while, but I wanted to nail down the raised issues first.
> >>>
> >>> Would you feel more comfortable with implicit commits if we did the
> >>> above?
> >
> > It does make it better, although this seems to introduce quite a few
> > moving parts to the process. I still think an explicit commit() would
> > be better, but I'm open to exploring more options.
>

For the record, I'm still slightly in the implicit commit camp (assuming we
do add the protections Jonas brought up), but Pablo's points are pretty
compelling.
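
To make the trade-off concrete, here's the behavior under the implicit
model as I understand it.  A sketch: assume the transaction commits once
control returns to the event loop with no requests pending, and
someUnrelatedApi() is just a stand-in.

  var txn = db.transaction(["people"], IDBTransaction.READ_WRITE);
  txn.objectStore("people").put({name: "Pablo"}, "pablo");
  // No commit() anywhere.  Once no requests are pending, the transaction
  // commits on its own, so a forgotten commit() in some code branch
  // (Jonas' non-error hazard) can't silently roll back the put() above.
  someUnrelatedApi();  // ...but if this throws and unwinds the stack, the
  // transaction still commits -- Pablo's error-case hazard -- unless we
  // also adopt the unhandled-error rollback protections discussed above.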


> >>> >>> And as you say, you still usually need error callbacks. In
> >>> >>> fact, we have found while writing examples using our
> >>> >>> implementation that you almost always want to add a generic
> >>> >>> error handler. It's very easy to make a mistake, and if you
> >>> >>> don't add error handlers then these just go by silently,
> >>> >>> offering no help as to why your program isn't working. Though
> >>> >>> possibly a better implementation could put information in the
> >>> >>> developer console if it detected that an error event was fired
> >>> >>> but no one was listening.
> >>> >
> >>> > Somewhat unrelated, but I wonder if we should consider a global
> >>> > (per database session) error handler or something like that.
> >>> > Database operations are just too granular, so maybe the usual
> >>> > deal where you set up an error handler per operation is not the
> >>> > right thing to do.
> >>>
> >>> This is a great idea. What we could do is to first fire the 'error'
> >>> event on the IDBRequest. If .preventDefault() is not called, we'll
> >>> fire an 'error' event on the IDBTransaction. If .preventDefault()
> >>> still hasn't been called, we fire an 'error' event on the IDBDatabase.
> >>> If .preventDefault() *still* hasn't been called, we roll back the
> >>> transaction.
> >>>
> >>> This is very similar to error handling in Workers. It might be overly
> >>> complex implementation wise, but it does seem like a nice API for
> >>> authors.
> >
> > I haven't thought about the implementation implications of this, but
> > I like it from the API perspective. I can see the 80% case being that
> > you set up a page-wide handler to handle a bit of cleanup and don't
> > need per-operation error handlers.
>
> Yup, that's exactly my thinking. With the proposal above authors are
> free to do error handling at any level they want.
>

Someone should file a bug on this.
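
In the meantime, here's what the tiered handling would look like from
script.  A sketch: the property names (event.code, the exception
constants) are assumed from the current draft, and reportToUser() is a
stand-in.

  // Page-wide fallback: reached only if neither the request's nor the
  // transaction's handler called preventDefault() on the error event.
  db.onerror = function (event) {
    reportToUser(event.code);
    // Still not calling preventDefault(), so the transaction rolls
    // back -- a safe default for errors nobody handled.
  };

  var txn = db.transaction(["people"], IDBTransaction.READ_WRITE);
  var putReq = txn.objectStore("people").put({name: "Jonas"}, "jonas");
  putReq.onerror = function (event) {
    if (event.code === IDBDatabaseException.CONSTRAINT_ERR) {
      event.preventDefault();  // handled here; stops the error from
    }                          // bubbling to IDBTransaction/IDBDatabase
    // Anything else propagates: IDBRequest -> IDBTransaction ->
    // IDBDatabase -> automatic rollback.
  };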


> >>> Is there anything you need that isn't on the IDBTransaction object?
> >>> Depending on what information it is that is needed, maybe we can
> >>> simply add more things to IDBTransaction.
> >>>
> >>> In any case, we already have the IDBTransaction.db property, which is
> >>> the pointer to the database from the transaction that you are
> >>> suggesting. Would that be enough?
> >
> > This might be just enough. Let's call it good for now, until people
> > write some real code and see if we missed anything.
>
> Sounds good :)
>

Ditto.
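
For reference, the kind of pattern txn.db already enables (a sketch;
oncomplete per the current event proposals):

  // From a transaction you can always reach back to its database, e.g.
  // to kick off a follow-up transaction once this one finishes.
  var txn = db.transaction(["people"], IDBTransaction.READ_WRITE);
  txn.oncomplete = function () {
    var next = txn.db.transaction(["auditLog"], IDBTransaction.READ_WRITE);
    next.objectStore("auditLog").put({what: "people updated"}, Date.now());
  };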

> >>> >>> However I'd love to hear about when temporary objectStores are
> >>> >>> used. I know I've seen them in evaluation strategies created by
> >>> >>> SQL databases, but I wasn't able to come up with an example off
> >>> >>> the top of my head for a use case that would require them.
> >>> >
> >>> > Yes, it tends to be SQL-ish scenarios, even if SQL is not
> >>> > involved. For example, if you need to sort a large set you may
> >>> > want to sort it in blocks and keep the blocks on this. Note that
> >>> > this is not just about running out of memory, but also about not
> >>> > getting even close to it and make the rest of the system slow
> >>> > down because of lack of resources.
> >>>
> >>> Can you clarify what you mean by "sort it in blocks and keep the
> >>> blocks on this"? Not sure if that sentence got cut off.
> >
> > Sorry for the nonsense. Second try: "if you need to sort a large set
> > you may want to sort it in blocks and keep the blocks in storage for
> > a later merge" (merge sort spilling to disk).
>
> Ok. I guess you could even do the sorting by inserting all values into
> a temporary objectStore which is keyed on the value you want to sort
> on.
>
> I'm fine with adding temporary objectStores. Curious what others think.
>

I agree that the only usage for temporary objectStores at the moment is
spilling to disk.  The reason I'm weakly against them is that I'd be really
surprised if anyone will be dealing with datasets large enough for this to
be a real issue in the near term, especially since keeping data in memory
is so much easier.

That said, I'm sure we'll need such functionality at some point, and I don't
see it being much in the way of implementation burden, so if you guys feel
strongly, I'm OK with it.
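
For concreteness, the spill-to-disk sort would look something like this.
A sketch: createTemporaryObjectStore() is hypothetical (nothing like it is
in the draft yet), and it leans on the fact that an objectStore keeps its
records sorted by key, so inserting under the sort key *is* the sort.

  // Hypothetical createTemporaryObjectStore(); illustrates the merge-sort
  // spill Pablo describes, without holding the whole set in memory.
  var scratchStore = db.createTemporaryObjectStore("sortScratch");
  var txn = db.transaction(["logEntries", "sortScratch"],
                           IDBTransaction.READ_WRITE);
  var source = txn.objectStore("logEntries");
  var sorted = txn.objectStore("sortScratch");

  // Pass 1: re-key every record by the field we want to sort on; the
  // store keeps keys ordered on disk, so this is the whole sort.
  // (Assumes unique timestamps; duplicates would need a compound key.)
  var cursorReq = source.openCursor();
  cursorReq.onsuccess = function () {
    var cursor = cursorReq.result;
    if (cursor) {
      sorted.put(cursor.value, cursor.value.timestamp);
      cursor["continue"]();  // reserved word, hence the bracket syntax
    } else {
      // Pass 2: a cursor over the scratch store now yields records in
      // timestamp order; the store goes away when the session ends.
      var readReq = sorted.openCursor();
      readReq.onsuccess = function () { /* consume in sorted order */ };
    }
  };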

J

Received on Thursday, 15 July 2010 08:51:49 UTC