Re: [IndexedDB] Current editor's draft

On Wed, Jul 14, 2010 at 6:05 PM, Pablo Castro
<Pablo.Castro@microsoft.com> wrote:
>
> From: Jonas Sicking [mailto:jonas@sicking.cc]
> Sent: Wednesday, July 14, 2010 5:43 PM
>
> On Wed, Jul 14, 2010 at 5:03 PM, Pablo Castro
> <Pablo.Castro@microsoft.com> wrote:
>>
>> From: Jonas Sicking [mailto:jonas@sicking.cc]
>> Sent: Wednesday, July 14, 2010 12:07 AM
>>
>
>>> I think what I'm struggling with is how dynamic transactions will help
>>> since they are still doing whole-objectStore locking. I'm also curious
>>> how you envision people dealing with deadlock hazards. Nikunj's
>>> examples in the beginning of this thread simply throw up their hands
>>> and report an error if there was a deadlock. That is obviously not
>>> good enough for an actual application.
>>>
>>> So in short, looking forward to an example :)
>
> I'll try to come up with one, although I doubt the code itself will be very interesting in this particular case. Not sure what you mean by "they are still doing whole-objectStore locking". The point of dynamic transactions is that they *don't* lock the whole store, but instead have the freedom to choose the granularity (e.g. you could do row-level locking).

My understanding is that the currently specced dynamic transactions
still lock whole objectStores. Once you call openObjectStore and
successfully receive the objectStore through the 'success' event, a
lock is held on the whole objectStore until the transaction is
committed. No other transaction, dynamic or static, can open the
objectStore in the meantime.

I base this on the sentence: "There MAY not be any overlap among the
scopes of all open connections to a given database" from the spec.

But I might be misunderstanding things entirely.

Nikunj, could you clarify how locking works for the dynamic
transactions proposal that is in the spec draft right now?
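
To make concrete what I mean by whole-objectStore locking, here is
roughly the flow I have in mind. The method names follow my reading
of the draft, so treat this as a sketch rather than gospel:

  var txn = db.transaction();         // no scope listed up front: dynamic
  var request = txn.openObjectStore("books");
  request.onsuccess = function (event) {
    // As I read the spec, from here until the transaction finishes,
    // the whole "books" store is locked.
    var books = event.result;
    // ... reads and writes against books ...
  };

  // A second transaction would block (or time out) here, even if it
  // only touches entirely different records in the store:
  var txn2 = db.transaction();
  txn2.openObjectStore("books");      // waits for txn, as I read it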

> As for deadlocks, whenever you're doing an operation you need to be ready to handle errors (out of disk, timeout, etc.). I'm not sure why deadlocks are different. If the underlying implementation has deadlock detection then you may get a specific error, otherwise you'll just get a timeout.

Well, I agree that you have to handle errors to prevent dataloss, but
I suspect that most authors won't. Thus the more error conditions we
introduce, the more dataloss we will see.

I think the difference is that deadlocks will happen often enough to
be a real concern. Running out of disk space makes most desktop
applications freak out badly enough that they generally cause
dataloss, which is why OSs tend to warn when you're running low on
disk space.

As for timeouts, I think the default should be no timeout at all.
Only if authors explicitly specify a timeout parameter should we use
one.
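
I.e. something like the following, where the timeout argument is
hypothetical and just illustrates the opt-in:

  db.transaction(["books"]);        // no timeout: wait indefinitely
  db.transaction(["books"], 5000);  // author opted in: give up after 5s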

My main line of thinking is that authors are generally going to be
very bad at even looking for errors, and worse at handling them in a
way that is satisfactory for the user. So I think that in practice,
any time an error occurs we'll end up rolling back the transaction
and there will be dataloss.

We should absolutely still provide good error handling opportunities
so that authors can at least try to deal with it. However I'm not too
optimistic that people will actually use them correctly.

>>> >>> This will likely be extra bad for transactions where no write
>>> >>> operations are done. In this case failure to call a 'commit()'
>>> >>> function won't result in any broken behavior. The transaction will
>>> >>> just sit open for a long time and eventually be "rolled back", though
>>> >>> since no changes were done, the rollback is transparent, and the only
>>> >>> noticeable effect is that the application halts for a while while the
>>> >>> transaction is waiting to time out.
>>> >>>
>>> >>> I should add that the WebSQLDatabase uses automatically committing
>>> >>> transactions very similar to what we're proposing, and it seems to
>>> >>> have worked fine there.
>>> >
>>> > I find this a bit scary, although it could be that I'm permanently tainted with traditional database stuff. Typical databases follow a presumed abort protocol, where if your code is interrupted by an exception, a process crash or whatever, you can always assume transactions will be rolled back if you didn't reach an explicit call to commit. The implicit commit here takes that away, and I'm not sure how safe that is.
>>> >
>>> > For example, if I don't have proper exception handling in place, an illegal call to some other non-IndexedDB-related API may throw an exception causing the whole thing to unwind, at which point nothing will be left to do in the database and thus the currently active transaction will be committed.
>>> >
>>> > Using the same line of thought we used for READ_ONLY, forgetting to call commit() is easy to detect the first time you try out your code. Your changes will simply not stick. It's not as clear as the READ_ONLY example because there is no opportunity to throw an explicit exception with an explanation, but the data not being around will certainly prompt developers to look for the issue :)
>
>>> Ah, I see where we are differing in thinking. My main concern has been
>>> that of rollbacks, and associated dataloss, in the non-error case. For
>>> example people forget to call commit() in some branch of their code,
>>> thus causing dataloss when the transaction is rolled back.
>>>
>>> Your concern seems to be that of lack of rollback in the error case,
>>> for example when an exception is thrown and not caught somewhere in
>>> the code. In this case you'd want to have the transaction rolled back.
>>>
>>> One way to handle this is to try to detect unhandled errors and
>>> implicitly roll back the transaction. Two situations where we could do
>>> this are:
>>> 1. When an 'error' event is fired, but .preventDefault() is
>>> not called by any handler. The result is that if an error is ever
>>> fired, but no one explicitly handles it, we roll back the transaction.
>>> See also below.
>>> 2. When a success handler is called, but the handler throws an exception.
>>>
>>> The second is a bit of a problem from a spec point of view. I'm not
>>> sure it is allowed by the DOM Events spec, or by all existing DOM
>>> Events implementations. I do still think we can pull it off though.
>>> This is something I've been thinking about raising for a while, but I
>>> wanted to nail down the raised issues first.
>>>
>>> Would you feel more comfortable with implicit commits if we did the above?
>
> It does make it better, although this seems to introduce quite a few moving parts to the process. I still think an explicit commit() would be better, but I'm open to exploring more options.
>
>>> >>> And as you say, you still usually need error callbacks. In fact, we
>>> >>> have found while writing examples using our implementation, that you
>>> >>> almost always want to add a generic error handler. It's very easy to
>>> >>> make a mistake, and if you don't add error handlers then these just go
>>> >>> by silently, offering no help as to why your program isn't working.
>>> >>> Though possibly a better implementation could put information in the
>>> >>> developer console if it detected that an error event was fired but
>>> >>> no-one was listening.
>>> >
>>> > Somewhat unrelated, but I wonder if we should consider a global (per database session) error handler or something like that. Database operations are just too granular, so maybe the usual deal where you set up an error handler per-operation is not the right thing to do.
>>>
>>> This is a great idea. What we could do is to first fire the 'error'
>>> event on the IDBRequest. If .preventDefault() is not called, we'll
>>> fire an 'error' event on the IDBTransaction. If .preventDefault()
>>> still hasn't been called, we fire an 'error' event on the IDBDatabase.
>>> If .preventDefault() *still* hasn't been called, we roll back the
>>> transaction.
>>>
>>> This is very similar to error handling in Workers. It might be overly
>>> complex implementation wise, but it does seem like a nice API for
>>> authors.
>
> I haven't thought about the implementation implications of this, but I like it from the API perspective. I can see the 80% case being that you set up a page-wide handler to handle a bit of cleanup and don't need per-operation error handlers.

Yup, that's exactly my thinking. With the proposal above authors are
free to do error handling at any level they want.
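
In code, I imagine the 80% case looking something like this. All of
it is illustrative: the tiered 'error' events are only proposed
above, and showMessage/canRetry/retryLater are made-up helpers:

  // Page-wide cleanup. Fires only if neither the request nor the
  // transaction handler called preventDefault().
  db.onerror = function (event) {
    showMessage("Database error: " + event.code);
    // We don't call event.preventDefault() here, so the transaction
    // is still rolled back.
  };

  // Per-operation handler only where the code can actually recover:
  var request = objectStore.put(record);
  request.onerror = function (event) {
    if (canRetry(event.code)) {
      event.preventDefault();  // handled; don't propagate or roll back
      retryLater(record);
    }
  };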

>>> Is there anything you need that isn't on the IDBTransaction object?
>>> Depending on what information it is that is needed, maybe we can
>>> simply add more things to IDBTransaction.
>>>
>>> In any case, we already have the IDBTransaction.db property, which is
>>> the pointer to the database from the transaction that you are
>>> suggesting. Would that be enough?
>
> This might be just enough. Let's call it good for now, until people write some real code and see if we missed anything.

Sounds good :)

>>> >>> However I'd love to hear about when temporary objectStores are used. I
>>> >>> know I've seen them in evaluation strategies created by SQL databases,
>>> >>> but I wasn't able to come up with an example off the top of my head
>>> >>> for a use case that would require them.
>>> >
>>> > Yes, it tends to be SQL-ish scenarios, even if SQL is not involved. For example, if you need to sort a large set you may want to sort it in blocks and keep the blocks on this. Note that this is not just about running out of memory, but also about not getting even close to it and make the rest of the system slow down because of lack of resources.
>>>
>>> Can you clarify what you mean by "sort it in blocks and keep the
>>> blocks on this". Not sure if that sentence got cut off?
>
> Sorry for the nonsense. Second try: "if you need to sort a large set you may want to sort it in blocks and keep the blocks in storage for a later merge" (merge sort spilling to disk).

Ok. I guess you could even do the sorting by inserting all values into
a temporary objectStore which is keyed on the value you want to sort
on.
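
Roughly like this, as a sketch only: nothing marks the store
temporary yet, largeSet stands in for wherever the records come from,
and emit() is whatever consumes the sorted output:

  // Key the scratch store on "title" so its key order is the sort
  // order we want.
  var scratch = db.createObjectStore("scratch", "title");
  largeSet.forEach(function (record) {
    scratch.put(record);            // stored (and thus ordered) by title
  });
  var cursorRequest = scratch.openCursor();
  cursorRequest.onsuccess = function (event) {
    var cursor = event.result;
    if (cursor) {
      emit(cursor.value);           // records come back in title order
      cursor.continue();
    }
  };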

I'm fine with adding temporary objectStores. Curious what others think.

/ Jonas

Received on Thursday, 15 July 2010 01:38:52 UTC