Re: [IndexedDB] Cursors and modifications

On Fri, Jul 2, 2010 at 7:27 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Sat, Jul 3, 2010 at 11:09 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <andreip@google.com> wrote:
>> > On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro
>> >> <Pablo.Castro@microsoft.com> wrote:
>> >>>
>> >>> From: public-webapps-request@w3.org
>> >>> [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking
>> >>> Sent: Friday, July 02, 2010 4:00 PM
>> >>>
>> >>>>> We ran into an complicated issue while implementing IndexedDB. In
>> >>>>> short, what should happen if an object store is modified while a cursor is
>> >>>>> iterating it? >> Note that the modification can be done within the same
>> >>>>> transaction, so the read/write locks preventing several transactions from
>> >>>>> accessing the same table isn't helping here.
>> >>>>>
>> >>>>> Detailed problem description (this assumes the API proposed by
>> >>>>> mozilla):
>> >>>>>
>> >>>>> Consider a objectStore "words" containing the following objects:
>> >>>>> { name: "alpha" }
>> >>>>> { name: "bravo" }
>> >>>>> { name: "charlie" }
>> >>>>> { name: "delta" }
>> >>>>>
>> >>>>> and the following program (db is a previously opened IDBDatabase):
>> >>>>>
>> >>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var
>> >>>>> result = []; trans.objectStore("words").openCursor().onsuccess = function(e)
>> >>>>> {
>> >>>>>   cursor = e.result;
>> >>>>>   result.push(cursor.value);
>> >>>>>   cursor.continue();
>> >>>>> }
>> >>>>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>> >>>>>   trans.objectStore("words").put({ name: "delta", myModifiedValue:
>> >>>>> 17 }); }
>> >>>>>
>> >>>>> When the cursor reads the "delta" entry, will it see the
>> >>>>> 'myModifiedValue' property? Since we so far has defined that the callback
>> >>>>> order is defined to be >> the request order, that means that put request
>> >>>>> will be finished before the "delta" entry is iterated by the cursor.
>> >>>>>
>> >>>>> The problem is even more serious with cursors that iterate indexes.
>> >>>>> Here a modification can even affect the position of the currently
>> >>>>> iterated object in the index, and the modification can (if i'm reading the
>> >>>>> spec correctly) >> come from the cursor itself.
>> >>>>>
>> >>>>> Consider the following objectStore "people" with keyPath "name"
>> >>>>> containing the following objects:
>> >>>>>
>> >>>>> { name: "Adam", count: 30 }
>> >>>>> { name: "Bertil", count: 31 }
>> >>>>> { name: "Cesar", count: 32 }
>> >>>>> { name: "David", count: 33 }
>> >>>>> { name: "Erik", count: 35 }
>> >>>>>
>> >>>>> and an index "countIndex" with keyPath "count". What would the
>> >>>>> following code do?
>> >>>>>
>> >>>>> results = [];
>> >>>>> db.objectStore("people",
>> >>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess =
>> >>>>> function (e) {
>> >>>>>   cursor = e.result;
>> >>>>>   if (!cursor) {
>> >>>>>     alert(results);
>> >>>>>     return;
>> >>>>>   }
>> >>>>>   if (cursor.value.name == "Bertil") {
>> >>>>>     cursor.update({name: "Bertil", count: 34 });
>> >>>>>   }
>> >>>>>   results.push(cursor.value.name);
>> >>>>>   cursor.continue();
>> >>>>> };
>> >>>>>
>> >>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the
>> >>>>> cursor would stay on the "Bertil" object as it is moved in the index? Or
>> >>>>> would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate
>> >>>>> "Bertil" again at its new position in the index?
>> >>>
>> >>> My first reaction is that both from the expected behavior of
>> >>> perspective (transaction is the scope of isolation) and from the
>> >>> implementation perspective it would be better to see live changes if they
>> >>> happened in the same transaction as the cursor (over a store or index). So
>> >>> in your example you would iterate one of the rows twice. Maintaining order
>> >>> and membership stable would mean creating another scope of isolation within
>> >>> the transaction, which to me would be unusual and it would be probably quite
>> >>> painful to implement without spilling a copy of the records to disk (at
>> >>> least a copy of the keys/order if you don't care about protecting from
>> >>> changes that don't affect membership/order; some databases call these keyset
>> >>> cursors).
>> >>>
>> >>>>>
>> >>>>> We could say that cursors always iterate snapshots, however this
>> >>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does that.
>> >>>
>> >>> Actually, even with MVCC you'd see your own changes, because they
>> >>> happen in the same transaction so the buffer pool will use the same version
>> >>> of the page. While it may be possible to reuse the MVCC infrastructure, it
>> >>> would still require the introduction of a second scope for stability.
>> >>
>> >> It's quite implementable using append-only b-trees. Though it might be
>> >> much to ask that implementations are forced to use that.
>> >>
>> >> An alternative to what I suggested earlier is that all read operations
>> >> use "read committed". I.e. they always see the data as it looked at
>> >> the beginning of the transaction. Would this be more compatible with
>> >> existing MVCC implementations?
>> >>
>> >
>> > Hmm, so if you modified the object store and then, later in the same
>> > transaction, used a cursor to iterate the object store, the cursor
>> > would not see the earlier modifications? That's not very intiutive to
>> > me...or did I misunderstand?
>>
>> If we go with "read committed" then yes, your understanding is correct.
>>
>> Out of curiosity, how are you feeling about the "cursors iterate data
>> as it looked when cursor was opened" solution?
>
> I'm actually starting to warm up to the idea some, but it'd need to be a
> non-default option.  And I'm on the fence about whether it's a v1 option.
>
>>
>> >> I'd imagine this should be as easy to implement as SNAPSHOT_READ.
>> >>
>> >>>>> We could also say that cursors iterate live data though that can be
>> >>>>> pretty confusing and forces the implementation to deal with entries being
>> >>>>> added and >> removed during iteration, and it'd be tricky to define all edge
>> >>>>> cases.
>> >>>
>> >>> Would this be any different from the implementation perspective than
>> >>> dealing with changes that happen through other transactions once they are
>> >>> committed? Typically at least in non-MVCC systems committed changes that are
>> >>> "further ahead" in a cursor scan end up showing up even when the cursor was
>> >>> opened before the other transaction committed.
>> >>
>> >> IndexedDB doesn't allow write transactions to a given store to start
>> >> while there are read transactions using that store, so that doesn't
>> >> seem to be a problem. Unless I'm misunderstanding something?
>> >>
>> >
>> > That was my understanding too: while a cursor was open, you can't
>> > allow a write transaction to start.
>>
>> Exactly. Though "while a read transaction was open, you can't allow a
>> write transaction to start" is even more correct.
>
> Sure you can't let a write static transaction start, but I don't see why you
> can't let a dynamic transaction start.  Sure there are many cases where it
> can fail, but I don't see any reason it should be illegal for dynamic
> transactions to execute optimistically.  I mean, any developer that uses
> them will have to handle the error case anyway.

I think this thread has gone way off track. The point is that you
can't have a read transaction being affected by another write
transaction. That is in fact the sole purpose of a read transaction.
If someone disagrees then please let me know.

>> >>>>> It's certainly debatable how much of a problem any of these
>> >>>>> edgecases are for users. Note that all of this is only an issue if you
>> >>>>> modify and read from the >> same records *in the same transaction*. I can't
>> >>>>> think of a case where it isn't trivial to avoid these problems by separating
>> >>>>> things into separate transactions.
>> >>>>> However it'd be nice to avoid creating foot-guns for people to play
>> >>>>> with (think of the children!).
>> >>>>>
>> >>>>> However we still need to define *something*. I would suggest that we
>> >>>>> define that cursors iterate snapshots. It seems the cleanest for users and
>> >>>>> easiest >> to define. And once implementations add MVCC support it should be
>> >>>>> easy to implement. I think we've come up with a decent plan for how to do
>> >>>>> implement it in sqlite even without proper MVCC, so it should be doable even
>> >>>>> then.
>> >>>
>> >>> Besides the expectations aspects, I worry that doing this means that
>> >>> opening a cursor means incurring in substantial cost for all cases (e.g.
>> >>> creating a keyset or something like that).
>> >>
>> >> I agree we definitely don't want that. We are working on an
>> >> implementation which is backed by a SQL database and completely
>> >> incapable of MVCC, so it seems doable. However I don't yet know how
>> >> much of a complexity and performance penalty that carries.
>> >>
>> >
>> > On the other hand, as you said earlier, maybe allowing the live
>> > changes to be visible in the cursor is not such a big problem as apps
>> > can work around these edge cases?
>>
>> I suspect so. Though given that behavior can be very unintuitive,
>
> I would argue that it's highly logical and trying to make it "intuitive" is
> a losing battle because it introduces tons of special cases that no one
> remembers which work most of the time, but blow up horribly in to hard to
> debug messes the other small fraction of the time.  Similar to the way Perl
> tries to be "intuitive".  Keeping it straight forward and logical seems like
> a much better approach.

I certainly agree that there is no perfectly logical behavior here.
All solutions have their quirks and unexpected behavior. I do however
think that snapshot behavior I originally proposed is much easier to
explain and thus understand. I.e. defining all edge cases for 'live'
iterators will take a lot more spec text and result in a lot more edge
cases.

However I'm tentatively fine with doing this if that is what everyone
else wants to do. I know others at mozilla are not as excited about
this solution though. We'll definitely need to look at implementation
strategies.

>> we
>> would have to define things to a very high level of detail to ensure
>> that the same unintuitive behavior is happening in all
>> implementations. Additionally, I suspect implementation can get quite
>> complex in order to deal with all edge cases, for example dealing with
>> data that was read ahead optimistically.
>>
>> We strategy for defining this is to define all mutating operations in
>> terms of which ones are insertions vs. updates vs. removals. Both in
>> the objectStore and in indexes. For example, does a IDBObjectStore.put
>> call that modifies an existing entry cause corresponding entries in
>> indexes to be updated or removed and reinserted? Both in the situation
>> when the put call causes index values to be changed and not changed.
>>
>> Then define how active cursors move around when an entry that they are
>> currently on is removed, or when entries before and after the current
>> entry are inserted. For all types of cursors; forwards, backwards and
>> no_duplicate. Also need to define this in situations when the cursor
>> is currently firmly in one position, or when a .continue() call has
>> been made, but the callback not yet has fired. Likewise when the
>> cursor has not yet fired its first callback and entries are inserted
>> in the beginning, and when a cursor has notified about the last entry,
>> and entries are added after the end.
>>
>> It's certainly possible, but no small feat.
>
> I think this is what we need to do.  I volunteer Andrei.  :-)

Sold! :-)

/ Jonas

Received on Saturday, 3 July 2010 03:09:11 UTC