RE: [IndexedDB] Cursors and modifications from Pablo Castro on 2010-07-02 (public-webapps@w3.org from July to September 2010)

From: Pablo Castro <Pablo.Castro@microsoft.com>
Date: Fri, 2 Jul 2010 23:40:15 +0000
To: Jonas Sicking <jonas@sicking.cc>, Webapps WG <public-webapps@w3.org>
Message-ID: <F753B2C401114141B426DB383C8885E058EA13F0@TK5EX14MBXC124.redmond.corp.microsoft.>
From: public-webapps-request@w3.org [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking
Sent: Friday, July 02, 2010 4:00 PM

>> We ran into an complicated issue while implementing IndexedDB. In short, what should happen if an object store is modified while a cursor is iterating it? >> Note that the modification can be done within the same transaction, so the read/write locks preventing several transactions from accessing the same table isn't helping here.
>>
>> Detailed problem description (this assumes the API proposed by mozilla):
>>
>> Consider a objectStore "words" containing the following objects:
>> { name: "alpha" }
>> { name: "bravo" }
>> { name: "charlie" }
>> { name: "delta" }
>>
>> and the following program (db is a previously opened IDBDatabase):
>>
>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var result = []; trans.objectStore("words").openCursor().onsuccess = function(e) {
>>   cursor = e.result;
>>   result.push(cursor.value);
>>   cursor.continue();
>> }
>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>>   trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 }); }
>>
>> When the cursor reads the "delta" entry, will it see the 'myModifiedValue' property? Since we so far has defined that the callback order is defined to be >> the request order, that means that put request will be finished before the "delta" entry is iterated by the cursor.
>>
>> The problem is even more serious with cursors that iterate indexes.
>> Here a modification can even affect the position of the currently iterated object in the index, and the modification can (if i'm reading the spec correctly) >> come from the cursor itself.
>>
>> Consider the following objectStore "people" with keyPath "name"
>> containing the following objects:
>>
>> { name: "Adam", count: 30 }
>> { name: "Bertil", count: 31 }
>> { name: "Cesar", count: 32 }
>> { name: "David", count: 33 }
>> { name: "Erik", count: 35 }
>>
>> and an index "countIndex" with keyPath "count". What would the following code do?
>>
>> results = [];
>> db.objectStore("people",
>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = function (e) {
>>   cursor = e.result;
>>   if (!cursor) {
>>     alert(results);
>>     return;
>>   }
>>   if (cursor.value.name == "Bertil") {
>>     cursor.update({name: "Bertil", count: 34 });
>>   }
>>   results.push(cursor.value.name);
>>   cursor.continue();
>> };
>>
>> What does this alert? Would it alert "Adam,Bertil,Erik" as the cursor would stay on the "Bertil" object as it is moved in the index? Or would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate "Bertil" again at its new position in the index?

My first reaction is that both from the expected behavior of perspective (transaction is the scope of isolation) and from the implementation perspective it would be better to see live changes if they happened in the same transaction as the cursor (over a store or index). So in your example you would iterate one of the rows twice. Maintaining order and membership stable would mean creating another scope of isolation within the transaction, which to me would be unusual and it would be probably quite painful to implement without spilling a copy of the records to disk (at least a copy of the keys/order if you don't care about protecting from changes that don't affect membership/order; some databases call these keyset cursors).

>>
>> We could say that cursors always iterate snapshots, however this introduces MVCC. Though it seems to me that SNAPSHOT_READ already does that.

Actually, even with MVCC you'd see your own changes, because they happen in the same transaction so the buffer pool will use the same version of the page. While it may be possible to reuse the MVCC infrastructure, it would still require the introduction of a second scope for stability. 

>>
>> We could also say that cursors iterate live data though that can be pretty confusing and forces the implementation to deal with entries being added and >> removed during iteration, and it'd be tricky to define all edge cases.

Would this be any different from the implementation perspective than dealing with changes that happen through other transactions once they are committed? Typically at least in non-MVCC systems committed changes that are "further ahead" in a cursor scan end up showing up even when the cursor was opened before the other transaction committed.

>>
>> It's certainly debatable how much of a problem any of these edgecases are for users. Note that all of this is only an issue if you modify and read from the >> same records *in the same transaction*. I can't think of a case where it isn't trivial to avoid these problems by separating things into separate transactions. >> However it'd be nice to avoid creating foot-guns for people to play with (think of the children!).
>>
>> However we still need to define *something*. I would suggest that we define that cursors iterate snapshots. It seems the cleanest for users and easiest >> to define. And once implementations add MVCC support it should be easy to implement. I think we've come up with a decent plan for how to do implement it in sqlite even without proper MVCC, so it should be doable even then.

Besides the expectations aspects, I worry that doing this means that opening a cursor means incurring in substantial cost for all cases (e.g. creating a keyset or something like that).

-pablo
Received on Friday, 2 July 2010 23:40:53 UTC