- From: Jonas Sicking <jonas@sicking.cc>
- Date: Fri, 2 Jul 2010 20:08:18 -0700
- To: Jeremy Orlow <jorlow@chromium.org>
- Cc: Andrei Popescu <andreip@google.com>, Pablo Castro <Pablo.Castro@microsoft.com>, Webapps WG <public-webapps@w3.org>
On Fri, Jul 2, 2010 at 7:27 PM, Jeremy Orlow <jorlow@chromium.org> wrote: > On Sat, Jul 3, 2010 at 11:09 AM, Jonas Sicking <jonas@sicking.cc> wrote: >> >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <andreip@google.com> wrote: >> > On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jonas@sicking.cc> wrote: >> >> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro >> >> <Pablo.Castro@microsoft.com> wrote: >> >>> >> >>> From: public-webapps-request@w3.org >> >>> [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking >> >>> Sent: Friday, July 02, 2010 4:00 PM >> >>> >> >>>>> We ran into an complicated issue while implementing IndexedDB. In >> >>>>> short, what should happen if an object store is modified while a cursor is >> >>>>> iterating it? >> Note that the modification can be done within the same >> >>>>> transaction, so the read/write locks preventing several transactions from >> >>>>> accessing the same table isn't helping here. >> >>>>> >> >>>>> Detailed problem description (this assumes the API proposed by >> >>>>> mozilla): >> >>>>> >> >>>>> Consider a objectStore "words" containing the following objects: >> >>>>> { name: "alpha" } >> >>>>> { name: "bravo" } >> >>>>> { name: "charlie" } >> >>>>> { name: "delta" } >> >>>>> >> >>>>> and the following program (db is a previously opened IDBDatabase): >> >>>>> >> >>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var >> >>>>> result = []; trans.objectStore("words").openCursor().onsuccess = function(e) >> >>>>> { >> >>>>> cursor = e.result; >> >>>>> result.push(cursor.value); >> >>>>> cursor.continue(); >> >>>>> } >> >>>>> trans.objectStore("words").get("delta").onsuccess = function(e) { >> >>>>> trans.objectStore("words").put({ name: "delta", myModifiedValue: >> >>>>> 17 }); } >> >>>>> >> >>>>> When the cursor reads the "delta" entry, will it see the >> >>>>> 'myModifiedValue' property? Since we so far has defined that the callback >> >>>>> order is defined to be >> the request order, that means that put request >> >>>>> will be finished before the "delta" entry is iterated by the cursor. >> >>>>> >> >>>>> The problem is even more serious with cursors that iterate indexes. >> >>>>> Here a modification can even affect the position of the currently >> >>>>> iterated object in the index, and the modification can (if i'm reading the >> >>>>> spec correctly) >> come from the cursor itself. >> >>>>> >> >>>>> Consider the following objectStore "people" with keyPath "name" >> >>>>> containing the following objects: >> >>>>> >> >>>>> { name: "Adam", count: 30 } >> >>>>> { name: "Bertil", count: 31 } >> >>>>> { name: "Cesar", count: 32 } >> >>>>> { name: "David", count: 33 } >> >>>>> { name: "Erik", count: 35 } >> >>>>> >> >>>>> and an index "countIndex" with keyPath "count". What would the >> >>>>> following code do? >> >>>>> >> >>>>> results = []; >> >>>>> db.objectStore("people", >> >>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = >> >>>>> function (e) { >> >>>>> cursor = e.result; >> >>>>> if (!cursor) { >> >>>>> alert(results); >> >>>>> return; >> >>>>> } >> >>>>> if (cursor.value.name == "Bertil") { >> >>>>> cursor.update({name: "Bertil", count: 34 }); >> >>>>> } >> >>>>> results.push(cursor.value.name); >> >>>>> cursor.continue(); >> >>>>> }; >> >>>>> >> >>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the >> >>>>> cursor would stay on the "Bertil" object as it is moved in the index? Or >> >>>>> would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate >> >>>>> "Bertil" again at its new position in the index? >> >>> >> >>> My first reaction is that both from the expected behavior of >> >>> perspective (transaction is the scope of isolation) and from the >> >>> implementation perspective it would be better to see live changes if they >> >>> happened in the same transaction as the cursor (over a store or index). So >> >>> in your example you would iterate one of the rows twice. Maintaining order >> >>> and membership stable would mean creating another scope of isolation within >> >>> the transaction, which to me would be unusual and it would be probably quite >> >>> painful to implement without spilling a copy of the records to disk (at >> >>> least a copy of the keys/order if you don't care about protecting from >> >>> changes that don't affect membership/order; some databases call these keyset >> >>> cursors). >> >>> >> >>>>> >> >>>>> We could say that cursors always iterate snapshots, however this >> >>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does that. >> >>> >> >>> Actually, even with MVCC you'd see your own changes, because they >> >>> happen in the same transaction so the buffer pool will use the same version >> >>> of the page. While it may be possible to reuse the MVCC infrastructure, it >> >>> would still require the introduction of a second scope for stability. >> >> >> >> It's quite implementable using append-only b-trees. Though it might be >> >> much to ask that implementations are forced to use that. >> >> >> >> An alternative to what I suggested earlier is that all read operations >> >> use "read committed". I.e. they always see the data as it looked at >> >> the beginning of the transaction. Would this be more compatible with >> >> existing MVCC implementations? >> >> >> > >> > Hmm, so if you modified the object store and then, later in the same >> > transaction, used a cursor to iterate the object store, the cursor >> > would not see the earlier modifications? That's not very intiutive to >> > me...or did I misunderstand? >> >> If we go with "read committed" then yes, your understanding is correct. >> >> Out of curiosity, how are you feeling about the "cursors iterate data >> as it looked when cursor was opened" solution? > > I'm actually starting to warm up to the idea some, but it'd need to be a > non-default option. And I'm on the fence about whether it's a v1 option. > >> >> >> I'd imagine this should be as easy to implement as SNAPSHOT_READ. >> >> >> >>>>> We could also say that cursors iterate live data though that can be >> >>>>> pretty confusing and forces the implementation to deal with entries being >> >>>>> added and >> removed during iteration, and it'd be tricky to define all edge >> >>>>> cases. >> >>> >> >>> Would this be any different from the implementation perspective than >> >>> dealing with changes that happen through other transactions once they are >> >>> committed? Typically at least in non-MVCC systems committed changes that are >> >>> "further ahead" in a cursor scan end up showing up even when the cursor was >> >>> opened before the other transaction committed. >> >> >> >> IndexedDB doesn't allow write transactions to a given store to start >> >> while there are read transactions using that store, so that doesn't >> >> seem to be a problem. Unless I'm misunderstanding something? >> >> >> > >> > That was my understanding too: while a cursor was open, you can't >> > allow a write transaction to start. >> >> Exactly. Though "while a read transaction was open, you can't allow a >> write transaction to start" is even more correct. > > Sure you can't let a write static transaction start, but I don't see why you > can't let a dynamic transaction start. Sure there are many cases where it > can fail, but I don't see any reason it should be illegal for dynamic > transactions to execute optimistically. I mean, any developer that uses > them will have to handle the error case anyway. I think this thread has gone way off track. The point is that you can't have a read transaction being affected by another write transaction. That is in fact the sole purpose of a read transaction. If someone disagrees then please let me know. >> >>>>> It's certainly debatable how much of a problem any of these >> >>>>> edgecases are for users. Note that all of this is only an issue if you >> >>>>> modify and read from the >> same records *in the same transaction*. I can't >> >>>>> think of a case where it isn't trivial to avoid these problems by separating >> >>>>> things into separate transactions. >> >>>>> However it'd be nice to avoid creating foot-guns for people to play >> >>>>> with (think of the children!). >> >>>>> >> >>>>> However we still need to define *something*. I would suggest that we >> >>>>> define that cursors iterate snapshots. It seems the cleanest for users and >> >>>>> easiest >> to define. And once implementations add MVCC support it should be >> >>>>> easy to implement. I think we've come up with a decent plan for how to do >> >>>>> implement it in sqlite even without proper MVCC, so it should be doable even >> >>>>> then. >> >>> >> >>> Besides the expectations aspects, I worry that doing this means that >> >>> opening a cursor means incurring in substantial cost for all cases (e.g. >> >>> creating a keyset or something like that). >> >> >> >> I agree we definitely don't want that. We are working on an >> >> implementation which is backed by a SQL database and completely >> >> incapable of MVCC, so it seems doable. However I don't yet know how >> >> much of a complexity and performance penalty that carries. >> >> >> > >> > On the other hand, as you said earlier, maybe allowing the live >> > changes to be visible in the cursor is not such a big problem as apps >> > can work around these edge cases? >> >> I suspect so. Though given that behavior can be very unintuitive, > > I would argue that it's highly logical and trying to make it "intuitive" is > a losing battle because it introduces tons of special cases that no one > remembers which work most of the time, but blow up horribly in to hard to > debug messes the other small fraction of the time. Similar to the way Perl > tries to be "intuitive". Keeping it straight forward and logical seems like > a much better approach. I certainly agree that there is no perfectly logical behavior here. All solutions have their quirks and unexpected behavior. I do however think that snapshot behavior I originally proposed is much easier to explain and thus understand. I.e. defining all edge cases for 'live' iterators will take a lot more spec text and result in a lot more edge cases. However I'm tentatively fine with doing this if that is what everyone else wants to do. I know others at mozilla are not as excited about this solution though. We'll definitely need to look at implementation strategies. >> we >> would have to define things to a very high level of detail to ensure >> that the same unintuitive behavior is happening in all >> implementations. Additionally, I suspect implementation can get quite >> complex in order to deal with all edge cases, for example dealing with >> data that was read ahead optimistically. >> >> We strategy for defining this is to define all mutating operations in >> terms of which ones are insertions vs. updates vs. removals. Both in >> the objectStore and in indexes. For example, does a IDBObjectStore.put >> call that modifies an existing entry cause corresponding entries in >> indexes to be updated or removed and reinserted? Both in the situation >> when the put call causes index values to be changed and not changed. >> >> Then define how active cursors move around when an entry that they are >> currently on is removed, or when entries before and after the current >> entry are inserted. For all types of cursors; forwards, backwards and >> no_duplicate. Also need to define this in situations when the cursor >> is currently firmly in one position, or when a .continue() call has >> been made, but the callback not yet has fired. Likewise when the >> cursor has not yet fired its first callback and entries are inserted >> in the beginning, and when a cursor has notified about the last entry, >> and entries are added after the end. >> >> It's certainly possible, but no small feat. > > I think this is what we need to do. I volunteer Andrei. :-) Sold! :-) / Jonas
Received on Saturday, 3 July 2010 03:09:11 UTC