- From: Jeremy Orlow <jorlow@chromium.org>
- Date: Wed, 14 Jul 2010 17:27:07 +0100
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: Andrei Popescu <andreip@google.com>, Pablo Castro <Pablo.Castro@microsoft.com>, Webapps WG <public-webapps@w3.org>
- Message-ID: <AANLkTinCdxW3ACuG7oGMFK6rdEMc2_3K2Rn7PgRzfFMu@mail.gmail.com>
On Wed, Jul 14, 2010 at 5:17 PM, Jonas Sicking <jonas@sicking.cc> wrote: > On Wed, Jul 14, 2010 at 5:12 AM, Jeremy Orlow <jorlow@chromium.org> wrote: > > On Thu, Jul 8, 2010 at 8:42 PM, Jonas Sicking <jonas@sicking.cc> wrote: > >> > >> On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu <andreip@google.com> > wrote: > >> > On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking <jonas@sicking.cc> > wrote: > >> >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <andreip@google.com> > >> >> wrote: > >> >>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jonas@sicking.cc> > >> >>> wrote: > >> >>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro > >> >>>> <Pablo.Castro@microsoft.com> wrote: > >> >>>>> > >> >>>>> From: public-webapps-request@w3.org > >> >>>>> [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking > >> >>>>> Sent: Friday, July 02, 2010 4:00 PM > >> >>>>> > >> >>>>>>> We ran into an complicated issue while implementing IndexedDB. > In > >> >>>>>>> short, what should happen if an object store is modified while a > cursor is > >> >>>>>>> iterating it? >> Note that the modification can be done within > the same > >> >>>>>>> transaction, so the read/write locks preventing several > transactions from > >> >>>>>>> accessing the same table isn't helping here. > >> >>>>>>> > >> >>>>>>> Detailed problem description (this assumes the API proposed by > >> >>>>>>> mozilla): > >> >>>>>>> > >> >>>>>>> Consider a objectStore "words" containing the following objects: > >> >>>>>>> { name: "alpha" } > >> >>>>>>> { name: "bravo" } > >> >>>>>>> { name: "charlie" } > >> >>>>>>> { name: "delta" } > >> >>>>>>> > >> >>>>>>> and the following program (db is a previously opened > IDBDatabase): > >> >>>>>>> > >> >>>>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; > var > >> >>>>>>> result = []; trans.objectStore("words").openCursor().onsuccess = > function(e) > >> >>>>>>> { > >> >>>>>>> cursor = e.result; > >> >>>>>>> result.push(cursor.value); > >> >>>>>>> cursor.continue(); > >> >>>>>>> } > >> >>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e) > { > >> >>>>>>> trans.objectStore("words").put({ name: "delta", > myModifiedValue: > >> >>>>>>> 17 }); } > >> >>>>>>> > >> >>>>>>> When the cursor reads the "delta" entry, will it see the > >> >>>>>>> 'myModifiedValue' property? Since we so far has defined that the > callback > >> >>>>>>> order is defined to be >> the request order, that means that put > request > >> >>>>>>> will be finished before the "delta" entry is iterated by the > cursor. > >> >>>>>>> > >> >>>>>>> The problem is even more serious with cursors that iterate > >> >>>>>>> indexes. > >> >>>>>>> Here a modification can even affect the position of the > currently > >> >>>>>>> iterated object in the index, and the modification can (if i'm > reading the > >> >>>>>>> spec correctly) >> come from the cursor itself. > >> >>>>>>> > >> >>>>>>> Consider the following objectStore "people" with keyPath "name" > >> >>>>>>> containing the following objects: > >> >>>>>>> > >> >>>>>>> { name: "Adam", count: 30 } > >> >>>>>>> { name: "Bertil", count: 31 } > >> >>>>>>> { name: "Cesar", count: 32 } > >> >>>>>>> { name: "David", count: 33 } > >> >>>>>>> { name: "Erik", count: 35 } > >> >>>>>>> > >> >>>>>>> and an index "countIndex" with keyPath "count". What would the > >> >>>>>>> following code do? > >> >>>>>>> > >> >>>>>>> results = []; > >> >>>>>>> db.objectStore("people", > >> >>>>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = > >> >>>>>>> function (e) { > >> >>>>>>> cursor = e.result; > >> >>>>>>> if (!cursor) { > >> >>>>>>> alert(results); > >> >>>>>>> return; > >> >>>>>>> } > >> >>>>>>> if (cursor.value.name == "Bertil") { > >> >>>>>>> cursor.update({name: "Bertil", count: 34 }); > >> >>>>>>> } > >> >>>>>>> results.push(cursor.value.name); > >> >>>>>>> cursor.continue(); > >> >>>>>>> }; > >> >>>>>>> > >> >>>>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the > >> >>>>>>> cursor would stay on the "Bertil" object as it is moved in the > index? Or > >> >>>>>>> would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would > iterate > >> >>>>>>> "Bertil" again at its new position in the index? > >> >>>>> > >> >>>>> My first reaction is that both from the expected behavior of > >> >>>>> perspective (transaction is the scope of isolation) and from the > >> >>>>> implementation perspective it would be better to see live changes > if they > >> >>>>> happened in the same transaction as the cursor (over a store or > index). So > >> >>>>> in your example you would iterate one of the rows twice. > Maintaining order > >> >>>>> and membership stable would mean creating another scope of > isolation within > >> >>>>> the transaction, which to me would be unusual and it would be > probably quite > >> >>>>> painful to implement without spilling a copy of the records to > disk (at > >> >>>>> least a copy of the keys/order if you don't care about protecting > from > >> >>>>> changes that don't affect membership/order; some databases call > these keyset > >> >>>>> cursors). > >> >>>>> > >> >>>>>>> > >> >>>>>>> We could say that cursors always iterate snapshots, however this > >> >>>>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ > already does that. > >> >>>>> > >> >>>>> Actually, even with MVCC you'd see your own changes, because they > >> >>>>> happen in the same transaction so the buffer pool will use the > same version > >> >>>>> of the page. While it may be possible to reuse the MVCC > infrastructure, it > >> >>>>> would still require the introduction of a second scope for > stability. > >> >>>> > >> >>>> It's quite implementable using append-only b-trees. Though it might > >> >>>> be > >> >>>> much to ask that implementations are forced to use that. > >> >>>> > >> >>>> An alternative to what I suggested earlier is that all read > >> >>>> operations > >> >>>> use "read committed". I.e. they always see the data as it looked at > >> >>>> the beginning of the transaction. Would this be more compatible > with > >> >>>> existing MVCC implementations? > >> >>>> > >> >>> > >> >>> Hmm, so if you modified the object store and then, later in the same > >> >>> transaction, used a cursor to iterate the object store, the cursor > >> >>> would not see the earlier modifications? That's not very intiutive > to > >> >>> me...or did I misunderstand? > >> >> > >> >> If we go with "read committed" then yes, your understanding is > correct. > >> >> > >> >> Out of curiosity, how are you feeling about the "cursors iterate data > >> >> as it looked when cursor was opened" solution? > >> >> > >> > > >> > I feel that that's the easiest solution to specify although it may > >> > also be unintuitive if one calls 'put / update' and then expects to > >> > see the updated value once the cursor gets to the relevant object. My > >> > other concern was the one brought by Pablo, i.e. is it even more > >> > complex to implement another scope of isolation for the duration of > >> > the cursor? > >> > > >> >>>> I'd imagine this should be as easy to implement as SNAPSHOT_READ. > >> >>>> > >> >>>>>>> We could also say that cursors iterate live data though that can > >> >>>>>>> be pretty confusing and forces the implementation to deal with > entries being > >> >>>>>>> added and >> removed during iteration, and it'd be tricky to > define all edge > >> >>>>>>> cases. > >> >>>>> > >> >>>>> Would this be any different from the implementation perspective > than > >> >>>>> dealing with changes that happen through other transactions once > they are > >> >>>>> committed? Typically at least in non-MVCC systems committed > changes that are > >> >>>>> "further ahead" in a cursor scan end up showing up even when the > cursor was > >> >>>>> opened before the other transaction committed. > >> >>>> > >> >>>> IndexedDB doesn't allow write transactions to a given store to > start > >> >>>> while there are read transactions using that store, so that doesn't > >> >>>> seem to be a problem. Unless I'm misunderstanding something? > >> >>>> > >> >>> > >> >>> That was my understanding too: while a cursor was open, you can't > >> >>> allow a write transaction to start. > >> >> > >> >> Exactly. Though "while a read transaction was open, you can't allow a > >> >> write transaction to start" is even more correct. > >> >> > >> >>>>>>> It's certainly debatable how much of a problem any of these > >> >>>>>>> edgecases are for users. Note that all of this is only an issue > if you > >> >>>>>>> modify and read from the >> same records *in the same > transaction*. I can't > >> >>>>>>> think of a case where it isn't trivial to avoid these problems > by separating > >> >>>>>>> things into separate transactions. > >> >>>>>>> However it'd be nice to avoid creating foot-guns for people to > >> >>>>>>> play with (think of the children!). > >> >>>>>>> > >> >>>>>>> However we still need to define *something*. I would suggest > that > >> >>>>>>> we define that cursors iterate snapshots. It seems the cleanest > for users > >> >>>>>>> and easiest >> to define. And once implementations add MVCC > support it > >> >>>>>>> should be easy to implement. I think we've come up with a decent > plan for > >> >>>>>>> how to do implement it in sqlite even without proper MVCC, so it > should be > >> >>>>>>> doable even then. > >> >>>>> > >> >>>>> Besides the expectations aspects, I worry that doing this means > that > >> >>>>> opening a cursor means incurring in substantial cost for all cases > (e.g. > >> >>>>> creating a keyset or something like that). > >> >>>> > >> >>>> I agree we definitely don't want that. We are working on an > >> >>>> implementation which is backed by a SQL database and completely > >> >>>> incapable of MVCC, so it seems doable. However I don't yet know how > >> >>>> much of a complexity and performance penalty that carries. > >> >>>> > >> >>> > >> >>> On the other hand, as you said earlier, maybe allowing the live > >> >>> changes to be visible in the cursor is not such a big problem as > apps > >> >>> can work around these edge cases? > >> >> > >> >> I suspect so. Though given that behavior can be very unintuitive, we > >> >> would have to define things to a very high level of detail to ensure > >> >> that the same unintuitive behavior is happening in all > >> >> implementations. Additionally, I suspect implementation can get quite > >> >> complex in order to deal with all edge cases, for example dealing > with > >> >> data that was read ahead optimistically. > >> >> > >> >> We strategy for defining this is to define all mutating operations in > >> >> terms of which ones are insertions vs. updates vs. removals. Both in > >> >> the objectStore and in indexes. For example, does a > IDBObjectStore.put > >> >> call that modifies an existing entry cause corresponding entries in > >> >> indexes to be updated or removed and reinserted? Both in the > situation > >> >> when the put call causes index values to be changed and not changed. > >> >> > >> >> Then define how active cursors move around when an entry that they > are > >> >> currently on is removed, or when entries before and after the current > >> >> entry are inserted. For all types of cursors; forwards, backwards and > >> >> no_duplicate. Also need to define this in situations when the cursor > >> >> is currently firmly in one position, or when a .continue() call has > >> >> been made, but the callback not yet has fired. Likewise when the > >> >> cursor has not yet fired its first callback and entries are inserted > >> >> in the beginning, and when a cursor has notified about the last > entry, > >> >> and entries are added after the end. > >> >> > >> >> It's certainly possible, but no small feat. > >> >> > >> > > >> > Ok, let me try to come up with some wording for this in the spec. I > >> > think we should go for this solution for now and see what the > >> > implementation feedback is. > >> > >> Actually, I had a somewhat different idea for how this should be > >> specified, which I think both would be easier to specify, as well as > >> implement. > >> > >> When iterating using a cursor, always remember the key-value of the > >> last returned entry. Whenever .continue() is called, it goes to find > >> the next entry with a key-value bigger than the one last returned. > >> This way the cursor isn't affected if the current entry is removed, > >> we'll still remember and use the key-value that that entry had. > >> > >> This is simple when iterating objectStores since keys are always > >> unique. However it's only marginally trickier when iterating indexes > >> which can contain duplicate key-values. As I suggested in bug 10058, I > >> think we should defined that duplicate index entries are ordered by > >> their key-value in the objectStore. So all a index-cursors needs to do > >> is to remember both the index key-value and the objectStore key-value > >> of the last returned entry. When .continue() is called it returns the > >> next entry with the same index key-value and larger objectStore > >> key-value if one exists, or the next entry with a larger index > >> key-value otherwise. > >> > >> Simpler put: For objectStore cursors, the key remembered is the > >> objectStores key-value. For index cursors, the key remembered is the > >> <index-key-value, objectStore-key-value> tuple. > >> > >> Let me know what you think. > > > > This sounds good. The only problem is how to specify a sort order for > > values. Sorting on the value itself means that structured clones > themselves > > need a sort order. This would include things like files, regexs, images, > > etc. So maybe we want to make the sort order be based on insertion > order? > > Does anyone have any better ideas? > > Note that I'm always talking about sorting on key-values. I.e. the > value of the keys. So we'll only ever need to define sorting of the > types that we accept as keys. So currently none of the types you are > mentioning above. > Good point. Got mixed up. That definitely makes things easier! J
Received on Wednesday, 14 July 2010 16:27:59 UTC