Re: [IndexedDB] Cursors and modifications from Jonas Sicking on 2010-07-14 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 14 Jul 2010 09:17:49 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: Andrei Popescu <andreip@google.com>, Pablo Castro <Pablo.Castro@microsoft.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTimKZokur2eRgBOVyI0Jc0DSNnq7xMxOAhl_zKCM@mail.gmail.com>

On Wed, Jul 14, 2010 at 5:12 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Thu, Jul 8, 2010 at 8:42 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu <andreip@google.com> wrote:
>> > On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <andreip@google.com>
>> >> wrote:
>> >>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jonas@sicking.cc>
>> >>> wrote:
>> >>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro
>> >>>> <Pablo.Castro@microsoft.com> wrote:
>> >>>>>
>> >>>>> From: public-webapps-request@w3.org
>> >>>>> [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking
>> >>>>> Sent: Friday, July 02, 2010 4:00 PM
>> >>>>>
>> >>>>>>> We ran into a complicated issue while implementing IndexedDB. In
>> >>>>>>> short, what should happen if an object store is modified while a cursor is
>> >>>>>>> iterating it? Note that the modification can be done within the same
>> >>>>>>> transaction, so the read/write locks preventing several transactions from
>> >>>>>>> accessing the same table aren't helping here.
>> >>>>>>>
>> >>>>>>> Detailed problem description (this assumes the API proposed by
>> >>>>>>> Mozilla):
>> >>>>>>>
>> >>>>>>> Consider an objectStore "words" containing the following objects:
>> >>>>>>> { name: "alpha" }
>> >>>>>>> { name: "bravo" }
>> >>>>>>> { name: "charlie" }
>> >>>>>>> { name: "delta" }
>> >>>>>>>
>> >>>>>>> and the following program (db is a previously opened IDBDatabase):
>> >>>>>>>
>> >>>>>>> var trans = db.transaction(["words"], READ_WRITE);
>> >>>>>>> var cursor;
>> >>>>>>> var result = [];
>> >>>>>>> trans.objectStore("words").openCursor().onsuccess = function(e) {
>> >>>>>>>   cursor = e.result;
>> >>>>>>>   result.push(cursor.value);
>> >>>>>>>   cursor.continue();
>> >>>>>>> }
>> >>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>> >>>>>>>   trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 });
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> When the cursor reads the "delta" entry, will it see the
>> >>>>>>> 'myModifiedValue' property? Since we have so far defined the callback
>> >>>>>>> order to be the request order, the put request
>> >>>>>>> will be finished before the "delta" entry is iterated by the cursor.
>> >>>>>>>
>> >>>>>>> The problem is even more serious with cursors that iterate
>> >>>>>>> indexes.
>> >>>>>>> Here a modification can even affect the position of the currently
>> >>>>>>> iterated object in the index, and the modification can (if I'm reading the
>> >>>>>>> spec correctly) come from the cursor itself.
>> >>>>>>>
>> >>>>>>> Consider the following objectStore "people" with keyPath "name"
>> >>>>>>> containing the following objects:
>> >>>>>>>
>> >>>>>>> { name: "Adam", count: 30 }
>> >>>>>>> { name: "Bertil", count: 31 }
>> >>>>>>> { name: "Cesar", count: 32 }
>> >>>>>>> { name: "David", count: 33 }
>> >>>>>>> { name: "Erik", count: 35 }
>> >>>>>>>
>> >>>>>>> and an index "countIndex" with keyPath "count". What would the
>> >>>>>>> following code do?
>> >>>>>>>
>> >>>>>>> results = [];
>> >>>>>>> db.objectStore("people", READ_WRITE).index("countIndex")
>> >>>>>>>   .openObjectCursor().onsuccess = function (e) {
>> >>>>>>>   cursor = e.result;
>> >>>>>>>   if (!cursor) {
>> >>>>>>>     alert(results);
>> >>>>>>>     return;
>> >>>>>>>   }
>> >>>>>>>   if (cursor.value.name == "Bertil") {
>> >>>>>>>     cursor.update({name: "Bertil", count: 34 });
>> >>>>>>>   }
>> >>>>>>>   results.push(cursor.value.name);
>> >>>>>>>   cursor.continue();
>> >>>>>>> };
>> >>>>>>>
>> >>>>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the
>> >>>>>>> cursor would stay on the "Bertil" object as it is moved in the index? Or
>> >>>>>>> would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate
>> >>>>>>> "Bertil" again at its new position in the index?
>> >>>>>
>> >>>>> My first reaction is that both from the expected-behavior
>> >>>>> perspective (transaction is the scope of isolation) and from the
>> >>>>> implementation perspective it would be better to see live changes if they
>> >>>>> happened in the same transaction as the cursor (over a store or index). So
>> >>>>> in your example you would iterate one of the rows twice. Keeping order
>> >>>>> and membership stable would mean creating another scope of isolation within
>> >>>>> the transaction, which to me would be unusual, and it would probably be quite
>> >>>>> painful to implement without spilling a copy of the records to disk (at
>> >>>>> least a copy of the keys/order if you don't care about protecting from
>> >>>>> changes that don't affect membership/order; some databases call these keyset
>> >>>>> cursors).
>> >>>>>
>> >>>>>>>
>> >>>>>>> We could say that cursors always iterate snapshots, however this
>> >>>>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does that.
>> >>>>>
>> >>>>> Actually, even with MVCC you'd see your own changes, because they
>> >>>>> happen in the same transaction so the buffer pool will use the same version
>> >>>>> of the page. While it may be possible to reuse the MVCC infrastructure, it
>> >>>>> would still require the introduction of a second scope for stability.
>> >>>>
>> >>>> It's quite implementable using append-only b-trees. Though it might
>> >>>> be much to ask that implementations be forced to use that.
>> >>>>
>> >>>> An alternative to what I suggested earlier is that all read
>> >>>> operations use "read committed". I.e. they always see the data as it
>> >>>> looked at
>> >>>> the beginning of the transaction. Would this be more compatible with
>> >>>> existing MVCC implementations?
>> >>>>
>> >>>
>> >>> Hmm, so if you modified the object store and then, later in the same
>> >>> transaction, used a cursor to iterate the object store, the cursor
>> >>> would not see the earlier modifications? That's not very intuitive to
>> >>> me...or did I misunderstand?
>> >>
>> >> If we go with "read committed" then yes, your understanding is correct.
>> >>
>> >> Out of curiosity, how are you feeling about the "cursors iterate data
>> >> as it looked when cursor was opened" solution?
>> >>
>> >
>> > I feel that that's the easiest solution to specify although it may
>> > also be unintuitive if one calls 'put / update' and then expects to
>> > see the updated value once the cursor gets to the relevant object. My
>> > other concern was the one brought by Pablo, i.e. is it even more
>> > complex to implement another scope of isolation for the duration of
>> > the cursor?
>> >
>> >>>> I'd imagine this should be as easy to implement as SNAPSHOT_READ.
>> >>>>
>> >>>>>>> We could also say that cursors iterate live data, though that can
>> >>>>>>> be pretty confusing and forces the implementation to deal with entries being
>> >>>>>>> added and removed during iteration, and it'd be tricky to define all edge
>> >>>>>>> cases.
>> >>>>>
>> >>>>> Would this be any different from the implementation perspective than
>> >>>>> dealing with changes that happen through other transactions once they are
>> >>>>> committed? Typically at least in non-MVCC systems committed changes that are
>> >>>>> "further ahead" in a cursor scan end up showing up even when the cursor was
>> >>>>> opened before the other transaction committed.
>> >>>>
>> >>>> IndexedDB doesn't allow write transactions to a given store to start
>> >>>> while there are read transactions using that store, so that doesn't
>> >>>> seem to be a problem. Unless I'm misunderstanding something?
>> >>>>
>> >>>
>> >>> That was my understanding too: while a cursor was open, you can't
>> >>> allow a write transaction to start.
>> >>
>> >> Exactly. Though "while a read transaction was open, you can't allow a
>> >> write transaction to start" is even more correct.
>> >>
>> >>>>>>> It's certainly debatable how much of a problem any of these
>> >>>>>>> edge cases are for users. Note that all of this is only an issue if you
>> >>>>>>> modify and read from the same records *in the same transaction*. I can't
>> >>>>>>> think of a case where it isn't trivial to avoid these problems by separating
>> >>>>>>> things into separate transactions.
>> >>>>>>> However, it'd be nice to avoid creating foot-guns for people to
>> >>>>>>> play with (think of the children!).
>> >>>>>>>
>> >>>>>>> However, we still need to define *something*. I would suggest that
>> >>>>>>> we define that cursors iterate snapshots. It seems the cleanest for users
>> >>>>>>> and easiest to define. And once implementations add MVCC support it
>> >>>>>>> should be easy to implement. I think we've come up with a decent plan for
>> >>>>>>> how to implement it in sqlite even without proper MVCC, so it should be
>> >>>>>>> doable even then.
>> >>>>>
>> >>>>> Besides the expectations aspect, I worry that doing this means that
>> >>>>> opening a cursor incurs substantial cost in all cases (e.g.
>> >>>>> creating a keyset or something like that).
>> >>>>
>> >>>> I agree we definitely don't want that. We are working on an
>> >>>> implementation which is backed by a SQL database and completely
>> >>>> incapable of MVCC, so it seems doable. However I don't yet know how
>> >>>> much of a complexity and performance penalty that carries.
>> >>>>
>> >>>
>> >>> On the other hand, as you said earlier, maybe allowing the live
>> >>> changes to be visible in the cursor is not such a big problem as apps
>> >>> can work around these edge cases?
>> >>
>> >> I suspect so. Though given that behavior can be very unintuitive, we
>> >> would have to define things to a very high level of detail to ensure
>> >> that the same unintuitive behavior is happening in all
>> >> implementations. Additionally, I suspect implementation can get quite
>> >> complex in order to deal with all edge cases, for example dealing with
>> >> data that was read ahead optimistically.
>> >>
>> >> One strategy for defining this is to define all mutating operations in
>> >> terms of which ones are insertions vs. updates vs. removals. Both in
>> >> the objectStore and in indexes. For example, does an IDBObjectStore.put
>> >> call that modifies an existing entry cause corresponding entries in
>> >> indexes to be updated or removed and reinserted? Both in the situation
>> >> when the put call causes index values to be changed and not changed.
>> >>
>> >> Then define how active cursors move around when an entry that they are
>> >> currently on is removed, or when entries before and after the current
>> >> entry are inserted. For all types of cursors: forwards, backwards and
>> >> no_duplicate. We also need to define this for situations when the cursor
>> >> is currently firmly in one position, or when a .continue() call has
>> >> been made but the callback has not yet fired. Likewise when the
>> >> cursor has not yet fired its first callback and entries are inserted
>> >> at the beginning, and when a cursor has notified about the last entry
>> >> and entries are added after the end.
>> >>
>> >> It's certainly possible, but no small feat.
>> >>
>> >
>> > Ok, let me try to come up with some wording for this in the spec. I
>> > think we should go for this solution for now and see what the
>> > implementation feedback is.
>>
>> Actually, I had a somewhat different idea for how this should be
>> specified, which I think would be easier both to specify and to
>> implement.
>>
>> When iterating using a cursor, always remember the key-value of the
>> last returned entry. Whenever .continue() is called, it goes to find
>> the next entry with a key-value bigger than the one last returned.
>> This way the cursor isn't affected if the current entry is removed;
>> we'll still remember and use the key-value that that entry had.
>>
>> This is simple when iterating objectStores since keys are always
>> unique. However it's only marginally trickier when iterating indexes
>> which can contain duplicate key-values. As I suggested in bug 10058, I
>> think we should define that duplicate index entries are ordered by
>> their key-value in the objectStore. So all an index cursor needs to do
>> is to remember both the index key-value and the objectStore key-value
>> of the last returned entry. When .continue() is called it returns the
>> next entry with the same index key-value and larger objectStore
>> key-value if one exists, or the next entry with a larger index
>> key-value otherwise.
>>
>> Put more simply: For objectStore cursors, the key remembered is the
>> objectStore's key-value. For index cursors, the key remembered is the
>> <index-key-value, objectStore-key-value> tuple.
>>
>> Let me know what you think.
>
> This sounds good.  The only problem is how to specify a sort order for
> values.  Sorting on the value itself means that structured clones themselves
> need a sort order.  This would include things like files, regexes, images,
> etc.  So maybe we want to make the sort order be based on insertion order?
> Does anyone have any better ideas?

Note that I'm always talking about sorting on key-values, i.e. the
values of the keys. So we'll only ever need to define sorting of the
types that we accept as keys. Currently that includes none of the types
you mention above.

/ Jonas
Received on Wednesday, 14 July 2010 16:18:51 UTC
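
Illustrative sketch (not part of the original message): the "remember the
last returned key" proposal quoted above can be modeled with a toy
in-memory store. Everything here (ToyStore, ToyIndexCursor, compareKeys,
compareTuples, iterateWithUpdate) is a hypothetical name made up for this
example and is not IndexedDB API; keys are assumed to be plain numbers or
strings, and the index is re-sorted on every step purely for brevity. It
only shows how a cursor that remembers the
<index-key-value, objectStore-key-value> tuple behaves on the "people"
example when the data underneath it is live.

```javascript
// Hypothetical in-memory model; none of these names are IndexedDB API.

// Compare two keys of the same primitive type (numbers or strings).
function compareKeys(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compare <index-key, objectStore-key> tuples: index key first, then store key.
function compareTuples(a, b) {
  return compareKeys(a[0], b[0]) || compareKeys(a[1], b[1]);
}

// A toy object store keyed by `keyPath`, with a single index on `indexPath`.
class ToyStore {
  constructor(keyPath, indexPath, records) {
    this.keyPath = keyPath;
    this.indexPath = indexPath;
    this.records = new Map(records.map((r) => [r[keyPath], r]));
  }

  // Insert or overwrite a record. The index is derived on demand, so it is always live.
  put(record) {
    this.records.set(record[this.keyPath], record);
  }

  delete(key) {
    this.records.delete(key);
  }

  // Current index entries as sorted <index-key, store-key> tuples.
  indexEntries() {
    return [...this.records.values()]
      .map((r) => [r[this.indexPath], r[this.keyPath]])
      .sort(compareTuples);
  }
}

// Forward index cursor whose only state is the tuple it last returned.
class ToyIndexCursor {
  constructor(store) {
    this.store = store;
    this.last = null; // nothing returned yet
  }

  // Return the next record whose tuple is strictly greater than `last`,
  // or null when no such entry exists.
  next() {
    const entry = this.store
      .indexEntries()
      .find((t) => this.last === null || compareTuples(t, this.last) > 0);
    if (!entry) return null;
    this.last = entry;
    return this.store.records.get(entry[1]);
  }
}

// The "people" example from the thread, replayed against the toy model.
function iterateWithUpdate() {
  const store = new ToyStore("name", "count", [
    { name: "Adam", count: 30 },
    { name: "Bertil", count: 31 },
    { name: "Cesar", count: 32 },
    { name: "David", count: 33 },
    { name: "Erik", count: 35 },
  ]);
  const cursor = new ToyIndexCursor(store);
  const results = [];
  for (let rec = cursor.next(); rec !== null; rec = cursor.next()) {
    if (rec.name === "Bertil") {
      store.put({ name: "Bertil", count: 34 }); // moves Bertil in the index
    }
    results.push(rec.name);
  }
  return results;
}

console.log(iterateWithUpdate().join(","));
// prints "Adam,Bertil,Cesar,David,Bertil,Erik"
```

In this model "Bertil" is visited twice because its index entry moves
from <31, "Bertil"> to <34, "Bertil">, which is still ahead of the
remembered tuple. Removing the entry the cursor is currently on needs no
special handling, since the cursor holds a key rather than a position.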