Re: [IndexedDB] Cursors and modifications from Jeremy Orlow on 2010-07-14 (public-webapps@w3.org from July to September 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Wed, 14 Jul 2010 17:27:07 +0100
To: Jonas Sicking <jonas@sicking.cc>
Cc: Andrei Popescu <andreip@google.com>, Pablo Castro <Pablo.Castro@microsoft.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTinCdxW3ACuG7oGMFK6rdEMc2_3K2Rn7PgRzfFMu@mail.gmail.com>
On Wed, Jul 14, 2010 at 5:17 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Wed, Jul 14, 2010 at 5:12 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> > On Thu, Jul 8, 2010 at 8:42 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>
> >> On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu <andreip@google.com>
> wrote:
> >> > On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking <jonas@sicking.cc>
> wrote:
> >> >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <andreip@google.com>
> >> >> wrote:
> >> >>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jonas@sicking.cc>
> >> >>> wrote:
> >> >>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro
> >> >>>> <Pablo.Castro@microsoft.com> wrote:
> >> >>>>>
> >> >>>>> From: public-webapps-request@w3.org
> >> >>>>> [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking
> >> >>>>> Sent: Friday, July 02, 2010 4:00 PM
> >> >>>>>
> >> >>>>>>> We ran into an complicated issue while implementing IndexedDB.
> In
> >> >>>>>>> short, what should happen if an object store is modified while a
> cursor is
> >> >>>>>>> iterating it? >> Note that the modification can be done within
> the same
> >> >>>>>>> transaction, so the read/write locks preventing several
> transactions from
> >> >>>>>>> accessing the same table isn't helping here.
> >> >>>>>>>
> >> >>>>>>> Detailed problem description (this assumes the API proposed by
> >> >>>>>>> mozilla):
> >> >>>>>>>
> >> >>>>>>> Consider a objectStore "words" containing the following objects:
> >> >>>>>>> { name: "alpha" }
> >> >>>>>>> { name: "bravo" }
> >> >>>>>>> { name: "charlie" }
> >> >>>>>>> { name: "delta" }
> >> >>>>>>>
> >> >>>>>>> and the following program (db is a previously opened
> IDBDatabase):
> >> >>>>>>>
> >> >>>>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor;
> var
> >> >>>>>>> result = []; trans.objectStore("words").openCursor().onsuccess =
> function(e)
> >> >>>>>>> {
> >> >>>>>>>   cursor = e.result;
> >> >>>>>>>   result.push(cursor.value);
> >> >>>>>>>   cursor.continue();
> >> >>>>>>> }
> >> >>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e)
> {
> >> >>>>>>>   trans.objectStore("words").put({ name: "delta",
> myModifiedValue:
> >> >>>>>>> 17 }); }
> >> >>>>>>>
> >> >>>>>>> When the cursor reads the "delta" entry, will it see the
> >> >>>>>>> 'myModifiedValue' property? Since we so far has defined that the
> callback
> >> >>>>>>> order is defined to be >> the request order, that means that put
> request
> >> >>>>>>> will be finished before the "delta" entry is iterated by the
> cursor.
> >> >>>>>>>
> >> >>>>>>> The problem is even more serious with cursors that iterate
> >> >>>>>>> indexes.
> >> >>>>>>> Here a modification can even affect the position of the
> currently
> >> >>>>>>> iterated object in the index, and the modification can (if i'm
> reading the
> >> >>>>>>> spec correctly) >> come from the cursor itself.
> >> >>>>>>>
> >> >>>>>>> Consider the following objectStore "people" with keyPath "name"
> >> >>>>>>> containing the following objects:
> >> >>>>>>>
> >> >>>>>>> { name: "Adam", count: 30 }
> >> >>>>>>> { name: "Bertil", count: 31 }
> >> >>>>>>> { name: "Cesar", count: 32 }
> >> >>>>>>> { name: "David", count: 33 }
> >> >>>>>>> { name: "Erik", count: 35 }
> >> >>>>>>>
> >> >>>>>>> and an index "countIndex" with keyPath "count". What would the
> >> >>>>>>> following code do?
> >> >>>>>>>
> >> >>>>>>> results = [];
> >> >>>>>>> db.objectStore("people",
> >> >>>>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess =
> >> >>>>>>> function (e) {
> >> >>>>>>>   cursor = e.result;
> >> >>>>>>>   if (!cursor) {
> >> >>>>>>>     alert(results);
> >> >>>>>>>     return;
> >> >>>>>>>   }
> >> >>>>>>>   if (cursor.value.name == "Bertil") {
> >> >>>>>>>     cursor.update({name: "Bertil", count: 34 });
> >> >>>>>>>   }
> >> >>>>>>>   results.push(cursor.value.name);
> >> >>>>>>>   cursor.continue();
> >> >>>>>>> };
> >> >>>>>>>
> >> >>>>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the
> >> >>>>>>> cursor would stay on the "Bertil" object as it is moved in the
> index? Or
> >> >>>>>>> would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would
> iterate
> >> >>>>>>> "Bertil" again at its new position in the index?
> >> >>>>>
> >> >>>>> My first reaction is that both from the expected behavior of
> >> >>>>> perspective (transaction is the scope of isolation) and from the
> >> >>>>> implementation perspective it would be better to see live changes
> if they
> >> >>>>> happened in the same transaction as the cursor (over a store or
> index). So
> >> >>>>> in your example you would iterate one of the rows twice.
> Maintaining order
> >> >>>>> and membership stable would mean creating another scope of
> isolation within
> >> >>>>> the transaction, which to me would be unusual and it would be
> probably quite
> >> >>>>> painful to implement without spilling a copy of the records to
> disk (at
> >> >>>>> least a copy of the keys/order if you don't care about protecting
> from
> >> >>>>> changes that don't affect membership/order; some databases call
> these keyset
> >> >>>>> cursors).
> >> >>>>>
> >> >>>>>>>
> >> >>>>>>> We could say that cursors always iterate snapshots, however this
> >> >>>>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ
> already does that.
> >> >>>>>
> >> >>>>> Actually, even with MVCC you'd see your own changes, because they
> >> >>>>> happen in the same transaction so the buffer pool will use the
> same version
> >> >>>>> of the page. While it may be possible to reuse the MVCC
> infrastructure, it
> >> >>>>> would still require the introduction of a second scope for
> stability.
> >> >>>>
> >> >>>> It's quite implementable using append-only b-trees. Though it might
> >> >>>> be
> >> >>>> much to ask that implementations are forced to use that.
> >> >>>>
> >> >>>> An alternative to what I suggested earlier is that all read
> >> >>>> operations
> >> >>>> use "read committed". I.e. they always see the data as it looked at
> >> >>>> the beginning of the transaction. Would this be more compatible
> with
> >> >>>> existing MVCC implementations?
> >> >>>>
> >> >>>
> >> >>> Hmm, so if you modified the object store and then, later in the same
> >> >>> transaction, used a cursor to iterate the object store, the cursor
> >> >>> would not see the earlier modifications? That's not very intiutive
> to
> >> >>> me...or did I misunderstand?
> >> >>
> >> >> If we go with "read committed" then yes, your understanding is
> correct.
> >> >>
> >> >> Out of curiosity, how are you feeling about the "cursors iterate data
> >> >> as it looked when cursor was opened" solution?
> >> >>
> >> >
> >> > I feel that that's the easiest solution to specify although it may
> >> > also be unintuitive if one calls 'put / update' and then expects to
> >> > see the updated value once the cursor gets to the relevant object. My
> >> > other concern was the one brought by Pablo, i.e. is it even more
> >> > complex to implement another scope of isolation for the duration of
> >> > the cursor?
> >> >
> >> >>>> I'd imagine this should be as easy to implement as SNAPSHOT_READ.
> >> >>>>
> >> >>>>>>> We could also say that cursors iterate live data though that can
> >> >>>>>>> be pretty confusing and forces the implementation to deal with
> entries being
> >> >>>>>>> added and >> removed during iteration, and it'd be tricky to
> define all edge
> >> >>>>>>> cases.
> >> >>>>>
> >> >>>>> Would this be any different from the implementation perspective
> than
> >> >>>>> dealing with changes that happen through other transactions once
> they are
> >> >>>>> committed? Typically at least in non-MVCC systems committed
> changes that are
> >> >>>>> "further ahead" in a cursor scan end up showing up even when the
> cursor was
> >> >>>>> opened before the other transaction committed.
> >> >>>>
> >> >>>> IndexedDB doesn't allow write transactions to a given store to
> start
> >> >>>> while there are read transactions using that store, so that doesn't
> >> >>>> seem to be a problem. Unless I'm misunderstanding something?
> >> >>>>
> >> >>>
> >> >>> That was my understanding too: while a cursor was open, you can't
> >> >>> allow a write transaction to start.
> >> >>
> >> >> Exactly. Though "while a read transaction was open, you can't allow a
> >> >> write transaction to start" is even more correct.
> >> >>
> >> >>>>>>> It's certainly debatable how much of a problem any of these
> >> >>>>>>> edgecases are for users. Note that all of this is only an issue
> if you
> >> >>>>>>> modify and read from the >> same records *in the same
> transaction*. I can't
> >> >>>>>>> think of a case where it isn't trivial to avoid these problems
> by separating
> >> >>>>>>> things into separate transactions.
> >> >>>>>>> However it'd be nice to avoid creating foot-guns for people to
> >> >>>>>>> play with (think of the children!).
> >> >>>>>>>
> >> >>>>>>> However we still need to define *something*. I would suggest
> that
> >> >>>>>>> we define that cursors iterate snapshots. It seems the cleanest
> for users
> >> >>>>>>> and easiest >> to define. And once implementations add MVCC
> support it
> >> >>>>>>> should be easy to implement. I think we've come up with a decent
> plan for
> >> >>>>>>> how to do implement it in sqlite even without proper MVCC, so it
> should be
> >> >>>>>>> doable even then.
> >> >>>>>
> >> >>>>> Besides the expectations aspects, I worry that doing this means
> that
> >> >>>>> opening a cursor means incurring in substantial cost for all cases
> (e.g.
> >> >>>>> creating a keyset or something like that).
> >> >>>>
> >> >>>> I agree we definitely don't want that. We are working on an
> >> >>>> implementation which is backed by a SQL database and completely
> >> >>>> incapable of MVCC, so it seems doable. However I don't yet know how
> >> >>>> much of a complexity and performance penalty that carries.
> >> >>>>
> >> >>>
> >> >>> On the other hand, as you said earlier, maybe allowing the live
> >> >>> changes to be visible in the cursor is not such a big problem as
> apps
> >> >>> can work around these edge cases?
> >> >>
> >> >> I suspect so. Though given that behavior can be very unintuitive, we
> >> >> would have to define things to a very high level of detail to ensure
> >> >> that the same unintuitive behavior is happening in all
> >> >> implementations. Additionally, I suspect implementation can get quite
> >> >> complex in order to deal with all edge cases, for example dealing
> with
> >> >> data that was read ahead optimistically.
> >> >>
> >> >> We strategy for defining this is to define all mutating operations in
> >> >> terms of which ones are insertions vs. updates vs. removals. Both in
> >> >> the objectStore and in indexes. For example, does a
> IDBObjectStore.put
> >> >> call that modifies an existing entry cause corresponding entries in
> >> >> indexes to be updated or removed and reinserted? Both in the
> situation
> >> >> when the put call causes index values to be changed and not changed.
> >> >>
> >> >> Then define how active cursors move around when an entry that they
> are
> >> >> currently on is removed, or when entries before and after the current
> >> >> entry are inserted. For all types of cursors; forwards, backwards and
> >> >> no_duplicate. Also need to define this in situations when the cursor
> >> >> is currently firmly in one position, or when a .continue() call has
> >> >> been made, but the callback not yet has fired. Likewise when the
> >> >> cursor has not yet fired its first callback and entries are inserted
> >> >> in the beginning, and when a cursor has notified about the last
> entry,
> >> >> and entries are added after the end.
> >> >>
> >> >> It's certainly possible, but no small feat.
> >> >>
> >> >
> >> > Ok, let me try to come up with some wording for this in the spec. I
> >> > think we should go for this solution for now and see what the
> >> > implementation feedback is.
> >>
> >> Actually, I had a somewhat different idea for how this should be
> >> specified, which I think both would be easier to specify, as well as
> >> implement.
> >>
> >> When iterating using a cursor, always remember the key-value of the
> >> last returned entry. Whenever .continue() is called, it goes to find
> >> the next entry with a key-value bigger than the one last returned.
> >> This way the cursor isn't affected if the current entry is removed,
> >> we'll still remember and use the key-value that that entry had.
> >>
> >> This is simple when iterating objectStores since keys are always
> >> unique. However it's only marginally trickier when iterating indexes
> >> which can contain duplicate key-values. As I suggested in bug 10058, I
> >> think we should defined that duplicate index entries are ordered by
> >> their key-value in the objectStore. So all a index-cursors needs to do
> >> is to remember both the index key-value and the objectStore key-value
> >> of the last returned entry. When .continue() is called it returns the
> >> next entry with the same index key-value and larger objectStore
> >> key-value if one exists, or the next entry with a larger index
> >> key-value otherwise.
> >>
> >> Simpler put: For objectStore cursors, the key remembered is the
> >> objectStores key-value. For index cursors, the key remembered is the
> >> <index-key-value, objectStore-key-value> tuple.
> >>
> >> Let me know what you think.
> >
> > This sounds good.  The only problem is how to specify a sort order for
> > values.  Sorting on the value itself means that structured clones
> themselves
> > need a sort order.  This would include things like files, regexs, images,
> > etc.  So maybe we want to make the sort order be based on insertion
> order?
> >  Does anyone have any better ideas?
>
> Note that I'm always talking about sorting on key-values. I.e. the
> value of the keys. So we'll only ever need to define sorting of the
> types that we accept as keys. So currently none of the types you are
> mentioning above.
>

Good point.  Got mixed up.  That definitely makes things easier!

J
Received on Wednesday, 14 July 2010 16:27:59 UTC