Re: [IndexedDB] Cursors and modifications from Jeremy Orlow on 2010-07-15 (public-webapps@w3.org from July to September 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Thu, 15 Jul 2010 10:03:45 +0100
To: Jonas Sicking <jonas@sicking.cc>
Cc: Pablo Castro <Pablo.Castro@microsoft.com>, Andrei Popescu <andreip@google.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTinpLHWeQfxtyUewyYs8lM4E9G9UbBbveJngvkXz@mail.gmail.com>
On Thu, Jul 15, 2010 at 2:44 AM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Wed, Jul 14, 2010 at 6:20 PM, Pablo Castro
> <Pablo.Castro@microsoft.com> wrote:
> > Making sure I get the essence of this thread: we're saying that cursors
> see live changes as they happen on objects that are "after" the object
> you're currently standing on;
>
> Yes.
>
> > and of course, any other activity within a transaction sees all the
> changes that happened before that activity took place. Is that accurate?
>
> Yes. All other activity sees all changes as soon as they have happened.
>
> > If it's accurate, as a side note, for the async API it seems that this
> makes it more interesting to enforce callback order, so we can more easily
> explain what we mean by "before".
>
> Indeed.
>

What do you mean by enforce callback order?  Are you saying that callbacks
should be done in the order the requests are made (rather than prioritizing
cursor callbacks)?  (That's how I read it, but Jonas' "Indeed" makes me
suspect I missed something. :-)

J


> / Jonas
>
> > From: jorlow@google.com [mailto:jorlow@google.com] On Behalf Of Jeremy
> Orlow
> > Sent: Wednesday, July 14, 2010 9:27 AM
> >
> > On Wed, Jul 14, 2010 at 5:17 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> > On Wed, Jul 14, 2010 at 5:12 AM, Jeremy Orlow <jorlow@chromium.org>
> wrote:
> >> On Thu, Jul 8, 2010 at 8:42 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>>
> >>> On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu <andreip@google.com>
> wrote:
> >>> > On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking <jonas@sicking.cc>
> wrote:
> >>> >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <andreip@google.com>
> >>> >> wrote:
> >>> >>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jonas@sicking.cc>
> >>> >>> wrote:
> >>> >>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro
> >>> >>>> <Pablo.Castro@microsoft.com> wrote:
> >>> >>>>>
> >>> >>>>> From: public-webapps-request@w3.org
> >>> >>>>> [mailto:public-webapps-request@w3.org] On Behalf Of Jonas
> Sicking
> >>> >>>>> Sent: Friday, July 02, 2010 4:00 PM
> >>> >>>>>
> >>> >>>>>>> We ran into an complicated issue while implementing IndexedDB.
> In
> >>> >>>>>>> short, what should happen if an object store is modified while
> a cursor is
> >>> >>>>>>> iterating it? >> Note that the modification can be done within
> the same
> >>> >>>>>>> transaction, so the read/write locks preventing several
> transactions from
> >>> >>>>>>> accessing the same table isn't helping here.
> >>> >>>>>>>
> >>> >>>>>>> Detailed problem description (this assumes the API proposed by
> >>> >>>>>>> mozilla):
> >>> >>>>>>>
> >>> >>>>>>> Consider a objectStore "words" containing the following
> objects:
> >>> >>>>>>> { name: "alpha" }
> >>> >>>>>>> { name: "bravo" }
> >>> >>>>>>> { name: "charlie" }
> >>> >>>>>>> { name: "delta" }
> >>> >>>>>>>
> >>> >>>>>>> and the following program (db is a previously opened
> IDBDatabase):
> >>> >>>>>>>
> >>> >>>>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor;
> var
> >>> >>>>>>> result = []; trans.objectStore("words").openCursor().onsuccess
> = function(e)
> >>> >>>>>>> {
> >>> >>>>>>>   cursor = e.result;
> >>> >>>>>>>   result.push(cursor.value);
> >>> >>>>>>>   cursor.continue();
> >>> >>>>>>> }
> >>> >>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e)
> {
> >>> >>>>>>>   trans.objectStore("words").put({ name: "delta",
> myModifiedValue:
> >>> >>>>>>> 17 }); }
> >>> >>>>>>>
> >>> >>>>>>> When the cursor reads the "delta" entry, will it see the
> >>> >>>>>>> 'myModifiedValue' property? Since we so far has defined that
> the callback
> >>> >>>>>>> order is defined to be >> the request order, that means that
> put request
> >>> >>>>>>> will be finished before the "delta" entry is iterated by the
> cursor.
> >>> >>>>>>>
> >>> >>>>>>> The problem is even more serious with cursors that iterate
> >>> >>>>>>> indexes.
> >>> >>>>>>> Here a modification can even affect the position of the
> currently
> >>> >>>>>>> iterated object in the index, and the modification can (if i'm
> reading the
> >>> >>>>>>> spec correctly) >> come from the cursor itself.
> >>> >>>>>>>
> >>> >>>>>>> Consider the following objectStore "people" with keyPath "name"
> >>> >>>>>>> containing the following objects:
> >>> >>>>>>>
> >>> >>>>>>> { name: "Adam", count: 30 }
> >>> >>>>>>> { name: "Bertil", count: 31 }
> >>> >>>>>>> { name: "Cesar", count: 32 }
> >>> >>>>>>> { name: "David", count: 33 }
> >>> >>>>>>> { name: "Erik", count: 35 }
> >>> >>>>>>>
> >>> >>>>>>> and an index "countIndex" with keyPath "count". What would the
> >>> >>>>>>> following code do?
> >>> >>>>>>>
> >>> >>>>>>> results = [];
> >>> >>>>>>> db.objectStore("people",
> >>> >>>>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess =
> >>> >>>>>>> function (e) {
> >>> >>>>>>>   cursor = e.result;
> >>> >>>>>>>   if (!cursor) {
> >>> >>>>>>>     alert(results);
> >>> >>>>>>>     return;
> >>> >>>>>>>   }
> >>> >>>>>>>   if (cursor.value.name == "Bertil") {
> >>> >>>>>>>     cursor.update({name: "Bertil", count: 34 });
> >>> >>>>>>>   }
> >>> >>>>>>>   results.push(cursor.value.name);
> >>> >>>>>>>   cursor.continue();
> >>> >>>>>>> };
> >>> >>>>>>>
> >>> >>>>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the
> >>> >>>>>>> cursor would stay on the "Bertil" object as it is moved in the
> index? Or
> >>> >>>>>>> would it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we
> would iterate
> >>> >>>>>>> "Bertil" again at its new position in the index?
> >>> >>>>>
> >>> >>>>> My first reaction is that both from the expected behavior of
> >>> >>>>> perspective (transaction is the scope of isolation) and from the
> >>> >>>>> implementation perspective it would be better to see live changes
> if they
> >>> >>>>> happened in the same transaction as the cursor (over a store or
> index). So
> >>> >>>>> in your example you would iterate one of the rows twice.
> Maintaining order
> >>> >>>>> and membership stable would mean creating another scope of
> isolation within
> >>> >>>>> the transaction, which to me would be unusual and it would be
> probably quite
> >>> >>>>> painful to implement without spilling a copy of the records to
> disk (at
> >>> >>>>> least a copy of the keys/order if you don't care about protecting
> from
> >>> >>>>> changes that don't affect membership/order; some databases call
> these keyset
> >>> >>>>> cursors).
> >>> >>>>>
> >>> >>>>>>>
> >>> >>>>>>> We could say that cursors always iterate snapshots, however
> this
> >>> >>>>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ
> already does that.
> >>> >>>>>
> >>> >>>>> Actually, even with MVCC you'd see your own changes, because they
> >>> >>>>> happen in the same transaction so the buffer pool will use the
> same version
> >>> >>>>> of the page. While it may be possible to reuse the MVCC
> infrastructure, it
> >>> >>>>> would still require the introduction of a second scope for
> stability.
> >>> >>>>
> >>> >>>> It's quite implementable using append-only b-trees. Though it
> might
> >>> >>>> be
> >>> >>>> much to ask that implementations are forced to use that.
> >>> >>>>
> >>> >>>> An alternative to what I suggested earlier is that all read
> >>> >>>> operations
> >>> >>>> use "read committed". I.e. they always see the data as it looked
> at
> >>> >>>> the beginning of the transaction. Would this be more compatible
> with
> >>> >>>> existing MVCC implementations?
> >>> >>>>
> >>> >>>
> >>> >>> Hmm, so if you modified the object store and then, later in the
> same
> >>> >>> transaction, used a cursor to iterate the object store, the cursor
> >>> >>> would not see the earlier modifications? That's not very intiutive
> to
> >>> >>> me...or did I misunderstand?
> >>> >>
> >>> >> If we go with "read committed" then yes, your understanding is
> correct.
> >>> >>
> >>> >> Out of curiosity, how are you feeling about the "cursors iterate
> data
> >>> >> as it looked when cursor was opened" solution?
> >>> >>
> >>> >
> >>> > I feel that that's the easiest solution to specify although it may
> >>> > also be unintuitive if one calls 'put / update' and then expects to
> >>> > see the updated value once the cursor gets to the relevant object. My
> >>> > other concern was the one brought by Pablo, i.e. is it even more
> >>> > complex to implement another scope of isolation for the duration of
> >>> > the cursor?
> >>> >
> >>> >>>> I'd imagine this should be as easy to implement as SNAPSHOT_READ.
> >>> >>>>
> >>> >>>>>>> We could also say that cursors iterate live data though that
> can
> >>> >>>>>>> be pretty confusing and forces the implementation to deal with
> entries being
> >>> >>>>>>> added and >> removed during iteration, and it'd be tricky to
> define all edge
> >>> >>>>>>> cases.
> >>> >>>>>
> >>> >>>>> Would this be any different from the implementation perspective
> than
> >>> >>>>> dealing with changes that happen through other transactions once
> they are
> >>> >>>>> committed? Typically at least in non-MVCC systems committed
> changes that are
> >>> >>>>> "further ahead" in a cursor scan end up showing up even when the
> cursor was
> >>> >>>>> opened before the other transaction committed.
> >>> >>>>
> >>> >>>> IndexedDB doesn't allow write transactions to a given store to
> start
> >>> >>>> while there are read transactions using that store, so that
> doesn't
> >>> >>>> seem to be a problem. Unless I'm misunderstanding something?
> >>> >>>>
> >>> >>>
> >>> >>> That was my understanding too: while a cursor was open, you can't
> >>> >>> allow a write transaction to start.
> >>> >>
> >>> >> Exactly. Though "while a read transaction was open, you can't allow
> a
> >>> >> write transaction to start" is even more correct.
> >>> >>
> >>> >>>>>>> It's certainly debatable how much of a problem any of these
> >>> >>>>>>> edgecases are for users. Note that all of this is only an issue
> if you
> >>> >>>>>>> modify and read from the >> same records *in the same
> transaction*. I can't
> >>> >>>>>>> think of a case where it isn't trivial to avoid these problems
> by separating
> >>> >>>>>>> things into separate transactions.
> >>> >>>>>>> However it'd be nice to avoid creating foot-guns for people to
> >>> >>>>>>> play with (think of the children!).
> >>> >>>>>>>
> >>> >>>>>>> However we still need to define *something*. I would suggest
> that
> >>> >>>>>>> we define that cursors iterate snapshots. It seems the cleanest
> for users
> >>> >>>>>>> and easiest >> to define. And once implementations add MVCC
> support it
> >>> >>>>>>> should be easy to implement. I think we've come up with a
> decent plan for
> >>> >>>>>>> how to do implement it in sqlite even without proper MVCC, so
> it should be
> >>> >>>>>>> doable even then.
> >>> >>>>>
> >>> >>>>> Besides the expectations aspects, I worry that doing this means
> that
> >>> >>>>> opening a cursor means incurring in substantial cost for all
> cases (e.g.
> >>> >>>>> creating a keyset or something like that).
> >>> >>>>
> >>> >>>> I agree we definitely don't want that. We are working on an
> >>> >>>> implementation which is backed by a SQL database and completely
> >>> >>>> incapable of MVCC, so it seems doable. However I don't yet know
> how
> >>> >>>> much of a complexity and performance penalty that carries.
> >>> >>>>
> >>> >>>
> >>> >>> On the other hand, as you said earlier, maybe allowing the live
> >>> >>> changes to be visible in the cursor is not such a big problem as
> apps
> >>> >>> can work around these edge cases?
> >>> >>
> >>> >> I suspect so. Though given that behavior can be very unintuitive, we
> >>> >> would have to define things to a very high level of detail to ensure
> >>> >> that the same unintuitive behavior is happening in all
> >>> >> implementations. Additionally, I suspect implementation can get
> quite
> >>> >> complex in order to deal with all edge cases, for example dealing
> with
> >>> >> data that was read ahead optimistically.
> >>> >>
> >>> >> We strategy for defining this is to define all mutating operations
> in
> >>> >> terms of which ones are insertions vs. updates vs. removals. Both in
> >>> >> the objectStore and in indexes. For example, does a
> IDBObjectStore.put
> >>> >> call that modifies an existing entry cause corresponding entries in
> >>> >> indexes to be updated or removed and reinserted? Both in the
> situation
> >>> >> when the put call causes index values to be changed and not changed.
> >>> >>
> >>> >> Then define how active cursors move around when an entry that they
> are
> >>> >> currently on is removed, or when entries before and after the
> current
> >>> >> entry are inserted. For all types of cursors; forwards, backwards
> and
> >>> >> no_duplicate. Also need to define this in situations when the cursor
> >>> >> is currently firmly in one position, or when a .continue() call has
> >>> >> been made, but the callback not yet has fired. Likewise when the
> >>> >> cursor has not yet fired its first callback and entries are inserted
> >>> >> in the beginning, and when a cursor has notified about the last
> entry,
> >>> >> and entries are added after the end.
> >>> >>
> >>> >> It's certainly possible, but no small feat.
> >>> >>
> >>> >
> >>> > Ok, let me try to come up with some wording for this in the spec. I
> >>> > think we should go for this solution for now and see what the
> >>> > implementation feedback is.
> >>>
> >>> Actually, I had a somewhat different idea for how this should be
> >>> specified, which I think both would be easier to specify, as well as
> >>> implement.
> >>>
> >>> When iterating using a cursor, always remember the key-value of the
> >>> last returned entry. Whenever .continue() is called, it goes to find
> >>> the next entry with a key-value bigger than the one last returned.
> >>> This way the cursor isn't affected if the current entry is removed,
> >>> we'll still remember and use the key-value that that entry had.
> >>>
> >>> This is simple when iterating objectStores since keys are always
> >>> unique. However it's only marginally trickier when iterating indexes
> >>> which can contain duplicate key-values. As I suggested in bug 10058, I
> >>> think we should defined that duplicate index entries are ordered by
> >>> their key-value in the objectStore. So all a index-cursors needs to do
> >>> is to remember both the index key-value and the objectStore key-value
> >>> of the last returned entry. When .continue() is called it returns the
> >>> next entry with the same index key-value and larger objectStore
> >>> key-value if one exists, or the next entry with a larger index
> >>> key-value otherwise.
> >>>
> >>> Simpler put: For objectStore cursors, the key remembered is the
> >>> objectStores key-value. For index cursors, the key remembered is the
> >>> <index-key-value, objectStore-key-value> tuple.
> >>>
> >>> Let me know what you think.
> >>
> >> This sounds good.  The only problem is how to specify a sort order for
> >> values.  Sorting on the value itself means that structured clones
> themselves
> >> need a sort order.  This would include things like files, regexs,
> images,
> >> etc.  So maybe we want to make the sort order be based on insertion
> order?
> >>  Does anyone have any better ideas?
> > Note that I'm always talking about sorting on key-values. I.e. the
> > value of the keys. So we'll only ever need to define sorting of the
> > types that we accept as keys. So currently none of the types you are
> > mentioning above.
> >
> > Good point.  Got mixed up.  That definitely makes things easier!
> >
> > J
> >
>
Received on Thursday, 15 July 2010 09:04:37 UTC