Re: [IndexedDB] Callback order from Jeremy Orlow on 2010-07-14 (public-webapps@w3.org from July to September 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Wed, 14 Jul 2010 17:20:59 +0100
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTiksbTcUAgkdaR0i4xOMI37IjBP947kjoK3Fh4cS@mail.gmail.com>
On Wed, Jul 14, 2010 at 5:15 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Wed, Jul 14, 2010 at 4:16 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> > On Wed, Jul 7, 2010 at 11:54 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>
> >> On Thu, Jun 24, 2010 at 4:40 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> >> > On Sat, Jun 19, 2010 at 9:12 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> >> >>
> >> >> On Fri, Jun 18, 2010 at 7:46 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> >> >> > On Fri, Jun 18, 2010 at 7:24 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >> >> >>
> >> >> >> On Fri, Jun 18, 2010 at 7:01 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> >> >> >> > I think determinism is most important for the reasons you cited.
> >> >> >> > I think advanced, performance concerned apps could deal with
> >> >> >> > either semantics you mentioned, so the key would be to pick
> >> >> >> > whatever is best for the normal case. I'm leaning towards
> >> >> >> > thinking firing in order is the best way to go because it's the
> >> >> >> > most intuitive/easiest to understand, but I don't feel strongly
> >> >> >> > about anything other than being deterministic.
> >> >> >>
> >> >> >> I definitely agree that firing in request order is the simplest,
> >> >> >> both from an implementation and usage point of view. However my
> >> >> >> concern is that we'd lose most of the performance benefits that
> >> >> >> cursors provide if we use that solution.
> >> >> >>
> >> >> >> What do you mean with "apps could deal with either semantics"? You
> >> >> >> mean that they could deal with the cursor case by simply being
> >> >> >> slower, or do you mean that they could work around the performance
> >> >> >> hit somehow?
> >> >> >
> >> >> > Hm.  I was thinking they could save the value, call continue, then
> >> >> > do work on it, but that'd of course only defer the slowdown for one
> >> >> > iteration.  So I guess they'd have to store up a bunch of data and
> >> >> > then make calls on it.
> >> >>
> >> >> Indeed which could be bad for memory footprint.
> >> >>
> >> >> > Of course, they'll run into all of these same issues with the sync
> >> >> > API since things are of course done in order.  So maybe trying to
> >> >> > optimize this specific case for just the async API is silly?
> >> >>
> >> >> I honestly haven't looked at the sync API. But yes, I assume that it
> >> >> will in general have to serialize all calls into the database and
> >> >> thus generally not be as performant. I don't think that is a good
> >> >> reason to make the async API slower too though.
> >> >>
> >> >> But it's entirely possible that I'm overly concerned about cursor
> >> >> performance in general though. I won't argue too strongly that we
> >> >> need to prioritize cursor callback events until I've seen some
> >> >> numbers. If we want to simply define that callbacks fire in request
> >> >> order for now then that is fine with me.
> >> >
> >> > Yeah, I think we should get some hard numbers and think carefully
> >> > about this before we make things even more complicated/nuanced.
> >>
> >> I ran some tests. Note that the test implementation is an
> >> approximation. It's both somewhat optimistic in that it doesn't make
> >> the extra effort to ensure that cursor callbacks always run before
> >> other callbacks. But it's also somewhat pessimistic in that it always
> >> returns to the main event loop, even though that is often not needed.
> >> My guess is that in the end it's a pretty close approximation
> >> performance wise.
> >>
> >> I've attached the testcase I used in case anyone wants to play around
> >> with it. It contains a fair amount of mozilla specific features
> >> (generators are awesome for asynchronous callbacks) and is written
> >> to the IndexedDB API that we currently have implemented, but it
> >> should be portable to other browsers.
> >>
> >> The currently proposed solution, of always running requests in the
> >> order they are made, including requests coming from cursor.continue(),
> >> gives the following results:
> >>
> >> Plain iteration over 10000 entries using cursor: 2400ms
> >> Iteration over 10000 entries using cursor, performing a join by
> >> calling getAll on an index for each iteration: 5400ms
> >>
> >> For the proposed solution of prioritizing cursor.continue() callbacks
> >> over other callbacks:
> >>
> >> Plain iteration over 10000 entries using cursor: 1050ms
> >> Iteration over 10000 entries using cursor, performing a join by
> >> calling getAll on an index for each iteration: 1280ms
> >>
> >> The reason that just plain iteration got faster is that we implemented
> >> the strict ordering by sending all requests to the thread the database
> >> runs on, and then having the database thread process all requests in
> >> order and send them back to the requesting thread. So for plain
> >> iteration it basically just means a roundtrip to the indexedDB thread
> >> and back.
> >>
> >> Based on these numbers, I think we should prioritize
> >> IDBCursor.continue() callbacks, as for the join example this results
> >> in an over 4x speedup.
> >
> > I would like to note that this speedup is on one particular
> > implementation which isn't particularly optimized.  Nevertheless, that is
> > a pretty substantial difference in run times.  But yet it just pains me
> > to think of special casing the order of execution for just cursors.
> > Especially when we're still trying to nail down the very basics of the
> > async API.
> > I would prefer to open a bug and leave this on the backburner for a while
> > (like other features like nested transactions).  When we do look at this,
> > we may want to consider making it an option to run in this mode rather
> > than being the default.  Is that OK with you?  If so, we can open a bug
> > to track this but mention in the bug that we're going to hold off for a
> > bit.
> > My biggest take away from all of this is that generators seem cool.  :-)
>
> If you're concerned that this speedup only applies to one particular
> implementation, I'd encourage you to get numbers from other
> implementations ;-)
>
> There is reason to believe that the speedup could be even bigger in a
> multi-process implementation such as the one I imagine that chrome
> requires, since you're serializing cross-process calls rather than the
> cross-thread calls that Firefox is using.
>
> I'd rather not leave this indefinitely open on the backburner as it's
> something that we need to decide one way or another. But if you need
> time to research performance effects then that is of course ok.
>

My entire concern at this point is that too much is up in the air at once in
the spec, and that we'd be creating complex behaviors that aren't intuitive
to developers.  I hope the former will go away in the next couple of weeks
(depending on how fast we can come to decisions and implement them in the
spec).  The latter I don't have a good answer for.  Time to research perf
effects is not one of my concerns.  (I think you're right that we'll see even
more of a change due to multi-process latency.)
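
To make the two orderings under discussion concrete, here is a rough sketch
of a toy model in Python.  It is not taken from any implementation or from
the attached testcase: the function name callback_order, the request labels,
and the queueing policy are all assumptions made up for illustration, and it
models only the order in which callbacks fire, not how long anything takes.
Each cursor callback issues a getAll and then calls continue, as in the join
case above.

```python
def callback_order(n_entries, prioritize_cursor):
    """Toy model: return the order in which callbacks fire for the join case.

    Hypothetical sketch only; names and policy are illustrative assumptions.
    """
    pending = [("cursor", 0)]  # stands in for the initial openCursor request
    fired = []
    while pending:
        idx = 0  # default: strict request order (oldest request first)
        if prioritize_cursor:
            # Assumed policy: a pending cursor callback jumps ahead of
            # any other pending callback.
            for i, (kind, _) in enumerate(pending):
                if kind == "cursor":
                    idx = i
                    break
        kind, n = pending.pop(idx)
        fired.append(f"{kind}{n}")
        if kind == "cursor":
            # The cursor callback issues a getAll, then asks for the next entry.
            pending.append(("getAll", n))
            if n + 1 < n_entries:
                pending.append(("cursor", n + 1))
    return fired


if __name__ == "__main__":
    print(callback_order(3, prioritize_cursor=False))
    print(callback_order(3, prioritize_cursor=True))
```

In this model both policies are deterministic.  They differ only in whether
each cursor callback waits behind the getAll issued just before it (request
order) or runs ahead of it (cursor priority), which is the non-obvious
behavior I'm worried about explaining to developers.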

J
Received on Wednesday, 14 July 2010 16:21:50 UTC