Re: [IndexedDB] Callback order from Jonas Sicking on 2010-07-14 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 14 Jul 2010 09:15:59 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTilzn1GKrbWM_fV1dII6XVCPkO6qpsyXvoB7oble@mail.gmail.com>
On Wed, Jul 14, 2010 at 4:16 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Wed, Jul 7, 2010 at 11:54 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Thu, Jun 24, 2010 at 4:40 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Sat, Jun 19, 2010 at 9:12 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Fri, Jun 18, 2010 at 7:46 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> >> > On Fri, Jun 18, 2010 at 7:24 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >> >>
>> >> >> On Fri, Jun 18, 2010 at 7:01 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> >> >> > I think determinism is most important for the reasons you cited.
>> >> >> > I think advanced, performance concerned apps could deal with either
>> >> >> > semantics you mentioned, so the key would be to pick whatever is
>> >> >> > best for the normal case. I'm leaning towards thinking firing in
>> >> >> > order is the best way to go because it's the most intuitive/easiest
>> >> >> > to understand, but I don't feel strongly about anything other than
>> >> >> > being deterministic.
>> >> >>
>> >> >> I definitely agree that firing in request order is the simplest, both
>> >> >> from an implementation and usage point of view. However my concern is
>> >> >> that we'd lose most of the performance benefits that cursors provide
>> >> >> if we use that solution.
>> >> >>
>> >> >> What do you mean by "apps could deal with either semantics"? Do you
>> >> >> mean that they could deal with the cursor case by simply being slower,
>> >> >> or that they could work around the performance hit somehow?
>> >> >
>> >> > Hm.  I was thinking they could save the value, call continue, then do
>> >> > work on it, but that'd of course only defer the slowdown for one
>> >> > iteration.  So I guess they'd have to store up a bunch of data and
>> >> > then make calls on it.
>> >>
>> >> Indeed, which could be bad for memory footprint.
>> >>
>> >> > Of course, they'll run into all of these same issues with the sync API
>> >> > since things are of course done in order.  So maybe trying to optimize
>> >> > this specific case for just the async API is silly?
>> >>
>> >> I honestly haven't looked at the sync API. But yes, I assume that it
>> >> will in general have to serialize all calls into the database and thus
>> >> generally not be as performant. I don't think that is a good reason to
>> >> make the async API slower too though.
>> >>
>> >> It's entirely possible, though, that I'm overly concerned about cursor
>> >> performance in general. I won't argue too strongly that we need
>> >> to prioritize cursor callback events until I've seen some numbers. If
>> >> we want to simply define that callbacks fire in request order for now
>> >> then that is fine with me.
>> >
>> > Yeah, I think we should get some hard numbers and think carefully about
>> > this before we make things even more complicated/nuanced.
>>
>> I ran some tests. Note that the test implementation is an
>> approximation. It's both somewhat optimistic in that it doesn't make
>> the extra effort to ensure that cursor callbacks always run before
>> other callbacks. But it's also somewhat pessimistic in that it always
>> returns to the main event loop, even though that is often not needed.
>> My guess is that in the end it's a pretty close approximation
>> performance wise.
>>
>> I've attached the testcase I used in case anyone wants to play around
>> with it. It contains a fair amount of Mozilla-specific features
>> (generators are awesome for asynchronous callbacks) and is written to
>> the IndexedDB API that we currently have implemented, but it should be
>> portable to other browsers.
>>
>> The currently proposed solution, of always running requests in the
>> order they are made, including requests coming from cursor.continue(),
>> gives the following results:
>>
>> Plain iteration over 10000 entries using cursor: 2400ms
>> Iteration over 10000 entries using cursor, performing a join by
>> calling getAll on an index for each iteration: 5400ms
>>
>> The proposed solution of prioritizing cursor.continue() callbacks
>> over other callbacks gives:
>>
>> Plain iteration over 10000 entries using cursor: 1050ms
>> Iteration over 10000 entries using cursor, performing a join by
>> calling getAll on an index for each iteration: 1280ms
>>
>> The reason that just plain iteration got faster is that we implemented
>> the strict ordering by sending all requests to the thread the database
>> runs on, and then having the database thread process all requests in
>> order and send them back to the requesting thread. So for plain
>> iteration it basically just means a roundtrip to the IndexedDB thread
>> and back.
>>
>> Based on these numbers, I think we should prioritize
>> IDBCursor.continue() callbacks, as for the join example this results
>> in an over 4x speedup.
>
> I would like to note that this speedup is on one particular implementation
> which isn't particularly optimized.  Nevertheless, that is a pretty
> substantial difference in run times.  But yet it just pains me to think of
> special casing the order of execution for just cursors. Especially when
> we're still trying to nail down the very basics of the async API.
> I would prefer to open a bug and leave this on the backburner for a while
> (like other features like nested transactions).  When we do look at this, we
> may want to consider making it an option to run in this mode rather than
> being the default.  Is that OK with you?  If so, we can open a bug to track
> this but mention in the bug that we're going to hold off for a bit.
> My biggest take away from all of this is that generators seem cool.  :-)

If you're concerned that this speedup only applies to one particular
implementation, I'd encourage you to get numbers from other
implementations ;-)
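
As a quick check on the figures quoted above, here is a minimal sketch that recomputes the speedups from the four timings in my earlier mail. The dictionary keys and the helper name speedup are just labels made up for illustration; only the millisecond values come from the test runs.

```python
# Timings in milliseconds, copied from the quoted test results.
# The key names are illustrative labels, not part of any API.
timings_ms = {
    "plain iteration": {"request order": 2400, "cursor priority": 1050},
    "join via getAll": {"request order": 5400, "cursor priority": 1280},
}


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


# Speedup of prioritizing cursor callbacks over strict request order.
speedups = {
    name: speedup(t["request order"], t["cursor priority"])
    for name, t in timings_ms.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

That works out to roughly 2.3x for plain iteration and a bit over 4.2x for the join case.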

There is reason to believe that the speedup could be even bigger in a
multi-process implementation such as the one I imagine Chrome
requires, since you're serializing cross-process calls rather than the
cross-thread calls that Firefox is using.

I'd rather not leave this indefinitely open on the backburner as it's
something that we need to decide one way or another. But if you need
time to research performance effects then that is of course ok.
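
To make the two options concrete, here is a rough sketch of the callback orderings being compared. It is a toy model, not how any implementation works: the (kind, label) tuples, the function names, and the idea of partitioning a list of pending requests are all assumptions made for illustration.

```python
# Toy model of the two callback orderings under discussion.
# Each pending request is a hypothetical (kind, label) pair, where kind is
# "cursor" for a cursor.continue() request and "other" for anything else.


def fire_in_request_order(pending):
    """Fire callbacks strictly in the order the requests were made."""
    return [label for _kind, label in pending]


def fire_cursor_first(pending):
    """Fire cursor callbacks ahead of other pending callbacks.

    Relative order within each group is preserved, so the result is
    still fully determined by the order of the requests.
    """
    cursor = [label for kind, label in pending if kind == "cursor"]
    other = [label for kind, label in pending if kind != "cursor"]
    return cursor + other


pending = [
    ("other", "getAll #1"),
    ("cursor", "continue #1"),
    ("other", "getAll #2"),
    ("cursor", "continue #2"),
]

strict_order = fire_in_request_order(pending)
cursor_priority = fire_cursor_first(pending)

print(strict_order)
print(cursor_priority)
```

Both orderings are deterministic; they differ only in where the cursor callbacks land relative to the other requests.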

/ Jonas
Received on Wednesday, 14 July 2010 16:17:08 UTC