Re: [IndexDB] Proposal for async API changes from Jonas Sicking on 2010-06-09 (public-webapps@w3.org from April to June 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 9 Jun 2010 15:30:12 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: Shawn Wilsher <sdwilsh@mozilla.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTikfVKwphJCBFBlLktumOrzAMYbcD3WakgQ4EaKC@mail.gmail.com>
On Wed, Jun 9, 2010 at 11:40 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Wed, Jun 9, 2010 at 7:25 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> >> > I'm not sure I like the idea of offering sync cursors either since
>> >> > the UA will either need to load everything into memory before
>> >> > starting or risk blocking on disk IO for large data sets.  Thus I'm
>> >> > not sure I support the idea of synchronous cursors.  But, at the
>> >> > same time, I'm concerned about the overhead of firing one event per
>> >> > value with async cursors.  Which is why I was suggesting an
>> >> > interface where the common case (the data is in memory) is done
>> >> > synchronously but the uncommon case (we'd block if we had to
>> >> > respond synchronously) has to be handled since we guarantee that
>> >> > the first time will be forced to be asynchronous.
>> >> >
>> >> > Like I said, I'm not super happy with what I proposed, but I think
>> >> > some hybrid async/sync interface is really what we need.  Have you
>> >> > guys spent any time thinking about something like this?  How
>> >> > dead-set are you on synchronous cursors?
>> >>
>> >> The idea is that synchronous cursors load all the required data into
>> >> memory, yes. I think it would help authors a lot to be able to load
>> >> small chunks of data into memory and read and write to it
>> >> synchronously. Dealing with asynchronous operations constantly is
>> >> certainly possible, but a bit of a pain for authors.
>> >>
>> >> I don't think we should obsess too much about not keeping things in
>> >> memory; we already have things like canvas and the DOM, which add up
>> >> to non-trivial amounts of memory.
>> >>
>> >> Just because data is loaded from a database doesn't mean it's huge.
>> >>
>> >> I do note that you're not as concerned about getAll(), which actually
>> >> has worse memory characteristics than synchronous cursors since you
>> >> need to create the full JS object graph in memory.
>> >
>> > I've been thinking about this off and on since the original proposal
>> > was made, and I just don't feel right about getAll() or synchronous
>> > cursors.  You make some good points about there already being many
>> > ways to overwhelm RAM with web APIs, but is there any place we make it
>> > so easy?  You're right that just because it's a database doesn't mean
>> > it needs to be huge, but oftentimes they can get quite big.  And if a
>> > developer doesn't spend time making sure they test their app with the
>> > upper ends of what users may possibly see, it just seems like this is
>> > a recipe for problems.
>> >
>> > Here's a concrete example: structured clone allows you to store image
>> > data.  Let's say I'm building an image hosting site and that I cache
>> > all the images along with their thumbnails locally in an IndexedDB
>> > entity store.  Let's say each thumbnail is a trivial amount, but each
>> > image is 1MB.  I have an album with 1000 images.  I do
>> > |var photos = albumIndex.getAllObjects(albumName);| and then iterate
>> > over that to get the thumbnails.  But I've just loaded over 1GB of
>> > stuff into RAM (assuming no additional inefficiency/blowup).  I
>> > suppose it's possible JavaScript engines could build mechanisms to
>> > fetch this stuff lazily (like you could even with a synchronous
>> > cursor) but that will take time/effort and introduce lag in the page
>> > (while fetching additional info from disk).
>> >
>> > I'm not completely against the idea of getAll/sync cursors, but I do
>> > think they should be de-coupled from this proposed API.  I would also
>> > suggest that we re-consider them only after at least one
>> > implementation has normal cursors working and there's been some
>> > experimentation with it.  Until then, we're basing most of our
>> > arguments on intuition and assumptions.
>>
>> I'm not married to the concept of sync cursors. However I pretty
>> strongly feel that getAll is something we need. If we just allow
>> cursors for getting multiple results I think we'll see an extremely
>> common pattern of people using a cursor to loop through a result set
>> and put values into an array.
>>
>> Yes, it can be misused, but I don't see a reason why people wouldn't
>> misuse a cursor just as much. If they don't think about the fact that
>> a range contains lots of data when using getAll, why would they think
>> about it when using cursors?
>
> Once again, I feel like there is a lot of speculation (more than normal)
> happening here.  I'd prefer we take the Async API without the sync cursors
> or getAll and give the rest of the API some time to bake before considering
> it again.  Ideally by then we'd have at least one or two early adopters that
> can give their perspective on the issue.

If it helps move things forward we can keep getAll out of the spec for
now. I still think that Mozilla will keep the implementation, though,
so as to allow people to experiment with it. This will also allow us to
guess less and see how people use it. (Though I'll have to check with
the other Mozillians as to what their opinion is.)
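
For illustration, the cursor-to-array pattern discussed earlier in this
thread can be sketched roughly as below. This is only a sketch under
stated assumptions: it does not use the IndexedDB API (which is still in
flux), and openFakeCursor, collectAll and collectThumbnails are
hypothetical names over a plain in-memory array. The fake cursor also
dispatches its per-value callback synchronously, purely to keep the
example short.

```javascript
// Hypothetical stand-in for a callback-per-value cursor. NOT the IndexedDB
// API: it walks a plain array and calls onvalue once per record, for as
// long as the callback keeps asking for more via cursor.continue().
function openFakeCursor(records, onvalue) {
  let index = 0;
  let wantsMore = true;
  const cursor = {
    value: undefined,
    // Request the next record; without this call the iteration stops.
    continue() {
      wantsMore = true;
    },
  };
  while (wantsMore && index < records.length) {
    wantsMore = false;
    cursor.value = records[index++];
    onvalue(cursor);
  }
}

// The pattern in question: drain the cursor into an array, which is
// effectively a hand-rolled getAll() that holds every full record.
function collectAll(records) {
  const results = [];
  openFakeCursor(records, (cursor) => {
    results.push(cursor.value);
    cursor.continue();
  });
  return results;
}

// A cursor loop can instead retain only the field it needs (here the
// thumbnail from the image-album example), so the accumulated array
// holds the small thumbnails rather than every full record.
function collectThumbnails(records) {
  const thumbnails = [];
  openFakeCursor(records, (cursor) => {
    thumbnails.push(cursor.value.thumbnail);
    cursor.continue();
  });
  return thumbnails;
}
```

Whether authors would in practice write the second form rather than the
first is exactly the open question here; the sketch only shows that both
are a few lines long.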

/ Jonas
Received on Wednesday, 9 June 2010 22:37:50 UTC