Re: [IndexDB] Proposal for async API changes from Jonas Sicking on 2010-06-09 (public-webapps@w3.org from April to June 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 9 Jun 2010 11:25:05 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: Shawn Wilsher <sdwilsh@mozilla.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTikxufeJMAmtc8-hCSmxYj8mnhg8bNfwXBFN28k3@mail.gmail.com>

On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > I'm not sure I like the idea of offering sync cursors either since the UA
>> > will either need to load everything into memory before starting or risk
>> > blocking on disk IO for large data sets.  Thus I'm not sure I support the
>> > idea of synchronous cursors.  But, at the same time, I'm concerned about the
>> > overhead of firing one event per value with async cursors.  Which is why I
>> > was suggesting an interface where the common case (the data is in memory) is
>> > done synchronously but the uncommon case (we'd block if we had to respond
>> > synchronously) has to be handled since we guarantee that the first time will
>> > be forced to be asynchronous.
>> > Like I said, I'm not super happy with what I proposed, but I think some
>> > hybrid async/sync interface is really what we need.  Have you guys spent any
>> > time thinking about something like this?  How dead-set are you on
>> > synchronous cursors?
>>
>> The idea is that synchronous cursors load all the required data into
>> memory, yes. I think it would help authors a lot to be able to load
>> small chunks of data into memory and read and write to it
>> synchronously. Dealing with asynchronous operations constantly is
>> certainly possible, but a bit of a pain for authors.
>>
>> I don't think we should obsess too much about not keeping things in
>> memory; we already have things like canvas and the DOM, which add up
>> to non-trivial amounts of memory.
>>
>> Just because data is loaded from a database doesn't mean it's huge.
>>
>> I do note that you're not as concerned about getAll(), which actually
>> has worse memory characteristics than synchronous cursors since you
>> need to create the full JS object graph in memory.
>
> I've been thinking about this off and on since the original proposal was
> made, and I just don't feel right about getAll() or synchronous cursors.
> You make some good points about there already being many ways to overwhelm
> RAM with web APIs, but is there any place we make it so easy?  You're right
> that just because it's a database doesn't mean it needs to be huge, but
> oftentimes they can get quite big.  And if a developer doesn't spend time
> making sure they test their app with the upper ends of what users may
> possibly see, it just seems like this is a recipe for problems.
> Here's a concrete example: structured clone allows you to store image data.
> Let's say I'm building an image hosting site and that I cache all the images
> along with their thumbnails locally in an IndexedDB entity store.  Let's say
> each thumbnail is a trivial amount, but each image is 1MB.  I have an album
> with 1000 images.  I do |var photos = albumIndex.getAllObjects(albumName);|
> and then iterate over that to get the thumbnails.  But I've just loaded over
> 1GB of stuff into RAM (assuming no additional inefficiency/blowup).  I
> suppose it's possible JavaScript engines could build mechanisms to fetch
> this stuff lazily (like you could even with a synchronous cursor) but that
> will take time/effort and introduce lag in the page (while fetching
> additional info from disk).
>
> I'm not completely against the idea of getAll/sync cursors, but I do think
> they should be de-coupled from this proposed API.  I would also suggest that
> we re-consider them only after at least one implementation has normal
> cursors working and there's been some experimentation with it.  Until then,
> we're basing most of our arguments on intuition and assumptions.

I'm not married to the concept of sync cursors. However, I feel pretty
strongly that getAll is something we need. If we only allow cursors
for getting multiple results, I think we'll see an extremely common
pattern of people using a cursor to loop through a result set and put
the values into an array.
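
For illustration, the loop-and-collect pattern described above might look
roughly like the sketch below. Everything in it is an assumption made for
the example: openCursor and collectAll are hypothetical names, not the API
under discussion, and the stand-in cursor walks a plain in-memory array and
invokes its callbacks immediately, whereas a real asynchronous cursor would
deliver each value from a later event.

```javascript
// Hypothetical stand-in for an asynchronous cursor over a result set.
// It hands out one value per callback; the caller must invoke next()
// to advance. Callbacks run immediately here only to keep the sketch
// self-contained.
function openCursor(records, onValue, onDone) {
  let i = 0;
  function step() {
    if (i >= records.length) {
      onDone();
      return;
    }
    const value = records[i++];
    onValue(value, step);
  }
  step();
}

// The pattern in question: walk the cursor and push every value into
// an array, which is effectively a hand-rolled getAll.
function collectAll(records, callback) {
  const results = [];
  openCursor(
    records,
    (value, next) => {
      results.push(value); // every value ends up held in memory
      next();
    },
    () => callback(results)
  );
}
```

Calling collectAll(["a", "b", "c"], use) passes the full array to use once
the cursor is exhausted, so the memory footprint at that point matches what
a single getAll call would have produced.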

Yes, it can be misused, but I don't see a reason why people wouldn't
misuse a cursor just as much. If they don't think about the fact that
a range contains lots of data when using getAll, why would they think
about it when using cursors?

/ Jonas
Received on Wednesday, 9 June 2010 18:30:59 UTC