Re: [IndexDB] Proposal for async API changes from Jonas Sicking on 2010-06-09 (public-webapps@w3.org from April to June 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 9 Jun 2010 15:27:46 -0700
To: Laxmi Narsimha Rao Oruganti <Laxmi.Oruganti@microsoft.com>
Cc: Jeremy Orlow <jorlow@chromium.org>, Shawn Wilsher <sdwilsh@mozilla.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTilRvKm3YJy96sVUo6_lzepaZoV9Uo2hFI6107uw@mail.gmail.com>
On Wed, Jun 9, 2010 at 11:39 AM, Laxmi Narsimha Rao Oruganti
<Laxmi.Oruganti@microsoft.com> wrote:
> Inline...
>
> -----Original Message-----
> From: public-webapps-request@w3.org [mailto:public-webapps-request@w3.org] On Behalf Of Jonas Sicking
> Sent: Wednesday, June 09, 2010 11:55 PM
> To: Jeremy Orlow
> Cc: Shawn Wilsher; Webapps WG
> Subject: Re: [IndexDB] Proposal for async API changes
>
> On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>
>>> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jorlow@chromium.org>
>>> wrote:
>>> > I'm not sure I like the idea of offering sync cursors either, since the
>>> > UA will either need to load everything into memory before starting or
>>> > risk blocking on disk IO for large data sets.  Thus I'm not sure I
>>> > support the idea of synchronous cursors.  But, at the same time, I'm
>>> > concerned about the overhead of firing one event per value with async
>>> > cursors.  That is why I was suggesting an interface where the common
>>> > case (the data is in memory) is done synchronously, but the uncommon
>>> > case (we'd block if we had to respond synchronously) still has to be
>>> > handled, since we guarantee that the first call will be forced to be
>>> > asynchronous.
>>> > Like I said, I'm not super happy with what I proposed, but I think some
>>> > hybrid async/sync interface is really what we need.  Have you guys spent
>>> > any time thinking about something like this?  How dead-set are you on
>>> > synchronous cursors?
>>>
>>> The idea is that synchronous cursors load all the required data into
>>> memory, yes. I think it would help authors a lot to be able to load
>>> small chunks of data into memory and read and write to it
>>> synchronously. Dealing with asynchronous operations constantly is
>>> certainly possible, but a bit of a pain for authors.
>>>
>>> I don't think we should obsess too much about not keeping things in
>>> memory; we already have things like canvas and the DOM, which add up
>>> to non-trivial amounts of memory.
>>>
>>> Just because data is loaded from a database doesn't mean it's huge.
>>>
>>> I do note that you're not as concerned about getAll(), which actually
>>> has worse memory characteristics than synchronous cursors since you
>>> need to create the full JS object graph in memory.
>>
>> I've been thinking about this off and on since the original proposal was
>> made, and I just don't feel right about getAll() or synchronous cursors.
>> You make some good points about there already being many ways to overwhelm
>> RAM with web APIs, but is there any place we make it so easy?  You're right
>> that just because it's a database doesn't mean it needs to be huge, but
>> often they can get quite big.  And if a developer doesn't spend time
>> making sure they test their app with the upper ends of what users may
>> possibly see, it just seems like this is a recipe for problems.
>> Here's a concrete example: structured clone allows you to store image data.
>> Let's say I'm building an image hosting site and that I cache all the images
>> along with their thumbnails locally in an IndexedDB entity store.  Let's say
>> each thumbnail is a trivial amount, but each image is 1MB.  I have an album
>> with 1000 images.  I do |var photos = albumIndex.getAllObjects(albumName);|
>> and then iterate over that to get the thumbnails.  But I've just loaded over
>> 1GB of stuff into RAM (assuming no additional inefficiency/blowup).  I
>> suppose it's possible JavaScript engines could build mechanisms to fetch
>> this stuff lazily (like you could even with a synchronous cursor), but that
>> will take time/effort and introduce lag in the page (while fetching
>> additional info from disk).
>>
>> I'm not completely against the idea of getAll/sync cursors, but I do think
>> they should be decoupled from this proposed API.  I would also suggest that
>> we reconsider them only after at least one implementation has normal
>> cursors working and there's been some experimentation with it.  Until then,
>> we're basing most of our arguments on intuition and assumptions.
>
> I'm not married to the concept of sync cursors. However I pretty
> strongly feel that getAll is something we need. If we just allow
> cursors for getting multiple results I think we'll see an extremely
> common pattern of people using a cursor to loop through a result set
> and put values into an array.
>
> Yes, it can be misused, but I don't see a reason why people wouldn't
> misuse a cursor just as much. If they don't think about the fact that
> a range contains lots of data when using getAll, why would they think
> about it when using cursors?
>
> [Laxmi] A cursor is a streaming operator: only the current row or page is in memory, and the rest stays on disk.  As the program moves the cursor through the result, old pages are discarded and new pages are loaded from the result set.  With getAll, by contrast, everything has to be brought into memory before returning to the caller.  If there is not enough memory to hold the whole result at once, we run out of memory.  In short, getAll suits small results/ranges well, but not big databases.  That is, with getAll we expect people to think about the volume/size of the result, whereas with cursors we don't.
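Laxmi's point and Jeremy's album example can be made concrete with a small sketch. It does not use the IndexedDB API: a plain array stands in for the object store, a generator (`cursor`) stands in for a cursor, and `peakWithGetAll`/`peakWithCursor` are hypothetical helpers that only add up record sizes rather than allocating real buffers. The 1MB image size and 1000-image album come from the example; the 10KB thumbnail size is an assumption.

```javascript
// Sizes from the album example; the thumbnail size is an assumption.
// Records carry sizes only, so nothing large is actually allocated.
const IMAGE_BYTES = 1024 * 1024; // ~1MB full-size image
const THUMB_BYTES = 10 * 1024;   // assumed ~10KB thumbnail
const ALBUM_SIZE = 1000;

const album = Array.from({ length: ALBUM_SIZE }, (_, id) => ({
  id,
  imageBytes: IMAGE_BYTES,
  thumbBytes: THUMB_BYTES,
}));

const recordBytes = (r) => r.imageBytes + r.thumbBytes;

// getAll-style: the whole result set is materialized before the caller
// sees anything, so peak memory is the sum of every record.
function peakWithGetAll(store) {
  const all = store.slice();
  return all.reduce((sum, r) => sum + recordBytes(r), 0);
}

// Cursor-style: a generator stands in for the cursor and yields one record
// at a time (asynchronous delivery is omitted for brevity).
function* cursor(store) {
  for (const record of store) yield record;
}

// The caller keeps only thumbnails, so peak memory is the current record
// plus the thumbnails retained so far.
function peakWithCursor(store) {
  let retainedThumbBytes = 0;
  let peak = 0;
  for (const record of cursor(store)) {
    peak = Math.max(peak, retainedThumbBytes + recordBytes(record));
    retainedThumbBytes += record.thumbBytes; // keep the thumbnail, drop the image
  }
  return peak;
}

console.log("getAll peak (MB):", peakWithGetAll(album) / 2 ** 20);
console.log("cursor peak (MB):", peakWithCursor(album) / 2 ** 20);
```

With these assumed sizes the getAll path peaks at about 1010MB, while the cursor path peaks at about 11MB (one full record plus the thumbnails kept so far). The gap exists only because the caller drops each image after reading its thumbnail; a caller that pushes every record into an array retains as much as getAll.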

I'm well aware of this. My argument is that I think we'll see people
write code like this:

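var results = [];
db.objectStore("foo").openCursor(range).onsuccess = function(e) {
  var cursor = e.result;
  if (!cursor) {
    // No more results: hand everything collected so far to the caller.
    weAreDone(results);
    return;
  }
  results.push(cursor.value); // every value stays alive in the page
  cursor.continue();
};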

While the IndexedDB implementation doesn't hold much data in memory at
a time, the webpage will hold just as much as if we had a getAll
function. Thus we haven't actually improved anything; we've only forced
the author to write more code.
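The argument can be checked against a small mock. `openCursor(records, onsuccess)` and `getAll(records)` here are hypothetical stand-ins, not IndexedDB signatures, and results are delivered synchronously so the sketch runs without an event loop. `collectWithCursor` is the accumulate-into-an-array pattern and ends up holding every value, exactly as `getAll` would; `totalSizeWithCursor` is a consumer that genuinely streams, keeping only a running total.

```javascript
// Hypothetical mock, not the IndexedDB API: continue() delivers the next
// record (or null at the end) to the same callback, mirroring the
// e.result / cursor.continue() shape of the loop in the message.
function openCursor(records, onsuccess) {
  let index = 0;
  const cursor = {
    get value() { return records[index]; },
    continue() {
      index += 1;
      deliver();
    },
  };
  function deliver() {
    onsuccess({ result: index < records.length ? cursor : null });
  }
  deliver();
}

// getAll-style: one call returns every value.
function getAll(records) {
  return records.slice();
}

// Accumulate pattern: the page ends up holding every value, like getAll.
// The early return matters; without it cursor.value would be read from null.
function collectWithCursor(records, weAreDone) {
  const results = [];
  openCursor(records, function (e) {
    const cursor = e.result;
    if (!cursor) {
      weAreDone(results);
      return;
    }
    results.push(cursor.value);
    cursor.continue();
  });
}

// Streaming consumer: folds each value into a running total and keeps
// no reference to it once the callback returns.
function totalSizeWithCursor(records, done) {
  let total = 0;
  openCursor(records, function (e) {
    const cursor = e.result;
    if (!cursor) {
      done(total);
      return;
    }
    total += cursor.value.size;
    cursor.continue();
  });
}

const records = Array.from({ length: 5 }, (_, i) => ({ id: i, size: 100 * (i + 1) }));
collectWithCursor(records, (results) => {
  console.log(results.length === getAll(records).length);
});
totalSizeWithCursor(records, (total) => console.log(total));
```

Only the second consumer gets the memory benefit Laxmi describes; the first retains the same values a getAll call would have returned.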


To put it another way: the concern raised is that people won't think
about the fact that getAll can load a lot of data into memory, and the
proposed solution is to remove the getAll function and tell people to
use openCursor. However, if they weren't thinking about the fact that a
lot of data will be in memory at one time, why wouldn't they write code
like the above, which results in just as much data being in memory?

/ Jonas
Received on Wednesday, 9 June 2010 22:28:42 UTC