Re: [IndexDB] Proposal for async API changes from Mikeal Rogers on 2010-06-09 (public-webapps@w3.org from April to June 2010)

From: Mikeal Rogers <mikeal.rogers@gmail.com>
Date: Wed, 9 Jun 2010 14:36:24 -0700
To: Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTimt4HnRASkL93FwWTnbE9vhnuWjQODrMJ6sf0iB@mail.gmail.com>

I've been looking through the current spec and all the proposed changes.

Great work. I'm going to be building a CouchDB compatible API on top
of IndexedDB that can support peer-to-peer replication without other
CouchDB instances.

One of the things that will entail is a by-sequence index for all the
changes in a given "database" (in my case a database will be scoped to
more than one ObjectStore). To accomplish this I'll need to keep the
last known sequence around so that each new write can create a new
entry in the by-sequence index. The problem is that if another
tab/window writes to the database, it'll increment that sequence and I
won't be notified. I would then have to start every transaction by
checking the sequence index for the last sequence, which seems like a
lot of extra cursor calls.
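
To make the problem concrete, here is a rough sketch. It uses a
hypothetical in-memory model rather than the IndexedDB API: the names
BySequenceIndex, CachingWriter and CheckingWriter are invented for
illustration, and lastSequence() stands in for whatever reverse-cursor
lookup the real API would need.

```javascript
// Hypothetical in-memory model of a by-sequence index shared by two tabs.
// It is not the IndexedDB API; it only illustrates the stale-sequence problem.
class BySequenceIndex {
  constructor() {
    this.entries = []; // [{ seq, docId }] kept in ascending seq order
  }

  // Stands in for opening a reverse cursor and reading its first key.
  lastSequence() {
    return this.entries.length === 0
      ? 0
      : this.entries[this.entries.length - 1].seq;
  }

  // Rejects any sequence number that is not strictly newer than the last one.
  append(seq, docId) {
    if (seq <= this.lastSequence()) {
      throw new Error("sequence " + seq + " already used");
    }
    this.entries.push({ seq, docId });
  }
}

// A writer that trusts its cached sequence (cheap, but unsafe across tabs).
class CachingWriter {
  constructor(index) {
    this.index = index;
    this.cachedSeq = index.lastSequence();
  }

  write(docId) {
    this.cachedSeq += 1;
    this.index.append(this.cachedSeq, docId);
    return this.cachedSeq;
  }
}

// A writer that re-reads the last sequence at the start of every write.
class CheckingWriter {
  constructor(index) {
    this.index = index;
  }

  write(docId) {
    const seq = this.index.lastSequence() + 1;
    this.index.append(seq, docId);
    return seq;
  }
}
```

If one tab uses CachingWriter and another tab writes in between, the
cached sequence collides with an existing entry. CheckingWriter avoids
that, at the cost of one extra lookup per write, which is the overhead
described above.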

What I really need is an event listener on an ObjectStore that fires
after a transaction is committed to the store but before the next
transaction is run that gives me information about the commits to the
ObjectStore.

Thoughts?

-Mikeal

On Wed, Jun 9, 2010 at 11:40 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Wed, Jun 9, 2010 at 7:25 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> >> > I'm not sure I like the idea of offering sync cursors either since the
>> >> > UA will either need to load everything into memory before starting or
>> >> > risk blocking on disk IO for large data sets.  Thus I'm not sure I
>> >> > support the idea of synchronous cursors.  But, at the same time, I'm
>> >> > concerned about the overhead of firing one event per value with async
>> >> > cursors.  Which is why I was suggesting an interface where the common
>> >> > case (the data is in memory) is done synchronously but the uncommon
>> >> > case (we'd block if we had to respond synchronously) has to be handled
>> >> > since we guarantee that the first time will be forced to be
>> >> > asynchronous.
>> >> >
>> >> > Like I said, I'm not super happy with what I proposed, but I think some
>> >> > hybrid async/sync interface is really what we need.  Have you guys
>> >> > spent any time thinking about something like this?  How dead-set are
>> >> > you on synchronous cursors?
>> >>
>> >> The idea is that synchronous cursors load all the required data into
>> >> memory, yes. I think it would help authors a lot to be able to load
>> >> small chunks of data into memory and read and write to it
>> >> synchronously. Dealing with asynchronous operations constantly is
>> >> certainly possible, but a bit of a pain for authors.
>> >>
>> >> I don't think we should obsess too much about not keeping things in
>> >> memory; we already have things like canvas and the DOM, which add up
>> >> to non-trivial amounts of memory.
>> >>
>> >> Just because data is loaded from a database doesn't mean it's huge.
>> >>
>> >> I do note that you're not as concerned about getAll(), which actually
>> >> has worse memory characteristics than synchronous cursors since you
>> >> need to create the full JS object graph in memory.
>> >
>> > I've been thinking about this off and on since the original proposal was
>> > made, and I just don't feel right about getAll() or synchronous cursors.
>> > You make some good points about there already being many ways to
>> > overwhelm RAM with web APIs, but is there any place we make it so easy?
>> > You're right that just because it's a database doesn't mean it needs to
>> > be huge, but oftentimes they can get quite big.  And if a developer
>> > doesn't spend time making sure they test their app with the upper ends of
>> > what users may possibly see, it just seems like this is a recipe for
>> > problems.
>> >
>> > Here's a concrete example: structured clone allows you to store image
>> > data.  Let's say I'm building an image hosting site and that I cache all
>> > the images along with their thumbnails locally in an IndexedDB entity
>> > store.  Let's say each thumbnail is a trivial amount, but each image is
>> > 1MB.  I have an album with 1000 images.  I do
>> > |var photos = albumIndex.getAllObjects(albumName);| and then iterate over
>> > that to get the thumbnails.  But I've just loaded over 1GB of stuff into
>> > RAM (assuming no additional inefficiency/blowup).  I suppose it's
>> > possible JavaScript engines could build mechanisms to fetch this stuff
>> > lazily (like you could even with a synchronous cursor) but that will take
>> > time/effort and introduce lag in the page (while fetching additional info
>> > from disk).
>> >
>> > I'm not completely against the idea of getAll/sync cursors, but I do
>> > think they should be de-coupled from this proposed API.  I would also
>> > suggest that we re-consider them only after at least one implementation
>> > has normal cursors working and there's been some experimentation with
>> > it.  Until then, we're basing most of our arguments on intuition and
>> > assumptions.
>>
>> I'm not married to the concept of sync cursors. However I pretty
>> strongly feel that getAll is something we need. If we just allow
>> cursors for getting multiple results I think we'll see an extremely
>> common pattern of people using a cursor to loop through a result set
>> and put values into an array.
>>
>> Yes, it can be misused, but I don't see a reason why people wouldn't
>> misuse a cursor just as much. If they don't think about the fact that
>> a range contains lots of data when using getAll, why would they think
>> about it when using cursors?
>
> Once again, I feel like there is a lot of speculation (more than normal)
> happening here.  I'd prefer we take the Async API without the sync cursors
> or getAll and give the rest of the API some time to bake before considering
> it again.  Ideally by then we'd have at least one or two early adopters that
> can give their perspective on the issue.
> J
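
For reference, the memory arithmetic in the quoted album example can be
sketched as follows. This is a rough in-memory model, not the IndexedDB
API. The 10KB thumbnail size is an assumption (the thread only says
"trivial"), the helper names are invented, and the model ignores engine
overhead and garbage-collection timing.

```javascript
// Hypothetical stand-in for the album index in the quoted example.
const MB = 1024 * 1024;
const ASSUMED_THUMBNAIL_BYTES = 10 * 1024; // assumption: "trivial" = 10KB

// Builds `count` records, each with a small thumbnail and a 1MB image.
function makeAlbum(count) {
  const records = [];
  for (let i = 0; i < count; i++) {
    records.push({
      id: i,
      thumbnailBytes: ASSUMED_THUMBNAIL_BYTES,
      imageBytes: MB,
    });
  }
  return records;
}

// Models getAll-style access: every full record is resident at once.
function residentBytesWithGetAll(records) {
  return records.reduce(
    (sum, r) => sum + r.thumbnailBytes + r.imageBytes,
    0
  );
}

// Models cursor-style access that keeps only the thumbnail of each record:
// one full record is resident at a time, plus the thumbnails kept so far.
function peakBytesWithCursor(records) {
  let kept = 0;
  let peak = 0;
  for (const r of records) {
    const whileVisiting = kept + r.thumbnailBytes + r.imageBytes;
    peak = Math.max(peak, whileVisiting);
    kept += r.thumbnailBytes;
  }
  return peak;
}
```

Under these assumptions, a 1000-image album costs a little over 1GB
when fetched all at once, versus roughly 11MB at peak when a cursor
visits one record at a time and only thumbnails are kept. A cursor loop
that pushes whole records into an array ends up at the getAll figure,
which is the misuse pattern raised in the quoted reply.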
Received on Thursday, 10 June 2010 06:41:42 UTC