Re: [IndexedDB] IDBCursor.update for cursors returned from IDBIndex.openCursor from Jonas Sicking on 2010-09-17 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 17 Sep 2010 15:14:55 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: public-webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTikCOSHAXeZ+Rk6HKmk8jfv6sbUS3ucAahpUDha4@mail.gmail.com>
On Fri, Sep 17, 2010 at 2:46 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Fri, Sep 17, 2010 at 1:06 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Thu, Sep 16, 2010 at 2:23 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Thu, Sep 16, 2010 at 8:53 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Thu, Sep 16, 2010 at 2:15 AM, Jeremy Orlow <jorlow@chromium.org>
>> >> wrote:
>> >> > Wait a sec.  What are the use cases for non-object cursors anyway?
>> >> > They made perfect sense back when we allowed explicit index
>> >> > management, but now they kind of seem like a premature optimization
>> >> > or possibly even dead weight.  Maybe we should just remove them
>> >> > altogether?
>> >>
>> >> They are still useful for joins. Consider an objectStore "employees":
>> >>
>> >> { id: 1, name: "Sven", employed: "1-1-2010" }
>> >> { id: 2, name: "Bert", employed: "5-1-2009" }
>> >> { id: 3, name: "Adam", employed: "6-6-2008" }
>> >>
>> >> And objectStore "sales":
>> >>
>> >> { seller: 1, candyName: "lollipop", quantity: 5, date: "9-15-2010" }
>> >> { seller: 1, candyName: "swedish fish", quantity: 12, date: "9-15-2010" }
>> >> { seller: 2, candyName: "jelly belly", quantity: 3, date: "9-14-2010" }
>> >> { seller: 3, candyName: "heath bar", quantity: 3, date: "9-13-2010" }
>> >>
>> >> If you want to display the amount of sales per person, sorted by the
>> >> salesperson's name, you could do this by first creating an index for
>> >> "employees" with keyPath "name". You'd then use IDBIndex.openCursor to
>> >> iterate that index, and for each entry find all entries in the "sales"
>> >> objectStore where "seller" matches the cursor's .value.
>> >>
>> >> So in this case you don't actually need any data from the "employees"
>> >> objectStore, all the data is available in the index. Thus it is
>> >> sufficient, and faster, to use openCursor than openObjectCursor.
>> >>
>> >> In general, it's a common optimization to stick enough data in an
>> >> index that you don't have to actually look up in the objectStore
>> >> itself. This is slightly less commonly doable since we have relatively
>> >> simple indexes so far. But still doable as the example above shows.
>> >> Once we add support for arrays as keys this will be much more common
>> >> as you can then stick arbitrary data into the index by simply adding
>> >> additional entries to all key arrays. And even more so once we
>> >> (probably in a future version) add support for computed indexes.
>> >
>> >
>> > On Thu, Sep 16, 2010 at 8:57 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Thu, Sep 16, 2010 at 4:08 AM, Jeremy Orlow <jorlow@chromium.org>
>> >> wrote:
>> >> > Actually, for that matter, are remove and update needed at all?  I
>> >> > think they may just be more cruft left over from the explicit index
>> >> > days.  As far as I can tell, any .delete or .remove should be doable
>> >> > via an objectCursor + .puts/.removes on the objectStore.
>> >>
>> >> They are not strictly needed, but they are a decent convenience
>> >> feature, and with a proper implementation they can even be a
>> >> performance optimization. With a cursor iterating a b-tree you can let
>> >> the cursor keep a pointer to the b-tree entry. That way .delete and
>> >> .update don't have to do a b-tree lookup at all.
>> >>
>> >> We're currently not able to do this since our backend (sqlite) doesn't
>> >> have good enough cursor support, but I suspect that this will change
>> >> at some point in the future. In the meantime it seems like a good
>> >> thing to allow people to use an API that will be faster in the future.
>> >
>> > All your arguments revolve around what the spec and implementations
>> > might do in the future.
>>
>> I disagree. The IDBIndex.openCursor example I included uses only
>> existing API, and is a performance improvement in at least our current
>> implementation. Would be interested to hear if it's not a performance
>> improvement in others.
>
> It's not in ours because we join to the ObjectStore's data table either way.
> But that's not at all why I'm bringing this up.

Why?

>> > Typically we add API surface area only for use cases that
>> > are currently impossible to satisfy or proven performance bottlenecks. I
>> > agree that it's likely implementations will want to do optimizations
>> > like this in the future, but until they do, it'll be hard to really
>> > understand the implications and complications that might arise.
>>
>> That's not entirely true. All the databases I have worked with have
>> had significant performance degradations when having to look up the
>> main table contents rather than simply looking at the contents in the
>> index. I doubt that we'll be able to create a backend where that is
>> not true. So I think we should assume that object cursors are slower
>> than plain cursors.
>
> I agree this is true.
>>
>> Further, I think we should get users on APIs that we are likely to
>> implement with a higher performance. For example, I think sqlite
>> doesn't support having multiple write transactions to the same
>> database, even if those are to different tables.
>
> FWIW: The work around for this is putting each object store in its own
> database.

How do you then guarantee that a transaction that spans multiple
objectStores either fully succeeds or is fully rolled back? Especially
in the event of a crash during commit.

>> Thus the whole API of
>> specifying which objectStores you want to include in a transaction is
>> purely for future optimizations in at least implementations backed by
>> sqlite.
>
> It's funny you mention this because this level of transactions is after
> several iterations of simplifying the design for the exact reasons I'm
> arguing we should simplify the design here.

Indeed, but my point remains: we already have features in there for
faster performance that at least some current implementations can't
yet optimize to actually give faster performance.

>> I especially think these APIs are worth it given that they're low cost
>> to implement, and add convenience value to users even if implementations
>> aren't faster yet.
>
> I really don't see much added convenience.  Doing |myCursor.value.id| is
> really not that much harder than |myCursor.value|.  And low cost
> of implementation is a bad reason to add API surface area.

Sorry, I was unclear. I was referring to cursor.update and
cursor.delete. You can accomplish the same thing by calling put/delete
on the objectStore as well, but it's more convenient (and with a
proper implementation, faster) to call it on the cursor directly.
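
To illustrate the lookup being saved, here is a toy sketch. It is not
IndexedDB and not any real backend: ToyStore, find, and the lookups
counter are made-up names, and a sorted array with binary search stands
in for the b-tree. The only point is that a cursor that remembers its
entry can write to it without searching for the key again.

```javascript
// Toy sorted store: entries are kept ordered by key, standing in for a b-tree.
// Everything here is hypothetical illustration, not the IndexedDB API.
class ToyStore {
  constructor(records) {
    this.entries = records
      .map((value) => ({ key: value.id, value }))
      .sort((a, b) => a.key - b.key);
    this.lookups = 0; // counts key searches, the cost the cursor path avoids
  }

  // Binary search for a key; returns its position, or -1 if the key is absent.
  find(key) {
    this.lookups += 1;
    let lo = 0;
    let hi = this.entries.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      const k = this.entries[mid].key;
      if (k === key) return mid;
      if (k < key) lo = mid + 1;
      else hi = mid - 1;
    }
    return -1;
  }

  // Store-level put: has to locate the entry by key before writing.
  put(value) {
    const pos = this.find(value.id);
    if (pos >= 0) {
      this.entries[pos].value = value;
    } else {
      this.entries.push({ key: value.id, value });
      this.entries.sort((a, b) => a.key - b.key);
    }
  }

  // Yields cursor objects that keep a reference to their current entry.
  *openCursor() {
    for (const entry of this.entries) {
      yield {
        key: entry.key,
        value: entry.value,
        // Cursor-level update: writes through the remembered entry, no search.
        update: (value) => { entry.value = value; },
      };
    }
  }
}

const employees = [
  { id: 1, name: "Sven", employed: "1-1-2010" },
  { id: 2, name: "Bert", employed: "5-1-2009" },
  { id: 3, name: "Adam", employed: "6-6-2008" },
];

// Variant A: iterate, then write back through the store (one search per record).
const viaStore = new ToyStore(employees);
for (const cursor of viaStore.openCursor()) {
  viaStore.put({ ...cursor.value, name: cursor.value.name.toUpperCase() });
}

// Variant B: write back through the cursor (no key searches at all).
const viaCursor = new ToyStore(employees);
for (const cursor of viaCursor.openCursor()) {
  cursor.update({ ...cursor.value, name: cursor.value.name.toUpperCase() });
}

console.log(viaStore.lookups, viaCursor.lookups);
```

Both variants leave the same data behind; the difference is only in how
many key searches were needed to get there. Whether a real backend can
exploit this depends on its cursor support, as noted above for sqlite.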

> Given that the key-returning versions of these functions are just
> optimizations, at the very least, we should change the names though:
> get->getKey (or maybe getPrimaryKey?)
> openCursor->openKeyCursor (or maybe openPrimaryKeyCursor?)
> getObject->get
> openObjectCursor->openCursor

I can't say that I feel strongly on the naming issue. Will ask around
here to see how people feel.

/ Jonas
Received on Friday, 17 September 2010 22:15:50 UTC