Re: [IndexedDB] Two Real World Use-Cases from Dean Landolt on 2011-03-08 (public-webapps@w3.org from January to March 2011)

From: Dean Landolt <dean@deanlandolt.com>
Date: Tue, 8 Mar 2011 12:54:11 -0500
To: Joran Greef <joran@ronomon.com>
Cc: public-webapps@w3.org
Message-ID: <AANLkTinhfZ9s4RMWvNCUv9ZHc8OU7cnguDD3pR4u22Zf@mail.gmail.com>
On Tue, Mar 8, 2011 at 1:33 AM, Joran Greef <joran@ronomon.com> wrote:

> On 08 Mar 2011, at 7:23 AM, Dean Landolt wrote:
>
> > This doesn't seem right. Assuming your WebSQL implementation had all the
> > same indexes, isn't it doing pretty much the same things as using separate
> > objectStores in IDB? Why would it be an order of magnitude slower? I'm sure
> > whatever implementation you're using hasn't seen much optimization, but you
> > seem to be implying there's something more fundamental? The only thing I can
> > think of to blame would be the fat in the objectStore interface -- like, for
> > instance, the index building facilities. It seems to me your proposed
> > solution is to add yet more fat to the interface (more complex indexing),
> > but wouldn't it be just as suitable to instead strip down objectStores to
> > their bare essentials to make them more suitable to act as indexes? Then the
> > indexing functionality and all the hard decisions could be punted to
> > libraries where they'd be free to innovate.
>
> Exactly. It's not what one would expect, and an indication of the poor state
> of the IDB implementation (which is essentially a wrapper around SQLite
> anyway).
>
>
Which implementation? Why do you think it's a wrapper around SQLite? I doubt
it could be implemented efficiently this way (due to its schema-free
nature), so that would explain your benchmarks. But why would you judge the
spec on one poor implementation?



> If someone is advising that object stores be used to handle indexes then
> may I be the first to raise a red flag and say that IDB is failing us (and
> it would have been better for the spec team to provide a locking mechanism
> for LocalStorage so it could be used in that way).


This is hyperbole. The critical feature IDB gives us is efficient range
retrieval -- try that with LocalStorage.
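To make the range-retrieval point concrete, here is a small Python sketch (no browser needed). The OrderedStore class and unordered_range function are hypothetical stand-ins, not IDB or LocalStorage API: a sorted list plays the role of the btree behind an objectStore, so a range query is a binary-search seek plus a short walk, while a store with no key order has to examine every key and sort the hits afterwards.

```python
import bisect

# Hypothetical sketch, not IDB API: a sorted Python list stands in for the
# btree behind an objectStore. (Inserting into a list is O(n); a real btree
# does better, but the read path below is the point.)
class OrderedStore:
    def __init__(self):
        self._keys = []      # always kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, lower, upper):
        """Return (key, value) pairs with lower <= key <= upper, in key order."""
        start = bisect.bisect_left(self._keys, lower)   # O(log n) seek
        end = bisect.bisect_right(self._keys, upper)
        return [(k, self._values[k]) for k in self._keys[start:end]]


def unordered_range(store, lower, upper):
    """LocalStorage-style: no key order to exploit, so scan every key, then sort."""
    hits = [(k, v) for k, v in store.items() if lower <= k <= upper]
    return sorted(hits, key=lambda kv: kv[0])


days = ["2011-03-01", "2011-03-08", "2011-02-27", "2011-03-15"]
store = OrderedStore()
for day in days:
    store.put(day, {"date": day})
first_week = [k for k, _ in store.range("2011-03-01", "2011-03-08")]
print(first_week)  # ['2011-03-01', '2011-03-08']
```

Both functions return the same answer; the difference is that the ordered version only touches the keys inside the range.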


> The whole point of IDB as far as I can see is to provide transactional
> indexed access to a key value store.
>

You say "indexed", I say "ordered". An objectStore is more than a kv store
-- the keys are stored and traversed in order. This is the win, what makes
IDB objectStores so special. This also makes them look an awful lot like
indexes too!
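A sketch of how an ordered store can serve directly as an index, assuming composite keys of the form (index value, primary key) and no copy of the record in the index entry. All names are hypothetical, and the include_docs flag only borrows the CouchDB term mentioned later in this thread; it is not an IDB option.

```python
import bisect

# Hypothetical sketch: an ordered store used as a secondary index. Each index
# entry is the composite key (author, primary_key) with no copy of the record,
# so the index holds only the index key plus a pointer back to the source.
records = {}      # primary store: primary_key -> record
by_author = []    # "index" store: sorted list of (author, primary_key)

def put(pk, record):
    old = records.get(pk)
    if old is not None:
        by_author.remove((old["author"], pk))   # drop the stale index entry
    records[pk] = record
    bisect.insort(by_author, (record["author"], pk))

def lookup(author, include_docs=True):
    """Walk the index in key order; optionally resolve pointers to records."""
    i = bisect.bisect_left(by_author, (author,))  # (author,) sorts before (author, pk)
    pks = []
    while i < len(by_author) and by_author[i][0] == author:
        pks.append(by_author[i][1])
        i += 1
    return [records[pk] for pk in pks] if include_docs else pks

put(1, {"author": "landolt", "subject": "Re: Two Real World Use-Cases"})
put(2, {"author": "greef", "subject": "Two Real World Use-Cases"})
put(3, {"author": "landolt", "subject": "Re: Two Real World Use-Cases"})
print(lookup("landolt", include_docs=False))  # [1, 3]
```

Because the keys are kept in order, all entries for one author are adjacent, which is exactly what makes an ordered store usable as an index.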

(Which reminds me: last time I checked, collation was still up in the air --
this could be very problematic for interop. Does anyone know of any plans to
correct this in the first version?)
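Why an unspecified collation hurts interop, in a few lines of Python: the same keys and the same key range give different results under two plausible orderings. The two collations below (plain code-point order and a case-insensitive one) are illustrative choices only; nothing here is claimed to match what any browser actually does.

```python
# Hypothetical illustration: two plausible collations for string keys.
# Nothing here is claimed to match any particular browser's behavior.
keys = ["apple", "Banana", "cherry"]

def key_range(keys, lower, upper, collate=lambda s: s):
    """Keys k with lower <= k < upper under `collate`, returned in that order."""
    lo, hi = collate(lower), collate(upper)
    return sorted((k for k in keys if lo <= collate(k) < hi), key=collate)

binary = key_range(keys, "a", "c")                         # code-point order
folded = key_range(keys, "a", "c", collate=str.casefold)   # case-insensitive
print(binary)  # ['apple']
print(folded)  # ['apple', 'Banana']
```

An application that stores data under one ordering and queries under the other would silently miss records.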


>
> > Why? You wouldn't necessarily have to store the whole object in each
> > index, just the index key, a value and some pointer to the original source
> > object. Something to resolve this pointer to the source would need to be
> > spec'd (a la couchdb's include_docs), but that's simple. Even better, say it
> > were possible to define a link relation on an object store that can resolve
> > to its source object -- you could define a source link relation and the
> > property to use -- and this would have the added bonus of being more broadly
> > applicable than just linking an index record to its source instance.
>
> Think of the object creation and JSON serialization/deserialization
> overhead for putting 50 indexes and you have got more than enough waste
> there already.
>

How does your proposal avoid this?


>
> > We can fix all of this right now very simply:
> >
> > 1. Enable objectStore.put and objectStore.delete to accept a setIndexes
> > option and an unsetIndexes option. The value passed for either option would
> > be an array (string list) of index references.
> >
> > This would only work for indexes given as arrays of strings, right? Things
> > can get much more complicated than that, and when they do you'd have to use
> > an objectStore to do your indexing anyway, right?
>
> No, it would work for pretty much anything. The application would be free to
> determine the indexes, and also to convert query parameters into indexes
> when querying. It's essentially "computed indexes" without the hassles of
> IDB trying to do it (there was an interesting thread last year on the
> challenges of storing an index-computing function in IDB).
>
> > Why is it more theoretically performant than using objectStores in the
> > raw?
>
> It's a more direct interface. Think about it for a second. Using
> objectStores in the raw is interpolating O(n) complexity with multiple
> function calls, to give just one reason.
>

Huh? If an objectStore is backed by something like a BDB btree, as is
implied by the design of the spec, retrieval ought to be O(log_b N), where
b is the average page size (the btree's fan-out) and N is the number of
records. Writing would have O(k) complexity, where k is the number of indexes,
but the same is true for your proposal, right?
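A worked version of that arithmetic, under a deliberately simplified model (no page splits, no caching): a btree lookup touches about one page per level, the number of levels is the logarithm of the record count with the page fan-out as the base, and a put updates the primary btree plus one btree per index. The function names are hypothetical; the 50 indexes echo the figure quoted earlier in the thread.

```python
def btree_levels(n_records, fanout):
    """Levels in a btree holding n_records with the given fan-out (page size).

    The smallest L >= 1 with fanout**L >= n_records, i.e. roughly
    ceil(log_fanout(n_records)); computed with integers to avoid rounding.
    Each level costs about one page read on a lookup.
    """
    levels, capacity = 1, fanout
    while capacity < n_records:
        capacity *= fanout
        levels += 1
    return levels

def put_cost(n_records, fanout, n_indexes):
    """Page touches for one put: the primary btree plus one btree per index."""
    return (1 + n_indexes) * btree_levels(n_records, fanout)

print(btree_levels(1_000_000, 100))   # 3
print(put_cost(1_000_000, 100, 50))   # 153
```

Reads stay at a handful of page touches even for a million records; the write cost grows linearly with the number of indexes, whichever API maintains them.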


> If IDB can receive a list of indexes to add and remove an object to and
> from, then it can also do things like perform a set difference first to save
> unnecessary IO.
>

You lost me here -- when you open a cursor on an index, when and why would it
need to do relational operations?
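One possible reading of the set-difference remark is that it applies to writes rather than to cursors. A minimal Python sketch under that assumption: on every put the caller passes the complete list of index references for the object (in the spirit of the proposed setIndexes option), and the store diffs it against the entries it already holds. The class, method and attribute names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sketch (names invented, not IDB API): on every put the caller
# passes the complete list of index references for the object, in the spirit
# of the proposed setIndexes option. The store diffs that list against the
# entries it already holds and only writes what changed.
class StatelessIndexStore:
    def __init__(self):
        self.objects = {}                 # key -> value
        self.memberships = {}             # key -> set of index references
        self.indexes = defaultdict(set)   # index reference -> keys (a real store would keep these ordered)
        self.index_writes = 0             # index entries added or removed so far

    def put(self, key, value, set_indexes):
        old = self.memberships.get(key, set())
        new = set(set_indexes)
        for ref in old - new:             # stale entries: unset
            self.indexes[ref].discard(key)
            self.index_writes += 1
        for ref in new - old:             # missing entries: set
            self.indexes[ref].add(key)
            self.index_writes += 1
        # entries in both old and new are left untouched: no IO for them
        self.objects[key] = value
        self.memberships[key] = new

store = StatelessIndexStore()
store.put("msg1", {"subject": "IDB"}, ["folder:inbox", "label:w3c", "unread"])
store.put("msg1", {"subject": "IDB"}, ["folder:inbox", "label:w3c"])  # marked read
print(store.index_writes)  # 4: three initial entries, then one removal
```

The second put touches one index entry instead of rewriting all three, which is the kind of saved IO the diff buys.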


> I have written a database or two with this technique and it's certainly
> faster.
>
> > I don't necessarily understand the stateful vs. stateless distinction
> > here. I don't see how your proposed solution removes the requirement for IDB
> > to enforce constraints when certain indexes are present. Developers would
> > already be able to use IDB statefully (with predefined schemas) -- they'd
> > just use a library that has a schema mechanism. I doubt such a library for
> > IDB already exists, but it'd be quite easy to port perstore, for instance,
> > which is derived from the IDB API and already has this functionality using
> > json-schema. There will no doubt be many ORM-like libraries that will pop up
> > as soon as IDB starts to stabilize (or as soon as it gets a node.js
> > implementation).
>
> The trouble is you always think a database would "be quite easy" until you
> actually try to do it yourself.
>

Heh. The only databases I've written have been fairly trivial, so I'm going
to have to defer to your experience on this one :)


> At first when I dug into IDB I didn't think there would be any problems
> that could not be handled in some way. I have actually switched back to
> WebSQL now and will encourage my users to use Safari or Chrome as long as
> these browsers support WebSQL (and I hope Chrome will at least finish up by
> adding a quota interface for WebSQL). IDB right now is like a completely
> neutered slower SQLite without any of the benefits to be expected of a
> transactional indexed KV store. It's really sad.
>

I suspect the fact that you're coming from SQLite and expecting a similar
feature set is coloring your opinion. IDB is *way* lower level than SQLite
-- it's effectively *just* the SQLite indexes (as Keean pointed out, SQLite
uses BDB as well). Even without relational operators it shouldn't be too
difficult to do multidimensional queries. To do this efficiently you need
stats to build a query plan (determine which order to use indexes) more than
you need relational operators. Of course, relational operators wouldn't
hurt, but you'd still need a query plan to use them effectively.
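A toy version of that idea in Python: a conjunctive query answered with plain indexes plus one statistic per index (its entry count), used to decide which index to walk first. The index references, data and function names are made up for illustration; a real planner would keep richer statistics.

```python
# Hypothetical sketch: a conjunctive query answered with plain indexes plus
# one statistic per index (its entry count) used to order the work. The
# index references and data are made up for illustration.
indexes = {
    "folder:inbox": {f"msg{i}" for i in range(1000)},
    "label:w3c": {f"msg{i}" for i in range(0, 1000, 10)},
    "unread": {"msg0", "msg5", "msg10", "msg20", "msg999"},
}

def plan(conditions, stats):
    """Most selective index (fewest entries) first."""
    return sorted(conditions, key=lambda ref: stats[ref])

def query(conditions, indexes):
    stats = {ref: len(keys) for ref, keys in indexes.items()}
    first, *rest = plan(conditions, stats)
    # Walk only the smallest index and test each candidate against the rest.
    # Here that is a membership check; an engine could instead fetch the
    # record and check its fields.
    return [k for k in sorted(indexes[first]) if all(k in indexes[r] for r in rest)]

print(query(["folder:inbox", "label:w3c", "unread"], indexes))
# ['msg0', 'msg10', 'msg20']
```

Starting from the 5-entry index instead of the 1000-entry one is the whole benefit; no relational operator is involved, only a statistic and an ordering decision.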

> For examples of stateless databases see the interfaces for Redis (the best
> example, and a perfect target for IDB), Berkeley, Tokyo. For a stateful
> database see MySQL (and read this by Bret Taylor on the subject
> http://bret.appspot.com/entry/how-friendfeed-uses-mysql). I can understand
> how IDB just inherited this idea of pre-defined indexes from SQL. But I
> think it's an assumption that must be challenged given the complexity it
> involves and the greater power, flexibility, and simplicity to be had from a
> stateless database.
>
> > ISTM giving library authors the freedom and flexibility to control their
> > own indexes would be a huge win. They already have much of what they need
> > for this (though there are still a few gaps) but complicating the indexing
> > without actually solving the problems would only serve to hamper users. If
> > it's easy to implement, great, but I'm still left wondering why maintaining
> > your own indexes is so slow -- this seems like the use case for IDB to
> > really nail.
>
> I think we both want the same thing. Making IDB stateless is the best step
> towards providing something flexible that library authors can work on top
> of. But this does not appear to be the current goal of IDB, which wants to
> try and tackle things like application state, computing indexes, migrations,
> the whole shebang (all of which seems to be becoming more and more the
> jurisdiction of the application), instead of directly addressing the
> original goal of providing a transactional indexed key value store. IDB is
> about as high-level as any low-level API could be right now.
>
>
I think you're right -- we want pretty much the same thing -- but I'm not
sure I completely grok your proposal. I originally misinterpreted what you
wrote as implying keypaths rather than index names. Are you suggesting
something like BDB's `associate` [1], except without having to explicitly
create and reference the secondary store?

I suspect not, because the callback would be functionally equivalent to a
stored procedure, an idea you dismissed. Nonetheless, it should be clear
from the BDB associate API that there's no real distinction between *object
stores* and *indexes* -- and thus, no reason why IDB has to have one either.
Further, this automatic indexing API is nothing more than sugar -- the same
could be done manually, and just as efficiently. Why is this not true for
IDB? Why would you consider parity with BDB a failure of the API?
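A Python sketch of the "no real distinction" point, loosely modeled on the idea behind BDB's associate. This is not the BDB API: the Store class, the callback signature and the composite (secondary key, primary key) layout are assumptions. It maintains the same index twice, once automatically through associate() and once by hand with explicit writes, and checks that the two come out identical.

```python
import bisect

# Hypothetical sketch loosely modeled on the idea behind BDB's associate:
# a secondary index is just another ordered store, kept in sync by a
# callback that maps a primary (key, value) to a secondary key. Secondary
# entries use the composite key (secondary_key, primary_key) to stay unique.
class Store:
    def __init__(self):
        self.keys, self.data = [], {}
        self.secondaries = []        # (store, callback) pairs

    def associate(self, secondary, callback):
        self.secondaries.append((secondary, callback))

    def put(self, key, value):
        old = self.data.get(key)
        for sec, callback in self.secondaries:
            if old is not None:
                sec.delete((callback(key, old), key))
            sec.put((callback(key, value), key), key)
        if key not in self.data:
            bisect.insort(self.keys, key)
        self.data[key] = value

    def delete(self, key):
        if key not in self.data:
            return
        value = self.data.pop(key)
        self.keys.remove(key)
        for sec, callback in self.secondaries:
            sec.delete((callback(key, value), key))

    def items(self):
        return [(k, self.data[k]) for k in self.keys]

# Automatic maintenance via associate().
posts, by_author = Store(), Store()
posts.associate(by_author, lambda key, value: value["author"])

# The same index maintained by hand: two plain stores and explicit writes.
manual_posts, manual_by_author = Store(), Store()

def manual_put(key, value):
    old = manual_posts.data.get(key)
    if old is not None:
        manual_by_author.delete((old["author"], key))
    manual_posts.put(key, value)
    manual_by_author.put((value["author"], key), key)

for key, value in [(1, {"author": "landolt"}), (2, {"author": "greef"}),
                   (3, {"author": "landolt"}), (3, {"author": "greef"})]:
    posts.put(key, value)
    manual_put(key, value)

print(by_author.items() == manual_by_author.items())  # True
```

Both paths issue the same writes; associate() only moves them inside the store, which is the sense in which the automatic API is sugar.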
Received on Tuesday, 8 March 2011 17:54:46 UTC