Re: [IndexedDB] Two Real World Use-Cases from Keean Schupke on 2011-03-08 (public-webapps@w3.org from January to March 2011)

From: Keean Schupke <keean@fry-it.com>
Date: Tue, 8 Mar 2011 09:04:15 +0000
To: Joran Greef <joran@ronomon.com>
Cc: Dean Landolt <dean@deanlandolt.com>, public-webapps@w3.org
Message-ID: <AANLkTi=Gjfu_tnsq75KdLETGecGWV=T-jrMoTLA2+FD9@mail.gmail.com>
On 8 March 2011 06:33, Joran Greef <joran@ronomon.com> wrote:

> On 08 Mar 2011, at 7:23 AM, Dean Landolt wrote:
>
> > This doesn't seem right. Assuming your WebSQL implementation had all the
> > same indexes isn't it doing pretty much the same things as using separate
> > objectStores in IDB? Why would it be an order of magnitude slower? I'm sure
> > whatever implementation you're using hasn't seen much optimization but you
> > seem to be implying there's something more fundamental? The only thing I can
> > think of to blame would be the fat in the objectStore interface -- like, for
> > instance, the index building facilities. It seems to me your proposed
> > solution is to add yet more fat to the interface (more complex indexing),
> > but wouldn't it be just as suitable to instead strip down objectStores to
> > their bare essentials to make them more suitable to act as indexes? Then the
> > indexing functionality and all the hard decisions could be punted to
> > libraries where they'd be free to innovate.
>
> Exactly. It's not what one would expect, and an indication of the poor state
> of the IDB implementation (which is essentially a wrapper around SQLite
> anyway).
>
> If someone is advising that object stores be used to handle indexes then
> may I be the first to raise a red flag and say that IDB is failing us (and
> it would have been better for the spec team to provide a locking mechanism
> for LocalStorage so it could be used in that way). The whole point of IDB as
> far as I can see is to provide transactional indexed access to a key value
> store.
>
> > Why? You wouldn't necessarily have to store the whole object in each
> > index, just the index key, a value and some pointer to the original source
> > object. Something to resolve this pointer to the source would need to be
> > spec'd (a la couchdb's include_docs), but that's simple. Even better, say it
> > were possible to define a link relation on an object store that can resolve
> > to its source object -- you could define a source link relation and the
> > property to use -- and this would have the added bonus of being more broadly
> > applicable than just linking an index record to its source instance.
>
> Think of the object creation and JSON serialization/deserialization
> overhead for putting 50 indexes and you have got more than enough waste
> there already.
>
> > We can fix all of this right now very simply:
> >
> > 1. Enable objectStore.put and objectStore.delete to accept a setIndexes
> > option and an unsetIndexes option. The value passed for either option would
> > be an array (string list) of index references.
> >
> > This would only work for indexes arrays of strings, right? Things can get
> > much more complicated than that, and when they do you'd have to use an
> > objectStore to do your indexing anyway, right?
>
> No it would work for pretty much anything. The application would be free to
> determine the indexes, and also to convert query parameters into indexes
> when querying. It's essentially "computed indexes" without the hassles of
> IDB trying to do it (there was an interesting thread last year on the
> challenges of storing an index-computing function in IDB).
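
The "computed indexes" idea in the quoted reply can be sketched roughly as follows. This is only an in-memory illustration, not IndexedDB code: the class StatelessStore, the helpers cityRef and indexRefs, and the exact shape of the setIndexes / unsetIndexes options are assumptions made up for the example, since the thread only describes them in prose.

```typescript
// Hypothetical in-memory model of a "stateless" store: it keeps no schema,
// and the caller names the index entries to touch on every put.
type Key = string;

interface PutOptions {
  setIndexes?: string[];   // index references to file the key under
  unsetIndexes?: string[]; // index references to remove the key from
}

class StatelessStore<T> {
  private records: Record<Key, T> = Object.create(null);
  private indexes: Record<string, Key[]> = Object.create(null);

  put(key: Key, value: T, options: PutOptions = {}): void {
    this.records[key] = value;
    (options.unsetIndexes || []).forEach((ref) => this.unindex(ref, key));
    (options.setIndexes || []).forEach((ref) => this.index(ref, key));
  }

  // Return the records currently filed under one index reference.
  query(ref: string): T[] {
    return (this.indexes[ref] || []).map((key) => this.records[key]);
  }

  private index(ref: string, key: Key): void {
    const keys = this.indexes[ref] || (this.indexes[ref] = []);
    if (keys.indexOf(key) === -1) keys.push(key);
  }

  private unindex(ref: string, key: Key): void {
    const keys = this.indexes[ref];
    if (!keys) return;
    const at = keys.indexOf(key);
    if (at !== -1) keys.splice(at, 1);
  }
}

interface Contact { name: string; city: string; }

// The application decides how an object maps to index references...
function cityRef(city: string): string {
  return "city:" + city.toLowerCase();
}
function indexRefs(c: Contact): string[] {
  return [cityRef(c.city), "initial:" + c.name.charAt(0).toLowerCase()];
}

// ...and converts query parameters into the same references when querying.
const store = new StatelessStore<Contact>();
const alice: Contact = { name: "Alice", city: "Durban" };
store.put("1", alice, { setIndexes: indexRefs(alice) });
const inDurban = store.query(cityRef("Durban"));
```

The store never learns what "city" means; it only files keys under opaque strings, which is the sense in which the proposal is stateless.
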
>
> > Why is it more theoretically performant than using objectStores in the
> > raw?
>
> It's a more direct interface. Think about it for a second. Using
> objectStores in the raw is interpolating O(n) complexity with multiple
> function calls, to give just one reason. If IDB can receive a list of
> indexes to add and remove an object to and from, then it can also do things
> like perform a set difference first to save unnecessary IO. I have written a
> database or two with this technique and it's certainly faster.
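
The set-difference step mentioned in the quoted reply might look something like this. It is a sketch only: diffIndexes and IndexDelta are hypothetical names, and it assumes each list holds every index reference at most once.

```typescript
// Result of comparing the index references an object is filed under now
// with the references it should be filed under after an update.
interface IndexDelta {
  toAdd: string[];    // references present only in the new list
  toRemove: string[]; // references present only in the old list
}

// Compute both set differences so that index entries which did not change
// cause no writes at all.
function diffIndexes(current: string[], next: string[]): IndexDelta {
  const inCurrent: Record<string, boolean> = Object.create(null);
  const inNext: Record<string, boolean> = Object.create(null);
  current.forEach((ref) => { inCurrent[ref] = true; });
  next.forEach((ref) => { inNext[ref] = true; });
  return {
    toAdd: next.filter((ref) => !inCurrent[ref]),
    toRemove: current.filter((ref) => !inNext[ref]),
  };
}

// Only the city entry changes; "tag:friend" is left alone.
const delta = diffIndexes(
  ["city:durban", "tag:friend"],
  ["city:cape town", "tag:friend"]
);
```

A store that receives both lists in one call can run this comparison itself; an application driving separate object stores by hand has to issue each write as its own call.
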
>
> > I don't necessarily understand the stateful vs. stateless distinction
> > here. I don't see how your proposed solution removes the requirement for IDB
> > to enforce constraints when certain indexes are present. Developers would
> > already be able to use IDB statefully (with predefined schemas) -- they'd
> > just use a library that has a schema mechanism. I doubt such a library for
> > IDB already exists, but it'd be quite easy to port perstore, for instance,
> > which is derived from the IDB API and already has this functionality using
> > json-schema. There will no doubt be many ORM-like libraries that will pop up
> > as soon as IDB starts to stabilize (or as soon as it gets a node.js
> > implementation).
>
> The trouble is you always think a database would "be quite easy" until you
> actually try to do it yourself. At first when I dug into IDB I didn't think
> there would be any problems that could not be handled in some way. I have
> actually switched back to WebSQL now and will encourage my users to use
> Safari or Chrome as long as these browsers support WebSQL (and I hope Chrome
> will at least finish up by adding a quota interface for WebSQL). IDB right
> now is like a completely neutered slower SQLite without any of the benefits
> to be expected of a transactional indexed KV store. It's really sad.
>
> For examples of stateless databases see the interfaces for Redis (the best
> example, and a perfect target for IDB), Berkeley, Tokyo. For a stateful
> database see MySQL (and read this by Bret Taylor on the subject
> http://bret.appspot.com/entry/how-friendfeed-uses-mysql). I can understand
> how IDB just inherited this idea of pre-defined indexes from SQL. But I
> think it's an assumption that must be challenged given the complexity it
> involves and the greater power, flexibility, and simplicity to be had from a
> stateless database.
>
> > ISTM giving library authors the freedom and flexibility to control their
> > own indexes would be a huge win. They already have much of what they need for
> > this (though there are still a few gaps) but complicating the indexing
> > without actually solving the problems would only serve to hamper users. If
> > it's easy to implement, great, but I'm still left wondering why maintaining
> > your own indexes is so slow -- this seems like the use case for IDB to
> > really nail.
>
> I think we both want the same thing. Making IDB stateless is the best step
> towards providing something flexible that library authors can work on top
> of. But this does not appear to be the current goal of IDB, which wants to
> try and tackle things like application state, computing indexes, migrations,
> the whole shebang (all of which seems to be becoming more and more the
> jurisdiction of the application), instead of directly addressing the
> original goal of providing a transactional indexed key value store. IDB is
> about as high-level as any low-level API could be right now.
>
>
>
I agree that the feature set of BerkeleyDB should be the target of something
like IDB. However your comments about the relationship between SQL and
BerkeleyDB don't make sense. For example SQLite keeps its tables and indexes
in its own B-tree layer, the same kind of keyed store that BerkeleyDB
provides. SQL is a query language and has nothing to do with the
technology used to implement the database (indexes or table store). As such
SQL represents a higher level (and more powerful) abstraction of the data,
so whilst you could say BerkeleyDB offers more simplicity and flexibility,
it is not more powerful than a relational database. Power in this case
refers to the amount that can be done with a command. How many low level
BerkeleyDB index operations would it take to do the equivalent of: "select
B.z from A, B where A.x = B.y group by A.x"? (Note this statement is only
valid if B.y is 'unique' in table B, otherwise it should generate an error,
as to be valid there must be a functional dependency from A.x to B.z. With a
schema, a quick check of the column properties is enough to check this
precondition before executing the statement. Without a schema we have to
check B.y for uniqueness as we execute the command to know whether the
results are valid.)
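
To give a feel for the difference in level, here is roughly what that one statement turns into when written by hand against plain keyed lookups. This is a sketch under assumptions: the row shapes RowA and RowB and the function selectZGroupedByX are invented for illustration, and in-memory arrays stand in for the two tables.

```typescript
interface RowA { x: number; }
interface RowB { y: number; z: string; }

// Hand-written equivalent of:
//   select B.z from A, B where A.x = B.y group by A.x
// With no schema to consult, the uniqueness of B.y has to be checked while
// the query runs.
function selectZGroupedByX(a: RowA[], b: RowB[]): string[] {
  // Build a lookup from B.y to B.z, failing if any y value repeats.
  const zByY: Record<string, string> = Object.create(null);
  b.forEach((row) => {
    const k = String(row.y);
    if (k in zByY) throw new Error("B.y is not unique: " + k);
    zByY[k] = row.z;
  });

  // Walk A, emitting one result per distinct A.x that has a match in B.
  const seen: Record<string, boolean> = Object.create(null);
  const out: string[] = [];
  a.forEach((row) => {
    const k = String(row.x);
    if (seen[k]) return; // group by A.x
    seen[k] = true;
    if (k in zByY) out.push(zByY[k]);
  });
  return out;
}

const zs = selectZGroupedByX(
  [{ x: 1 }, { x: 1 }, { x: 2 }, { x: 3 }],
  [{ y: 1, z: "one" }, { y: 2, z: "two" }]
);
```

Every step here (building the lookup, the uniqueness check, the grouping) is something the application must get right itself, whereas a relational engine with a schema can validate the statement before touching any rows.
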

To me the most worrying thing is the "Not-Invented-Here" syndrome. BerkeleyDB
is now in its fifth major revision. This shows that it is not easy to get
this kind of API correct. Would it not have been sensible to take the API
of something like BerkeleyDB and create a JavaScript version of it?
Implementers would then be able to implement a thin wrapper around the
existing library and get a fast and well designed API.


Cheers,
Keean.
Received on Tuesday, 8 March 2011 09:04:48 UTC