[IndexedDB] Computed indexes

Hi All,

We've debated use cases such as storing objects like:

{ name: "Elvis", born: "January 8, 1935", died: "August 16, 1977" }
{ name: "Gustav III", born: "24 January 1746", died: "29 March 1792" }

and creating an index based on the age at the time of death. Similarly,
storing HTML documents and indexing on the URLs of the outgoing links in
the documents.
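To make the first use case concrete, here is a rough sketch of the kind
of computation such an index would need. The helper names (parseDate,
ageAtDeath) are made up for illustration, and the parser only handles
the two date formats shown above.

```javascript
// Hypothetical helpers, not part of any draft API.
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Parses "January 8, 1935" or "24 January 1746" into numeric parts.
function parseDate(text) {
  const parts = text.trim().replace(",", "").split(/\s+/);
  // Whichever of the first two tokens is not a number is the month name.
  const monthFirst = Number.isNaN(Number(parts[0]));
  const monthName = monthFirst ? parts[0] : parts[1];
  const day = Number(monthFirst ? parts[1] : parts[0]);
  return {
    year: Number(parts[2]),
    month: MONTHS.indexOf(monthName.toLowerCase()),
    day,
  };
}

// Computes the index key: age in whole years at the time of death.
function ageAtDeath(person) {
  const born = parseDate(person.born);
  const died = parseDate(person.died);
  let age = died.year - born.year;
  // Subtract one if death came before that year's birthday.
  const beforeBirthday =
    died.month < born.month ||
    (died.month === born.month && died.day < born.day);
  if (beforeBirthday) age -= 1;
  return age;
}
```

The point is only that the key is derived from several properties of
the stored value, which a plain keyPath cannot express.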

The way to solve this use case in the current draft is to create an
index without a keyPath and then manually insert entries into that
index as needed. However, this has some significant downsides:

* First of all, it's more manual labor for the author, who has to add
entries to the index everywhere an entry is added to the objectStore.
* It's a bit unclear who is responsible for removing things from the
index. The spec currently says that it's forbidden to add entries to
the index that don't have a corresponding entry in the objectStore;
however, it doesn't forbid removing an entry from the objectStore that
has entries in the index pointing to it. But it also doesn't say that
entries in the index are automatically removed when an entry in the
objectStore is removed.
* If the author is responsible for removing things from the index,
then this could mean having to compute all the index values both on
insertion into and on removal from the objectStore.
* Unless the author is prepared to add a lot of logic to their code, it
means that all indexes have to be updated immediately whenever an
insertion/removal takes place. This can be very suboptimal if the
objectStore is modified several times between accesses to a given
index. Compare this to indexes that use a keyPath, where the
implementation could lazily insert things into an index only when the
index is accessed.
* It requires all code that inserts and modifies an objectStore to be
aware of all the keyPath-less indexes attached to the objectStore and
how to update them.
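The bookkeeping described in the list above can be sketched as follows.
This is an in-memory model using plain Maps as stand-ins for an
objectStore and a keyPath-less index, not the IndexedDB API; all names
(peopleStore, ageIndex, addPerson, removePerson) and the simplified
bornYear/diedYear properties are assumptions for illustration.

```javascript
// In-memory stand-ins, NOT the IndexedDB API.
const peopleStore = new Map(); // primary key -> stored value
const ageIndex = new Map();    // index key -> Set of primary keys

// Hypothetical key computation on a simplified record shape.
function computeAgeKey(value) {
  return value.diedYear - value.bornYear;
}

// Every code path that adds an entry must also update the index.
function addPerson(primaryKey, value) {
  peopleStore.set(primaryKey, value);
  const indexKey = computeAgeKey(value);
  if (!ageIndex.has(indexKey)) ageIndex.set(indexKey, new Set());
  ageIndex.get(indexKey).add(primaryKey);
}

// If the author owns cleanup, the index key has to be recomputed on
// removal just to find the index entry that must be deleted.
function removePerson(primaryKey) {
  const value = peopleStore.get(primaryKey);
  if (value === undefined) return;
  const indexKey = computeAgeKey(value);
  const owners = ageIndex.get(indexKey);
  if (owners) {
    owners.delete(primaryKey);
    if (owners.size === 0) ageIndex.delete(indexKey);
  }
  peopleStore.delete(primaryKey);
}
```

Any code that wrote to peopleStore directly, bypassing these two
functions, would silently leave ageIndex stale.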


We have talked about two solutions to this problem so far:

Upon calls to .add/.put/.update we can require that the author pass in
lists of values which are to be used as keys in the various indexes
that are attached to the objectStore. This solves some of the problems
mentioned above, but most of them are not addressed.
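One possible reading of this first option, again as an in-memory model
rather than the real API: the store takes the index keys as an extra
argument and remembers them, so it can clean up on its own. The class
name SketchStore, its methods, and the shape of the indexKeys argument
are all assumptions for illustration.

```javascript
// In-memory model, NOT the IndexedDB API. All names are hypothetical.
class SketchStore {
  constructor(indexNames) {
    this.values = new Map();      // primary key -> value
    this.keysByEntry = new Map(); // primary key -> { indexName: [keys] }
    this.indexes = new Map(indexNames.map((name) => [name, new Map()]));
  }

  // The caller supplies the keys for each index alongside the value.
  put(primaryKey, value, indexKeys) {
    for (const name of Object.keys(indexKeys)) {
      if (!this.indexes.has(name)) throw new Error("unknown index: " + name);
    }
    this.remove(primaryKey); // drop keys left over from an older value
    this.values.set(primaryKey, value);
    this.keysByEntry.set(primaryKey, indexKeys);
    for (const [name, keys] of Object.entries(indexKeys)) {
      const index = this.indexes.get(name);
      for (const key of keys) {
        if (!index.has(key)) index.set(key, new Set());
        index.get(key).add(primaryKey);
      }
    }
  }

  // Removal needs no help from the caller: the keys were remembered.
  remove(primaryKey) {
    const remembered = this.keysByEntry.get(primaryKey) || {};
    for (const [name, keys] of Object.entries(remembered)) {
      const index = this.indexes.get(name);
      for (const key of keys) {
        const owners = index.get(key);
        if (!owners) continue;
        owners.delete(primaryKey);
        if (owners.size === 0) index.delete(key);
      }
    }
    this.keysByEntry.delete(primaryKey);
    this.values.delete(primaryKey);
  }

  // Returns the primary keys filed under an index key, sorted.
  lookup(indexName, key) {
    const owners = this.indexes.get(indexName).get(key);
    return owners ? [...owners].sort() : [];
  }
}
```

In this model removal is automatic, but every caller of put still has
to know about all the indexes and compute their keys eagerly.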


We could allow a more complex expression to be used which would be
evaluated for each entry in the objectStore and which produces the set
of keys to be used for that index. This solves all the problems
mentioned above, but has some problems of its own. In particular:

1. We can't allow a JS function to be passed in as that would pull in
all the scope of that function which generally has a lifetime much
shorter than the index. Thus we would require that the expression is
passed in the form of a string which is somewhat ugly.
2. We have to define edge cases around what happens if the expression
modifies the passed in value.
3. There are performance concerns if we want to allow lazy population
of indexes since objects will have to be deserialized from the
objectStore in order to be used by the expression.
4. Maciej expressed concern that this might make it impossible to
expose IndexedDB to non-JS languages such as ObjectiveC.

Let me address these in order (for the purposes of this discussion
I'll use a separate function to create one of these indexes called
createExpressionIndex. I'd prefer to do the bike-shedding afterward if
possible):

While I think 1 is unfortunate, I don't think it's a big deal.
Fortunately JavaScript has the nice feature that it allows functions
to be serialized into strings, which means that while you can't do
myObjectStore.createExpressionIndex("myIndex", function(val) { return .... });
you can do
myObjectStore.createExpressionIndex("myIndex", (function(val) { return .... }).toString());
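A small sketch of that round trip, and of why the surrounding scope
does not come along. How an implementation would actually evaluate the
string is not defined here; the use of new Function below is just my
stand-in, and all variable names are illustrative.

```javascript
// A self-contained function survives serialization to a string.
const extractor = function (val) { return val.name.toUpperCase(); };
const extractorSource = extractor.toString();

// Stand-in for what an implementation might do with the string.
// new Function evaluates in the global scope, not the caller's scope.
const rebuiltExtractor = new Function("return (" + extractorSource + ");")();
const upperName = rebuiltExtractor({ name: "Elvis" });

// A function that relies on a captured variable loses it on the way.
function makeClosureExtractor() {
  const suffix = "!";
  return function (val) { return val.name + suffix; };
}
const closureSource = makeClosureExtractor().toString();
const rebuiltClosure = new Function("return (" + closureSource + ");")();

let closureLost = false;
try {
  rebuiltClosure({ name: "Elvis" });
} catch (err) {
  // "suffix" is not defined where the rebuilt function runs.
  closureLost = err instanceof ReferenceError;
}
```

So the string form is self-describing, which is exactly what is needed
for something that has to outlive the page that created it.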


2 might not be a big issue if the implementation is using lazy index
population. In that case the implementation will have to deserialize
the value from the objectStore anyway, which means that any
modifications to the value won't affect any other indexes or what is
stored in the database.
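To illustrate, a minimal sketch in which each evaluation gets a freshly
deserialized copy. JSON stands in for whatever serialization the
implementation really uses, and evaluateExpression is a hypothetical
helper, not a proposed API.

```javascript
// Stand-in for the serialized form kept in the objectStore.
const storedBytes = JSON.stringify({ name: "Elvis", tags: ["singer"] });

// Hypothetical helper: deserialize a fresh copy for every evaluation.
function evaluateExpression(expression, serialized) {
  const freshCopy = JSON.parse(serialized);
  return expression(freshCopy);
}

// An expression that (rudely) modifies the value it is handed.
const mutatingExpression = function (val) {
  val.name = val.name.toLowerCase();
  val.tags.push("mutated");
  return val.name;
};

// A second index's expression, evaluated afterwards.
const plainExpression = function (val) { return val.name; };

const firstKey = evaluateExpression(mutatingExpression, storedBytes);
const secondKey = evaluateExpression(plainExpression, storedBytes);
// firstKey sees its own modification; secondKey and storedBytes do not.
```

If the implementation instead handed the same live object to several
expressions, the edge cases in 2 would have to be specified explicitly.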

3 is a very valid concern. We'll have to do measurements on how big of
an issue this is, but my gut instinct is that deserializing can be
made pretty fast.

There are several solutions to 4. The simplest one is to expose the
exact same API to ObjectiveC as is exposed to JavaScript. I.e., if
ObjectiveC calls createExpressionIndex, it too will have to pass in
a string containing a JavaScript expression, which will be used to
populate indexes. When ObjectiveC simply reads data through an index,
it is not affected by JavaScript executing behind the scenes.
Alternatively, if you want to keep things pure ObjectiveC, then we
can define a WebIDL type which maps to a string in the JavaScript
bindings, and to a callback in ObjectiveC.

All in all I think this is a pretty cool solution. My main concern is
issue 3 listed above, but I'd like to measure whether it is a really
big problem before tossing this idea out.

Please let me know what you think.

/ Jonas

Received on Thursday, 17 June 2010 22:26:17 UTC