Re: [IndexedDB] Computed indexes from Jeremy Orlow on 2010-06-19 (public-webapps@w3.org from April to June 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Fri, 18 Jun 2010 19:42:13 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTinrPItwZmAyuNOopAEUOZxAIZvJXQS5XtngM0bY@mail.gmail.com>
On Thu, Jun 17, 2010 at 3:25 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> Hi All,
>
> We've debated a bit use cases like storing objects like:
>
> { name: "Elvis", born: "January 8, 1935", died: "August 16, 1977" }
> { name: "Gustav III", born: "24 January 1746", died: "29 March 1792" }
>
> and creating an index based on the age at time of death. Similarly,
> storing HTML documents and indexing on the URLs of the outgoing links
> in the documents.
>
> The way to solve this use case in the current draft is by creating an
> index without a keyPath and then manually insert things into that
> index as needed. However this has some significant downsides:
>
> * First of all, it's more manual labor for the author, who must also
> add things to the index everywhere an entry is added.
> * It's a bit unclear who is responsible for removing things from the
> index. The spec currently says that it's forbidden to add things to
> the index which don't have a corresponding entry in the objectStore,
> however it doesn't forbid removing an entry from the objectStore which
> has entries in the index pointing to it. But it also doesn't say that
> entries in the index are automatically removed when an entry in the
> objectStore is removed.
> * If the author is responsible for removing things from the index,
> then this could mean having to compute all the index values both on
> insertion and on removal into the objectStore.
> * Unless the author is prepared to add a lot of logic to his code, it
> means that all indexes have to be immediately updated whenever an
> insertion/removal takes place. This can be very suboptimal if the
> objectStore is modified several times between accesses to a given
> index. Compare this to indexes that use a keyPath where the
> implementation could lazily only insert things into an index when the
> index is accessed.
> * It requires all code that inserts and modifies an objectStore to be
> aware of all the keyPath-less indexes attached to the objectStore and
> how to update them.
>

You missed one: if your index is non-unique, then the only way to update it
is by using a cursor...which is even more async calls for every update
and/or much more complicated logic to batch them up.
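
To make the non-unique case concrete, here is a rough in-memory sketch. It
does not use the IndexedDB API; the index layout and the moveEntry helper
are hypothetical. It only shows why an update has to walk the entries the
way a cursor would once several entries share an index key.

```javascript
// In-memory model only: a non-unique index as [indexKey, primaryKey] pairs.
// All names here are hypothetical; this is not the IndexedDB API.
const index = [
  ["rock", 1],
  ["rock", 2],
  ["opera", 3],
];

// With duplicate index keys, a lookup by key alone cannot single out one
// entry, so the update walks the entries one by one, much like a cursor.
function moveEntry(entries, oldKey, newKey, primaryKey) {
  for (let i = 0; i < entries.length; i++) {
    const [key, pk] = entries[i];
    if (key === oldKey && pk === primaryKey) {
      entries[i] = [newKey, primaryKey];
      return true;
    }
  }
  return false; // no matching entry found
}

const moved = moveEntry(index, "rock", "blues", 2);
```

In a real asynchronous API each step of that walk would be another
callback, which is where the extra calls come from.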


> We have talked about two solutions to this problem so far:
>
> Upon calls to .add/.put/.update we can require that the author pass in
> lists of values which are to be used as keys in the various indexes
> that are attached to the objectStore. This solves some of the problems
> mentioned above, but most of them are not addressed.
>

Which ones are not addressed?  I'm not sure I agree.
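
For reference, a minimal sketch of what "pass the index keys in with the
write" could look like. This is an in-memory model, not the draft API; the
Store class, its method names, and the ageAtDeath index are all
hypothetical. It is meant to show which bookkeeping moves into the store
and which stays with the caller.

```javascript
// Hypothetical in-memory store whose put() takes the index keys explicitly.
class Store {
  constructor(indexNames) {
    this.records = new Map(); // primaryKey -> value
    this.indexes = new Map(); // indexName -> Map(primaryKey -> array of keys)
    for (const name of indexNames) this.indexes.set(name, new Map());
  }

  // The caller supplies the keys for each index along with the value.
  put(primaryKey, value, indexKeys = {}) {
    this.records.set(primaryKey, value);
    for (const [name, entries] of this.indexes) {
      // Replacing the whole key list on every put drops stale keys.
      entries.set(primaryKey, indexKeys[name] ?? []);
    }
  }

  // Removal can clean up the indexes without help from the caller.
  remove(primaryKey) {
    this.records.delete(primaryKey);
    for (const entries of this.indexes.values()) entries.delete(primaryKey);
  }

  // Return the primary keys whose key list for this index contains `key`.
  lookup(name, key) {
    const found = [];
    for (const [pk, keys] of this.indexes.get(name)) {
      if (keys.includes(key)) found.push(pk);
    }
    return found;
  }
}

const store = new Store(["ageAtDeath"]);
store.put(1, { name: "Elvis" }, { ageAtDeath: [42] });
store.put(2, { name: "Gustav III" }, { ageAtDeath: [46] });
store.remove(1);
```

In this model the store handles removal on its own, but every caller of
put() still has to know which indexes exist and compute their keys up
front.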


> We could allow a more complex expression to be used which would be
> evaluated for each entry in the objectStore and which produces the set
> of keys to be used for that index. This solves all the problems
> mentioned above, but has some problems of its own. In particular:
>
> 1. We can't allow a JS function to be passed in as that would pull in
> all the scope of that function which generally has a lifetime much
> shorter than the index. Thus we would require that the expression is
> passed in the form of a string which is somewhat ugly.
> 2. We have to define edge cases around what happens if the expression
> modifies the passed in value.
> 3. There are performance concerns if we want to allow lazy population
> of indexes since objects will have to be deserialized from the
> objectStore in order to be used by the expression
>

Really?  Why would people be doing this lazily?  It seems like the next time
you access the index, you have to do this work.  I really don't see what
you'd save by doing this lazily.  Am I missing something?
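
For the sake of discussion, here is one possible reading of "lazy"
population, as an in-memory sketch. The LazyIndex class and its method
names are hypothetical and nothing here comes from the draft. Writes only
record that a primary key is dirty; key extraction happens on the next
index access, once per dirty key.

```javascript
// Hypothetical in-memory model of a lazily populated index.
class LazyIndex {
  constructor(extractKey) {
    this.extractKey = extractKey;  // function: stored value -> index key
    this.byPrimaryKey = new Map(); // primaryKey -> index key
    this.dirty = new Map();        // primaryKey -> latest value (undefined = removed)
    this.extractions = 0;          // counts how often extractKey actually ran
  }

  // Writes and removals only note the change; no extraction happens here.
  noteWrite(primaryKey, value) { this.dirty.set(primaryKey, value); }
  noteRemove(primaryKey) { this.dirty.set(primaryKey, undefined); }

  // On access, catch up on all pending changes, then answer the query.
  get(indexKey) {
    for (const [pk, value] of this.dirty) {
      if (value === undefined) {
        this.byPrimaryKey.delete(pk);
      } else {
        this.byPrimaryKey.set(pk, this.extractKey(value));
        this.extractions++;
      }
    }
    this.dirty.clear();
    const matches = [];
    for (const [pk, key] of this.byPrimaryKey) {
      if (key === indexKey) matches.push(pk);
    }
    return matches;
  }
}

const idx = new LazyIndex((value) => value.name);
idx.noteWrite(1, { name: "a" });
idx.noteWrite(1, { name: "b" });
idx.noteWrite(1, { name: "c" });
const lazyMatches = idx.get("c");
```

In this model three writes to the same record cost a single extraction, so
any saving would come only from records that are modified several times
between index accesses.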


> 4. Maciej expressed concern that this might make it impossible to
> expose IndexedDB to non-JS languages such as ObjectiveC
>
> Let me address these in order (for the purposes of this discussion
> I'll use a separate function to create one of these indexes called
> createExpressionIndex. I'd prefer to do the bike-shedding afterward if
> possible):
> While I think 1 is unfortunate, I don't think it's a big deal.
> Fortunately Javascript has the nice feature that it allows functions
> to be serialized into strings, which means that while you can't do
> myObjectStore.createExpressionIndex("myIndex", function(val) { return ....
> });
> you can do
> myObjectStore.createExpressionIndex("myIndex", (function(val) { return
> .... }).toString());
>

Do most JS engines allow _any_ function to be stored away for later?  If so,
why doesn't the structured clone algorithm allow them?  Maybe we could make
the global scope be empty when compiling (so variables can't be bound) and
executing the function?

What is the difference between an "expressionIndex" and a keyPath?  It seems
like they're doing about the same thing.  My proposal to allow keyPath to be
arbitrary JavaScript and then have the value being inserted be the global
scope actually sounds almost identical to what you're doing....except more
in the realm of what JS engines already do (since it's much like an eval).
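
As a small illustration of the scope question: serializing a function with
toString() keeps only its source text, so anything it closed over is gone
when the string is compiled again. The sketch below uses the standard
Function constructor, which compiles in the global scope (not an empty
one); makeExtractor and the other names are made up for the example.

```javascript
// A closure: the returned function depends on the local binding `suffix`.
function makeExtractor() {
  const suffix = "!";
  return function (val) { return val.name + suffix; };
}

const original = makeExtractor();
const source = original.toString(); // source text only, no captured scope

// Recompile from the string. The Function constructor evaluates in the
// global scope, so the local `suffix` is not visible to the revived copy.
const revived = new Function("return (" + source + ");")();

let revivedError = null;
try {
  revived({ name: "Elvis" });
} catch (e) {
  revivedError = e; // a ReferenceError, since `suffix` is not defined
}

// A function that only touches its argument survives the round trip.
const pureSource = (function (val) { return val.name.length; }).toString();
const pure = new Function("return (" + pureSource + ");")();
const pureResult = pure({ name: "Elvis" });
```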

> 2 might not be a big issue if the implementation is using lazy index
> population anyway. In this case the implementation will have to
> deserialize the value from the objectStore anyway which means that any
> modifications to the value won't affect any other indexes or what is
> stored in the database.
>

But it forces you to create a clone beforehand...or somehow implement copy
on write semantics.
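
A minimal sketch of the clone-first option, assuming plain JSON-style
data. The JSON round trip is only a stand-in for a structured clone, and
cloneValue and computeIndexKeys are hypothetical helpers, not anything from
the draft.

```javascript
// Stand-in for a structured clone; a JSON round trip only handles plain data.
function cloneValue(value) {
  return JSON.parse(JSON.stringify(value));
}

// Hypothetical helper: evaluate an index expression against a private copy,
// so that whatever the expression does cannot reach the stored value.
function computeIndexKeys(expression, storedValue) {
  return expression(cloneValue(storedValue));
}

const stored = { name: "Elvis", tags: ["rock"] };

// A misbehaving expression that modifies the value it is given.
const badExpression = (val) => {
  val.tags.push("mutated");
  return val.tags;
};

const keys = computeIndexKeys(badExpression, stored);
```

The cost is one extra copy per evaluated expression, which is the overhead
being pointed out here.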


> 3 is a very valid concern. We'll have to do measurements on how big of
> an issue this is, but my gut instinct is that deserializing can be
> made pretty fast.
>
> There are several solutions to 4. The simplest one is to simply expose
> the exact same API to ObjectiveC as is exposed to javascript. I.e. if
> ObjectiveC calls the createExpressionIndex it too will have to pass in
> a string which contains a javascript expression which will be used to
> populate indexes. When ObjectiveC simply reads data through an index
> it is not affected by javascript executing behind the scenes.
> Alternatively, if you want to keep things as pure-ObjectiveC, then we
> can define a WebIDL type which maps to a string in the javascript
> bindings, and to a callback in ObjectiveC.
>

I don't quite understand you here, but forcing all IndexedDB implementations
to depend on JavaScript seems to be a non-starter.  Having any language take
in its own native function is an interesting idea, but then each IndexedDB
database is tied to whatever language it was created in which seems a little
odd--but maybe OK in practice.


> All in all I think this is a pretty cool solution. My main concern is
> 3 listed above, but I'd like to measure to see if this is a really big
> problem before tossing this idea out.
>
> Please let me know what you think.
>

I only see superficial differences between this and my keyPath proposals
that I made a while ago.  Am I missing something?

J
Received on Saturday, 19 June 2010 02:43:03 UTC