Re: [IndexedDB] Computed indexes from Jonas Sicking on 2010-06-26 (public-webapps@w3.org from April to June 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Sat, 26 Jun 2010 04:11:11 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTinZzD7OI1AhVAjxLcpnqQgRlL27nq6gxJ6A8wdw@mail.gmail.com>
On Thu, Jun 24, 2010 at 7:01 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> >> 4. Maciej expressed concern that this might make it impossible to
>> >> expose IndexedDB to non-JS languages such as ObjectiveC
>> >>
>> >> Let me address these in order (for the purposes of this discussion
>> >> I'll use a separate function to create one of these indexes called
>> >> createExpressionIndex. I'd prefer to do the bike-shedding afterward if
>> >> possible):
>> >> While I think 1 is unfortunate, i don't think it's a big deal.
>> >> Fortunately Javascript has the nice feature that it allows functions
>> >> to be serialized into strings, which means that while you can't do
>> >> myObjectStore.createExpressionIndex("myIndex", function(val) { return
>> >> ....
>> >> });
>> >> you can do
>> >> myObjectStore.createExpressionIndex("myIndex", (function(val) { return
>> >> .... }).toString());
>> >
>> > Do most JS engines allow _any_ function to be stored away for later?
>>
>> If you mean if _any_ function can be serialized using toString(), then
>> yes, I believe so. Though of course serializing loses all scope
>> references so it's not guaranteed to work. And non-builtin functions.
>> Functions like Array.sort or Document.getElementById can of course not
>> be serialized.
>
> This may be true of JavaScript and even many other interpreted languages,
> but it's not true of many (most?) compiled languages.  How would we handle
> this in the spec?  Would we say it's unsupported in v1 or say that such
> implementations would define their own language (or use an existing
> interpreted language) as they see fit?  I guess this isn't a deal breaker
> since JavaScript really is the main target for this, but it's somewhat
> unfortunate.

Like I said in an earlier email in this thread, we can fairly easily
solve this using WebIDL. All that we really need here is a callback
which can be called after browser restarts. Javascript doesn't provide
any such means, which is why we have to resort to passing in a string.
However in for example Java I believe you could provide enough
information that the IndexedDB implementation can find the proper
class and function even after application restart.

However what I really think is the important question here is if the
people that expressed concern about tying to javascript in the past
are concerned about the more specific technical proposals that have
been made since then.

>> > What is the difference between an "expressionIndex" and a keyPath?  It
>> > seems
>> > like they're doing about the same thing.  My proposal to allow keyPath
>> > to be
>> > artibrary JavaScript and then have the value being inserted be the
>> > global
>> > scope actually sounds almost identical to what you're doing....except
>> > more
>> > in the relm of what JS engines already do (since it's much like an
>> > eval).
>>
>> It is about the same thing yes. Though presumably an implementation
>> could optimize "evaluating" the keyPath heavily. Much more so than a
>> generic expression.
>
> Howso?  Any implementation should be able to cache the compiled/JITed form
> of the string passed into it's internal implementation of eval.

Executing a complied expression, including firing up all the contexts
needed to run a piece of javascript, is at least in the firefox js
engine, significantly more expensive than simply grabbing the relevant
value as you serialize the javascript object into the database.

Additionally, for a simple keyPath you don't have to worry about
cloning objects or implementing copy-on-write semantics.

Finally, you can perform optimizations such as not updating indexes if
you notice that the relevant property doesn't change during calls to
objectStore.put() and cursor.update().

All in all I think keyPaths are preferrable when they are possible.
Full on expressions will likely always be slower and more cumbersome,
but they seem better than the alternatives when the value isn't stored
directly in the stored object.

>> Allowing keyPath to be an arbitrary expression turns out to result in
>> weird edge cases once you define all details. Say that you're storing
>> the value:
>>
>> { foo: 5, bar: 10 }
>>
>> You want to evaluate the expression in a context where "foo" and "bar"
>> are in scope. But you also don't want to evaluate with the value as
>> the global object as then "a = 5" would modify the value as that sets
>> a global variable. I.e. something like "t = foo*foo; bar + t;" would
>> add a 't' property to value. And you also don't want to evaluate the
>> expression in a context where the value is the current scope as then
>> "var a = 5" would modify the value.
>>
>> Additionally, you have to define what value is returned. We could do
>> like 'eval' is doing and say that the last evaluated expression is
>> what is returned. However eval doesn't let you return early using the
>> 'return' statement which might be practical.
>>
>> These problems could possibly be overcome though. I'll have to check
>> with people that know JS better than me. Ideas welcome.
>
> You're right that simply making it the global scope won't work unless we
> make a copy of it the global scope or do copy-on-write.  The former being
> heavy-weight at run time and the latter being a headache implementation
> wise.  That said, for 98% of the cases (where the expression is simple),
> having the key path just be an expression would be simpler.  One idea that
> just came to mind is what if we supported both (based on whether a string or
> function were passed in)...?

My proposal is definitely to support both. I.e. both have indexes that
are based on keyPaths, as well as indexes that are based on
expressions.

I don't think we can use type of the item being passed in though as I
think we need to pass expressions as strings to avoid pulling in the
functions context. I think accepting a function and automatically
dropping its context by calling .toString() on it would be pretty ugly
so I'm not a fan of that.

Instead I suspect we'll need two separate createIndex functions.

>> >> All in all I think this is a pretty cool solution. My main concern is
>> >> 3 listed above, but I'd like to measure to see if this is a really big
>> >> problem before tossing this idea out.
>> >>
>> >> Please let me know what you think.
>> >
>> > I only see superficial differences between this and my keyPath proposals
>> > that I made a while ago.  Am I missing something?
>>
>> Not at all, that's why I said "We have talked about two solutions so far"
>> :)
>>
>> We didn't really come to a conclusion in the previous thread. That's
>> why I wanted to collect all the information into this thread in an
>> attempt to make a decision.
>
> In that case, I feel like you should have put more effort into explaining
> previous ideas and comparing/contrasting your idea with them.  Especially
> since many of the potential problems you brought up here were ones you
> brought up in the previous thread, and many of the reasons why you're not
> worried about them also apply to the suggestion in the previous thread.

I'm not quite following your argument here. I attempted to enumerate a
list of problems that applied to the various other proposals that have
been brought to the table. I'm definitely not trying to take credit
for the idea of using expressions that produce index keys. I'll
happily concede the credit to others, including you.

> On Thu, Jun 24, 2010 at 12:11 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Sat, Jun 19, 2010 at 1:05 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >> Maybe we could make
>> >> the global scope be empty when compiling (so variables can't be bound)
>> >> and
>> >> executing the function?
>> >
>> > Yes, we absolutely should. Forgot to mention this.
>> >
>> >> What is the difference between an "expressionIndex" and a keyPath?  It
>> >> seems
>> >> like they're doing about the same thing.  My proposal to allow keyPath
>> >> to be
>> >> artibrary JavaScript and then have the value being inserted be the
>> >> global
>> >> scope actually sounds almost identical to what you're doing....except
>> >> more
>> >> in the relm of what JS engines already do (since it's much like an
>> >> eval).
>> >
>> > It is about the same thing yes. Though presumably an implementation
>> > could optimize "evaluating" the keyPath heavily. Much more so than a
>> > generic expression.
>> >
>> > Allowing keyPath to be an arbitrary expression turns out to result in
>> > weird edge cases once you define all details. Say that you're storing
>> > the value:
>> >
>> > { foo: 5, bar: 10 }
>> >
>> > You want to evaluate the expression in a context where "foo" and "bar"
>> > are in scope. But you also don't want to evaluate with the value as
>> > the global object as then "a = 5" would modify the value as that sets
>> > a global variable. I.e. something like "t = foo*foo; bar + t;" would
>> > add a 't' property to value. And you also don't want to evaluate the
>> > expression in a context where the value is the current scope as then
>> > "var a = 5" would modify the value.
>> >
>> > Additionally, you have to define what value is returned. We could do
>> > like 'eval' is doing and say that the last evaluated expression is
>> > what is returned. However eval doesn't let you return early using the
>> > 'return' statement which might be practical.
>> >
>> > These problems could possibly be overcome though. I'll have to check
>> > with people that know JS better than me. Ideas welcome.
>>
>> I talked to some javascript Gurus here and came up with this proposal:
>>
>> The string passed in is passed to the javascript Function constructor
>> in the following manner:
>>
>> f = new Function("value", passedInExpressionString);
>> Object.freeze(f);
>>
>> The resulting function is then executed for each value in the object
>> store, and the returned value is used as key in the index. If the
>> returned value is undefined, or if an exception is thrown, no entry is
>> added to the index. The function is executed in a global scope which
>> also is frozenand has no available properties other than the ones
>> defined by the ECMAScript standard. I.e. things like document,
>> XMLHttpRequest and setTimeout is not available.
>>
>> Here are some example expressions:
>> Index on the result of adding two properties together:
>> "return value.a + value.b;"
>>
>> Index on the number of days between two dates:
>> "return (value.to - value.from) / 1000*60*60*24;"
>>
>> Index on a persons last name:
>> "var names = value.name.split(/\\s*/); return names[names.length - 1];"
>>
>> The reason to freeze the global object and the function is to prevent
>> the expression from retaining state from one execution to the next. We
>> could also say that a fresh function object and a fresh global scope
>> object is created, however that would result in slower execution.
>
> It seems like this is something that could be optimized by implementations
> if it does become an issue though.  Right?

Sort of. It's certainly possible to write an implementation that uses
static or dynamic analysis to determine of the global object or the
function object has been modified and only create a fresh
global/function object when needed. However if we don't freeze the
global object it's quite likely that people will write expressions
like

"names = value.name.split(/\\s*/); return names[names.length - 1];"

which implicitly sets a variable on the global object. I.e. it makes
it very easy for authors to write expressions that fall off the fast
path without noticing. By freezing the global object that mistake is
avoided.

But you are right that freezing the function object is excessive. It
is extremely unlikely that someone will modify the function object by
accident, and so a implementation can safely fall off the fast path
when that is done.

> So what your proposing is that the keyPath would essentially be a string of
> the body of a function which runs for every index (on that objectStore) for
> every value inserted into that object store?

Sorry, I have been unclear. My proposal is to keep keyPaths as is,
restricted to the syntax:

name ('.' name)*

But add the ability to create indexes which do not use keyPaths, but
that use more powerful, but also slower and syntactically heavier,
expression-based indexes. These indexes only differ when they are
created, but otherwise have the exact same feature set as keyPath
based indexes.

> This seems like half way
> between the eval-like idea I mentioned earlier.  It certainly seems to have
> advantages for complex keyPaths, but I'm still not so hot on having
> boilerplate/assumptions (like needing "return" and assuming "value" is
> present) present in every single keyPath.  Especially when the use cases
> (while important) don't seem to be the common case.  (In fact, can you even
> do this in SQL?  If not, I think it's pretty strong evidence against needing
> to do arbitrary calculations in a keyPath.)

I agree that keyPaths, as seemingly defined by the spec right now,
cover the most common cases and should be retained.

I think the comparison to SQL isn't entirely an apples-to-apples
comparison. In SQL databases you often store data in separate tables
and in very normalized form. Take the candy store example in

http://docs.google.com/document/pub?id=1I__XnwvvSwyjvxi-FAAE0ecnUDhk5DF7L2GI6O31o18

Here data is stored much like you would in an SQL database, with
separate "tables" for kids and their candy sales. This results in a
somewhat painful join in some of the queries. A better way to use
IndexedDB would be to store the data as:

{ name: "Anna", id: 14, sales: [{candyId: 4, date: "2010-5-12"},
{candyId: 6, date: "2010-5-12"}, {candyId: 13, date: "2010-5-13"}] },
{ name: "Betty", id: 26, sales: [{candyId: 2, date: "2010-5-14"}] },
{ name: "Christine", id: 13 },

You could then create an expression-based indexes which digs out the
appropriate data to query sales by candy-type or whatever you'd
otherwise use the candy sales "table" for.

> I'm not a JavaScript (or JS engine) expert, but I find it difficult to
> believe that what you're proposing is much if any more optimizable than the
> (for lack of better name) eval-keyPath-proposal.

Hmm.. we might simply be talking past each other. Can you elaborate
with specifics as to how your eval-keyPath-proposal works?

> And as long as there are
> fundamental reasons why one can't be optimized, implementational challenges
> really shouldn't impact how we design the API.  At all.
> Anyway, I'll continue thinking about this, but I'm far from convinced that
> allowing javascript (whether it's like the body of a function or like it's
> evaled) is necessary for v1.  It adds a lot of complexity and I honestly
> doubt we understand the problems well enough at the moment to come up with
> the best solution.  I'd much rather punt for now.  But if you feel strongly
> we need this for v1, I think we need to find some examples in SQL or other
> database engines to base our implementation on.  Otherwise we're building an
> API on top of a tower of assumptions.

I'd prefer to leave expressions out of v1 as well. However I would be
worried about leaving out the ability to index on computed values
entirely. I.e. remove the ability to index on anything that isn't a
plain value in the stored object. And it doesn't seem like there is
much support for the way that the spec supports them now which is
through explicit calls to add/remove items out of an index.

Though maybe people can use separate objectStores which hold computed
values. That might certainly be good enough for version 1. I'll have
to think more about that.

/ Jonas
Received on Saturday, 26 June 2010 11:12:04 UTC