Re: [IndexedDB] Computed indexes from Jeremy Orlow on 2010-06-26 (public-webapps@w3.org from April to June 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Sat, 26 Jun 2010 19:24:29 +0100
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTim8WGuIpWptMEYF_kM6_7T2M38EJlsEIq1kxJjV@mail.gmail.com>
On Sat, Jun 26, 2010 at 12:11 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Thu, Jun 24, 2010 at 7:01 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> >> >> 4. Maciej expressed concern that this might make it impossible to
> >> >> expose IndexedDB to non-JS languages such as ObjectiveC
> >> >>
> >> >> Let me address these in order (for the purposes of this discussion
> >> >> I'll use a separate function to create one of these indexes called
> >> >> createExpressionIndex. I'd prefer to do the bike-shedding afterward
> if
> >> >> possible):
> >> >> While I think 1 is unfortunate, i don't think it's a big deal.
> >> >> Fortunately Javascript has the nice feature that it allows functions
> >> >> to be serialized into strings, which means that while you can't do
> >> >> myObjectStore.createExpressionIndex("myIndex", function(val) { return
> >> >> ....
> >> >> });
> >> >> you can do
> >> >> myObjectStore.createExpressionIndex("myIndex", (function(val) {
> return
> >> >> .... }).toString());
> >> >
> >> > Do most JS engines allow _any_ function to be stored away for later?
> >>
> >> If you mean if _any_ function can be serialized using toString(), then
> >> yes, I believe so. Though of course serializing loses all scope
> >> references so it's not guaranteed to work. And non-builtin functions.
> >> Functions like Array.sort or Document.getElementById can of course not
> >> be serialized.
> >
> > This may be true of JavaScript and even many other interpreted languages,
> > but it's not true of many (most?) compiled languages.  How would we
> handle
> > this in the spec?  Would we say it's unsupported in v1 or say that such
> > implementations would define their own language (or use an existing
> > interpreted language) as they see fit?  I guess this isn't a deal breaker
> > since JavaScript really is the main target for this, but it's somewhat
> > unfortunate.
>
> Like I said in an earlier email in this thread, we can fairly easily
> solve this using WebIDL. All that we really need here is a callback
> which can be called after browser restarts. Javascript doesn't provide
> any such means, which is why we have to resort to passing in a string.
> However in for example Java I believe you could provide enough
> information that the IndexedDB implementation can find the proper
> class and function even after application restart.
>

How would you do that in C/C++ though?  As far as I know, Java is one of the
few languages where this is actually true.  And even then it seems a bit
clunky.  In C/C++, passing in a function pointer would not work once you
re-compile the program.


> However what I really think is the important question here is if the
> people that expressed concern about tying to javascript in the past
> are concerned about the more specific technical proposals that have
> been made since then.
>

I'm not sure if people who could make such an objection are watching these
threads very carefully (if at all), so I think we'll need to be more
proactive if we really want to get their advice.  (i.e. at the very least,
start a new, specific thread on this topic with an obvious subject.)


> >> > What is the difference between an "expressionIndex" and a keyPath?  It
> >> > seems
> >> > like they're doing about the same thing.  My proposal to allow keyPath
> >> > to be
> >> > artibrary JavaScript and then have the value being inserted be the
> >> > global
> >> > scope actually sounds almost identical to what you're doing....except
> >> > more
> >> > in the relm of what JS engines already do (since it's much like an
> >> > eval).
> >>
> >> It is about the same thing yes. Though presumably an implementation
> >> could optimize "evaluating" the keyPath heavily. Much more so than a
> >> generic expression.
> >
> > Howso?  Any implementation should be able to cache the compiled/JITed
> form
> > of the string passed into it's internal implementation of eval.
>
> Executing a complied expression, including firing up all the contexts
> needed to run a piece of javascript, is at least in the firefox js
> engine, significantly more expensive than simply grabbing the relevant
> value as you serialize the javascript object into the database.
>
> Additionally, for a simple keyPath you don't have to worry about
> cloning objects or implementing copy-on-write semantics.
>
> Finally, you can perform optimizations such as not updating indexes if
> you notice that the relevant property doesn't change during calls to
> objectStore.put() and cursor.update().
>
> All in all I think keyPaths are preferrable when they are possible.
> Full on expressions will likely always be slower and more cumbersome,
> but they seem better than the alternatives when the value isn't stored
> directly in the stored object.
>

Ahh!  I think I see why we're both having so hard of a time understanding
each other.  I've been talking about various proposals for what keyPath
would be and you're talking about a proposal that'd be in addition to some
"simple" keyPath proposal (that could easily have the properties you just
mentioned above).  That explains a lot.

I definitely like the idea of some simple keyPath concept optionally
augmented by a more complex one that can actually calculate the key--whether
represented as a function body or an expression.


> >> Allowing keyPath to be an arbitrary expression turns out to result in
> >> weird edge cases once you define all details. Say that you're storing
> >> the value:
> >>
> >> { foo: 5, bar: 10 }
> >>
> >> You want to evaluate the expression in a context where "foo" and "bar"
> >> are in scope. But you also don't want to evaluate with the value as
> >> the global object as then "a = 5" would modify the value as that sets
> >> a global variable. I.e. something like "t = foo*foo; bar + t;" would
> >> add a 't' property to value. And you also don't want to evaluate the
> >> expression in a context where the value is the current scope as then
> >> "var a = 5" would modify the value.
> >>
> >> Additionally, you have to define what value is returned. We could do
> >> like 'eval' is doing and say that the last evaluated expression is
> >> what is returned. However eval doesn't let you return early using the
> >> 'return' statement which might be practical.
> >>
> >> These problems could possibly be overcome though. I'll have to check
> >> with people that know JS better than me. Ideas welcome.
> >
> > You're right that simply making it the global scope won't work unless we
> > make a copy of it the global scope or do copy-on-write.  The former being
> > heavy-weight at run time and the latter being a headache implementation
> > wise.  That said, for 98% of the cases (where the expression is simple),
> > having the key path just be an expression would be simpler.  One idea
> that
> > just came to mind is what if we supported both (based on whether a string
> or
> > function were passed in)...?
>
> My proposal is definitely to support both. I.e. both have indexes that
> are based on keyPaths, as well as indexes that are based on
> expressions.
>
> I don't think we can use type of the item being passed in though as I
> think we need to pass expressions as strings to avoid pulling in the
> functions context. I think accepting a function and automatically
> dropping its context by calling .toString() on it would be pretty ugly
> so I'm not a fan of that.
>
> Instead I suspect we'll need two separate createIndex functions.
>

Agreed.


>  >> >> All in all I think this is a pretty cool solution. My main concern
> is
> >> >> 3 listed above, but I'd like to measure to see if this is a really
> big
> >> >> problem before tossing this idea out.
> >> >>
> >> >> Please let me know what you think.
> >> >
> >> > I only see superficial differences between this and my keyPath
> proposals
> >> > that I made a while ago.  Am I missing something?
> >>
> >> Not at all, that's why I said "We have talked about two solutions so
> far"
> >> :)
> >>
> >> We didn't really come to a conclusion in the previous thread. That's
> >> why I wanted to collect all the information into this thread in an
> >> attempt to make a decision.
> >
> > In that case, I feel like you should have put more effort into explaining
> > previous ideas and comparing/contrasting your idea with them.  Especially
> > since many of the potential problems you brought up here were ones you
> > brought up in the previous thread, and many of the reasons why you're not
> > worried about them also apply to the suggestion in the previous thread.
>
> I'm not quite following your argument here. I attempted to enumerate a
> list of problems that applied to the various other proposals that have
> been brought to the table. I'm definitely not trying to take credit
> for the idea of using expressions that produce index keys. I'll
> happily concede the credit to others, including you.
>

I'm not concerned in the least with "credit".  I just think that when
something has been discussed before, it's important to use those discussions
as a starting point for new ones.  Otherwise we end up re-hashing the same
discussions.  Even just linking to original discussions can be helpful.
 Anyway, it doesn't really matter at this point...I think we're on the same
page now.  :-)


> > On Thu, Jun 24, 2010 at 12:11 AM, Jonas Sicking <jonas@sicking.cc
> > wrote:
> >>
> >> On Sat, Jun 19, 2010 at 1:05 AM, Jonas Sicking <jonas@sicking.cc>
> wrote:
> >> >> Maybe we could make
> >> >> the global scope be empty when compiling (so variables can't be
> bound)
> >> >> and
> >> >> executing the function?
> >> >
> >> > Yes, we absolutely should. Forgot to mention this.
> >> >
> >> >> What is the difference between an "expressionIndex" and a keyPath?
>  It
> >> >> seems
> >> >> like they're doing about the same thing.  My proposal to allow
> keyPath
> >> >> to be
> >> >> artibrary JavaScript and then have the value being inserted be the
> >> >> global
> >> >> scope actually sounds almost identical to what you're doing....except
> >> >> more
> >> >> in the relm of what JS engines already do (since it's much like an
> >> >> eval).
> >> >
> >> > It is about the same thing yes. Though presumably an implementation
> >> > could optimize "evaluating" the keyPath heavily. Much more so than a
> >> > generic expression.
> >> >
> >> > Allowing keyPath to be an arbitrary expression turns out to result in
> >> > weird edge cases once you define all details. Say that you're storing
> >> > the value:
> >> >
> >> > { foo: 5, bar: 10 }
> >> >
> >> > You want to evaluate the expression in a context where "foo" and "bar"
> >> > are in scope. But you also don't want to evaluate with the value as
> >> > the global object as then "a = 5" would modify the value as that sets
> >> > a global variable. I.e. something like "t = foo*foo; bar + t;" would
> >> > add a 't' property to value. And you also don't want to evaluate the
> >> > expression in a context where the value is the current scope as then
> >> > "var a = 5" would modify the value.
> >> >
> >> > Additionally, you have to define what value is returned. We could do
> >> > like 'eval' is doing and say that the last evaluated expression is
> >> > what is returned. However eval doesn't let you return early using the
> >> > 'return' statement which might be practical.
> >> >
> >> > These problems could possibly be overcome though. I'll have to check
> >> > with people that know JS better than me. Ideas welcome.
> >>
> >> I talked to some javascript Gurus here and came up with this proposal:
> >>
> >> The string passed in is passed to the javascript Function constructor
> >> in the following manner:
> >>
> >> f = new Function("value", passedInExpressionString);
> >> Object.freeze(f);
> >>
> >> The resulting function is then executed for each value in the object
> >> store, and the returned value is used as key in the index. If the
> >> returned value is undefined, or if an exception is thrown, no entry is
> >> added to the index. The function is executed in a global scope which
> >> also is frozenand has no available properties other than the ones
> >> defined by the ECMAScript standard. I.e. things like document,
> >> XMLHttpRequest and setTimeout is not available.
> >>
> >> Here are some example expressions:
> >> Index on the result of adding two properties together:
> >> "return value.a + value.b;"
> >>
> >> Index on the number of days between two dates:
> >> "return (value.to - value.from) / 1000*60*60*24;"
> >>
> >> Index on a persons last name:
> >> "var names = value.name.split(/\\s*/); return names[names.length - 1];"
> >>
> >> The reason to freeze the global object and the function is to prevent
> >> the expression from retaining state from one execution to the next. We
> >> could also say that a fresh function object and a fresh global scope
> >> object is created, however that would result in slower execution.
> >
> > It seems like this is something that could be optimized by
> implementations
> > if it does become an issue though.  Right?
>
> Sort of. It's certainly possible to write an implementation that uses
> static or dynamic analysis to determine of the global object or the
> function object has been modified and only create a fresh
> global/function object when needed. However if we don't freeze the
> global object it's quite likely that people will write expressions
> like
>
> "names = value.name.split(/\\s*/); return names[names.length - 1];"
>
> which implicitly sets a variable on the global object. I.e. it makes
> it very easy for authors to write expressions that fall off the fast
> path without noticing. By freezing the global object that mistake is
> avoided.
>

Good point; it would be pretty easy for someone to modify the global object
without realizing the performance impact.


> But you are right that freezing the function object is excessive. It
> is extremely unlikely that someone will modify the function object by
> accident, and so a implementation can safely fall off the fast path
> when that is done.
>
> > So what your proposing is that the keyPath would essentially be a string
> of
> > the body of a function which runs for every index (on that objectStore)
> for
> > every value inserted into that object store?
>
> Sorry, I have been unclear. My proposal is to keep keyPaths as is,
> restricted to the syntax:
>
> name ('.' name)*
>
> But add the ability to create indexes which do not use keyPaths, but
> that use more powerful, but also slower and syntactically heavier,
> expression-based indexes. These indexes only differ when they are
> created, but otherwise have the exact same feature set as keyPath
> based indexes.
>

Got it.  (Guess I should have read your whole response before responding.
 :-)


>  > This seems like half way
> > between the eval-like idea I mentioned earlier.  It certainly seems to
> have
> > advantages for complex keyPaths, but I'm still not so hot on having
> > boilerplate/assumptions (like needing "return" and assuming "value" is
> > present) present in every single keyPath.  Especially when the use cases
> > (while important) don't seem to be the common case.  (In fact, can you
> even
> > do this in SQL?  If not, I think it's pretty strong evidence against
> needing
> > to do arbitrary calculations in a keyPath.)
>
> I agree that keyPaths, as seemingly defined by the spec right now,
> cover the most common cases and should be retained.
>
> I think the comparison to SQL isn't entirely an apples-to-apples
> comparison. In SQL databases you often store data in separate tables
> and in very normalized form. Take the candy store example in
>
>
> http://docs.google.com/document/pub?id=1I__XnwvvSwyjvxi-FAAE0ecnUDhk5DF7L2GI6O31o18
>
> Here data is stored much like you would in an SQL database, with
> separate "tables" for kids and their candy sales. This results in a
> somewhat painful join in some of the queries. A better way to use
> IndexedDB would be to store the data as:
>
> { name: "Anna", id: 14, sales: [{candyId: 4, date: "2010-5-12"},
> {candyId: 6, date: "2010-5-12"}, {candyId: 13, date: "2010-5-13"}] },
> { name: "Betty", id: 26, sales: [{candyId: 2, date: "2010-5-14"}] },
> { name: "Christine", id: 13 },
>
> You could then create an expression-based indexes which digs out the
> appropriate data to query sales by candy-type or whatever you'd
> otherwise use the candy sales "table" for.
>
> > I'm not a JavaScript (or JS engine) expert, but I find it difficult to
> > believe that what you're proposing is much if any more optimizable than
> the
> > (for lack of better name) eval-keyPath-proposal.
>
> Hmm.. we might simply be talking past each other. Can you elaborate
> with specifics as to how your eval-keyPath-proposal works?
>

It's the same as what you're proposing, I think, except that instead of
passing in a function body you'd pass in an expression.  But if we're adding
this _in addition to_ the existing keyPath concept then I actually think a
function body is a better way to go since it's more powerful and cleaner
than an expression.


> > And as long as there are
> > fundamental reasons why one can't be optimized, implementational
> challenges
> > really shouldn't impact how we design the API.  At all.
> > Anyway, I'll continue thinking about this, but I'm far from convinced
> that
> > allowing javascript (whether it's like the body of a function or like
> it's
> > evaled) is necessary for v1.  It adds a lot of complexity and I honestly
> > doubt we understand the problems well enough at the moment to come up
> with
> > the best solution.  I'd much rather punt for now.  But if you feel
> strongly
> > we need this for v1, I think we need to find some examples in SQL or
> other
> > database engines to base our implementation on.  Otherwise we're building
> an
> > API on top of a tower of assumptions.
>
> I'd prefer to leave expressions out of v1 as well. However I would be
> worried about leaving out the ability to index on computed values
> entirely. I.e. remove the ability to index on anything that isn't a
> plain value in the stored object. And it doesn't seem like there is
> much support for the way that the spec supports them now which is
> through explicit calls to add/remove items out of an index.
>

I definitely think we need to remove the concept of managing an index
manually.  Whether it's worth putting this in v1 or adding some way to
provide index values when doing an objectStore.put/add/etc, I'm honestly not
sure.

To be honest, I don't think any of the use cases presented so far are a slam
dunk for the need to support computed indexes (vs. just making someone store
the computed value in the object store value itself and then indexing with a
simple keyPath), but I definitely do think it'll be a useful feature.  And
it sounds like you guys are pretty convinced of its importance, so I'm happy
to go along with it.


> Though maybe people can use separate objectStores which hold computed
> values. That might certainly be good enough for version 1. I'll have
> to think more about that.
>

Hmmm....I'm not sure I follow.  Could you explain more thoroughly?

J
Received on Saturday, 26 June 2010 18:25:19 UTC