Re: [IndexedDB] Closing on bug 9903 (collations) from Aryeh Gregor on 2011-05-04 (public-webapps@w3.org from April to June 2011)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Wed, 4 May 2011 19:33:33 -0400
To: Jonas Sicking <jonas@sicking.cc>, Keean Schupke <keean@fry-it.com>
Cc: Pablo Castro <Pablo.Castro@microsoft.com>, "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <BANLkTin64spjMHFWeSqawvHVjOkJuF1++Q@mail.gmail.com>
On Tue, May 3, 2011 at 7:57 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> I don't think we should do callbacks for the first version of
> javascript. It gets very messy since we can't rely on the script
> function returning stable values.

The worst that would happen if it didn't return stable values is that
sorting would return unpredictable results.

> So the choice here really is between only supporting some form of
> binary sorting, or supporting a built-in set of collations. Anything
> else will have to wait for version 2 in my opinion.

I think it would be a mistake to try supporting a limited set of
natural-language collations.  Binary collation is fine for a first
version.  MySQL only supported binary collation up through version 4,
for instance.
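
For concreteness, "binary collation" means ordering strings purely by their code units, with no knowledge of case, accents, or language rules. A minimal TypeScript sketch, assuming UTF-16 code-unit order (which is what JavaScript's `<` on strings does); the word list is made up for illustration:

```typescript
// Binary collation: compare strings by raw UTF-16 code units only.
function binaryCompare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

const words: string[] = ["banana", "Cherry", "apple", "Apple"];
const binarySorted: string[] = [...words].sort(binaryCompare);

// Uppercase letters (U+0041..U+005A) precede lowercase ones (U+0061..U+007A),
// so "Apple" and "Cherry" land ahead of "apple" and "banana".
console.log(binarySorted); // ["Apple", "Cherry", "apple", "banana"]
```

The result is predictable and cheap to compute, but it is not what a human reader expects from an alphabetical list; that gap is what natural-language collations (and the version-2 discussion) are about.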

On Wed, May 4, 2011 at 3:49 AM, Keean Schupke <keean@fry-it.com> wrote:
> I thought only the app that created the db could open it (for security
> reasons)... so it becomes the app's responsibility to do version control.
> The comparison function is not going to change by itself - someone has to go
> into the code and change it, when they do that they should up the revision
> of the database, if that change is incompatible.

Why should we let such a pitfall exist if we can just store the
function and avoid the issue?

> There is exactly the same problem with object properties. If the app changes
> to expect a new property on all objects stored, then the app has to
> correctly deal with the update.

If a requested property doesn't exist, I assume the API will fail
immediately with a clear error code.  It will not fail silently and
mysteriously with no error code.  (Again, I haven't looked at it
closely, or tried to use it.)

> 2) making things easy for the user - for me a simpler, more predictable API
> is better for the user. Having a function stored inside the database is bad,
> because you cannot see what function might be stored in there...

We could let you query the stored function.

> it might be
> a function from a previous version of the code and cause all sorts of
> strange bugs (which will only affect certain users with a certain version of
> the function stored in their DB).

It will cause *far* fewer strange bugs than having one index that
uses two different collations, which is the alternative possibility.
If the function is stored, the worst case will be that the collation
function is out of date.  In practice, authors will mostly want to use
established collation functions like UCA and won't mind if they're out
of date.  They'll also only very rarely have occasion to deliberately
change the function.
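
To illustrate the "one index, two collations" failure mode, a hypothetical sketch: `keyV1` and `keyV2` stand in for two versions of an author-supplied key function (they are not any real IndexedDB API), and an index is modelled as a plain sorted array of derived keys. Records written before an app upgrade keep their old keys, so lookups through the new function silently miss them; if the function is stored with the index, every key comes from the same function.

```typescript
type KeyFn = (value: string) => string;

// Two hypothetical versions of an author-supplied key function.
const keyV1: KeyFn = (s) => s;               // case-sensitive
const keyV2: KeyFn = (s) => s.toLowerCase(); // case-insensitive

interface Entry { key: string; id: number; }

function binaryCompare(a: Entry, b: Entry): number {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
}

// Records 0 and 1 were indexed before the app switched from keyV1 to keyV2;
// record 2 was indexed afterwards, so the index mixes two collations.
const records: string[] = ["Zebra", "apple", "Mango"];
const mixed: Entry[] = [
  { key: keyV1(records[0]), id: 0 },
  { key: keyV1(records[1]), id: 1 },
  { key: keyV2(records[2]), id: 2 },
].sort(binaryCompare);

// Exact-match lookup: derive the key for the query, then compare keys.
function lookup(index: Entry[], value: string, keyFn: KeyFn): number[] {
  const k = keyFn(value);
  return index.filter((e) => e.key === k).map((e) => e.id);
}

console.log(lookup(mixed, "Zebra", keyV2)); // []: the stale "Zebra" key is missed
console.log(lookup(mixed, "Mango", keyV2)); // [2]

// With the function stored alongside the index, writes and reads agree.
const stored = {
  keyFn: keyV1,
  entries: records.map((v, id) => ({ key: keyV1(v), id })).sort(binaryCompare),
};
console.log(lookup(stored.entries, "Zebra", stored.keyFn)); // [0]
```

Nothing in the mixed case raises an error; the record just stops being findable, which is the "silent and mysterious" kind of failure the thread is worried about.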

On Wed, May 4, 2011 at 4:01 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> Browsers can certainly deal with this, and ensure that the only one
> suffering is the author of the buggy algorithm. However this comes at
> a cost in that the browser sorting algorithm can't go into infinite
> loops or crash even in the face of the most ridiculous comparison
> algorithm. In other words, the browser will likely have to use a
> slower sorting implementation in order to be robust.

The browser would only run the function once each time the given field
changes, and update the value stored in the index if the result differs
from the current one.  The actual sorting would still be binary, just on
a user-provided key.  So a bad function can't send the browser's sort
into an infinite loop or crash it; at worst it produces bad keys.  Since
it runs only once per value, it's no riskier than any other script
function that runs many times.
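
A minimal sketch of that scheme, assuming a hypothetical `KeyedIndex` that keeps one derived key per record (not any browser's actual implementation). The author's function runs only when the indexed field changes, and ordering uses nothing but binary string comparison, so a misbehaving function can yield odd keys but cannot destabilize the sort itself:

```typescript
type KeyFn = (value: string) => string;

interface Entry { key: string; id: number; }

// Hypothetical index: one derived key per record, entries kept in
// binary order of those keys.
class KeyedIndex {
  private entries: Entry[] = [];
  private lastValue = new Map<number, string>(); // id -> last indexed value
  private readonly keyFn: KeyFn;
  keyFnCalls = 0;

  constructor(keyFn: KeyFn) {
    this.keyFn = keyFn;
  }

  // Called on insert or update of record `id` whose indexed field is `value`.
  put(id: number, value: string): void {
    if (this.lastValue.get(id) === value) return; // unchanged: no call
    this.lastValue.set(id, value);
    this.keyFnCalls++;
    const key = this.keyFn(value); // the only call into author code
    this.entries = this.entries.filter((e) => e.id !== id);
    // Binary search for the insertion point using plain string comparison.
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.entries[mid].key < key) lo = mid + 1;
      else hi = mid;
    }
    this.entries.splice(lo, 0, { key, id });
  }

  orderedIds(): number[] {
    return this.entries.map((e) => e.id);
  }
}

// Case folding stands in for a real collation key here.
const index = new KeyedIndex((s) => s.toLowerCase());
index.put(1, "banana");
index.put(2, "Apple");
index.put(3, "cherry");
index.put(2, "Apple");   // unchanged: key function not called again
index.put(3, "Avocado"); // changed: called once more
console.log(index.orderedIds()); // [2, 3, 1]
console.log(index.keyFnCalls);   // 4
```

Removing the old entry with `filter` is linear and only there to keep the sketch short; a real store would use a B-tree, but the point about call counts is unchanged.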

We aren't talking about a sort()-style comparison function that
returns -1 or 0 or 1.  We're talking about a function that takes a
string as input and outputs a string to be used in the index as the
key for the object in question.  I guess you *could* do it as a
comparison function instead; that would probably be easier to write,
but also a lot easier to get badly wrong, and you'd have to make many
function calls (one per comparison) on each insert or update instead
of just one.
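
To put rough numbers on the difference, a hypothetical TypeScript sketch: the same 1000 made-up strings are inserted into a sorted array by binary search, once through an author-supplied comparator and once through an author-supplied key function, counting calls into "author" code. The case-folding function is an arbitrary stand-in for a collation.

```typescript
// Design 1: author supplies a sort()-style comparator.
function insertWithComparator(sorted: string[], value: string,
                              cmp: (a: string, b: string) => number): void {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cmp(sorted[mid], value) < 0) lo = mid + 1; // one author call per step
    else hi = mid;
  }
  sorted.splice(lo, 0, value);
}

// Design 2: author supplies a key function; stored keys compare in binary.
function insertWithKey(sorted: string[], value: string,
                       keyFn: (s: string) => string): void {
  const key = keyFn(value); // exactly one author call per insert
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  sorted.splice(lo, 0, key); // this array holds the derived keys
}

// 7919 is coprime to 1000, so this yields 1000 distinct values in scrambled order.
const values: string[] = [];
for (let i = 0; i < 1000; i++) values.push("item" + ((i * 7919) % 1000));

let comparatorCalls = 0;
const byComparator: string[] = [];
for (const v of values) {
  insertWithComparator(byComparator, v, (a, b) => {
    comparatorCalls++;
    const ka = a.toLowerCase();
    const kb = b.toLowerCase();
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

let keyCalls = 0;
const byKey: string[] = [];
for (const v of values) {
  insertWithKey(byKey, v, (s) => {
    keyCalls++;
    return s.toLowerCase();
  });
}

console.log(keyCalls);        // 1000: one per insert
console.log(comparatorCalls); // many more: roughly log2(n) per insert
```

Both designs end up with the same order; the comparator version simply crosses into author code once per comparison rather than once per row.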

> Additionally, there is a significant cost involved in transitioning
> between the C++ code implementing the sorting algorithm, and the
> javascript implemented callback. That is on top of the cost of
> implementing the comparison function in javascript. Even in the best
> JITs, there is a significant overhead to both these parts.

It would only have to run once per modified row (object?), and not at
all for reads.  Would that really be so bad?  Also, most authors would
be content with built-in CLDR-based sort functions, which could be
implemented natively in C++.
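
A hypothetical sketch of the read side under the key-function design: rows already carry their derived keys, so a range read compares stored keys in binary and never re-runs the function per row. The one assumption worth flagging is that raw query bounds, if the caller passes them un-keyed, would need to go through the same function, which costs two calls per query regardless of row count. The `range` helper and the sample data are illustrative only, not a proposed API.

```typescript
interface Entry { key: string; id: number; }

let keyFnCalls = 0;
const keyFn = (s: string): string => {
  keyFnCalls++;
  return s.toLowerCase(); // stand-in for a real collation key
};

// Index built at write time: one keyFn call per row.
const index: Entry[] = ["Pear", "apple", "Fig", "banana", "Cherry"]
  .map((v, id) => ({ key: keyFn(v), id }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// First position whose stored key is >= `key`, by binary comparison.
function lowerBound(entries: Entry[], key: string): number {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Inclusive range [from, to] over the derived keys.
function range(entries: Entry[], from: string, to: string): number[] {
  const lo = keyFn(from); // bounds are keyed once per query,
  const hi = keyFn(to);   // not once per row scanned
  const out: number[] = [];
  for (let i = lowerBound(entries, lo); i < entries.length && entries[i].key <= hi; i++) {
    out.push(entries[i].id);
  }
  return out;
}

const callsAfterWrites = keyFnCalls;
console.log(range(index, "B", "D"));        // [3, 4]: "banana" and "Cherry"
console.log(keyFnCalls - callsAfterWrites); // 2
```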
Received on Wednesday, 4 May 2011 23:34:20 UTC