- From: Jonas Sicking <jonas@sicking.cc>
- Date: Fri, 6 May 2011 05:09:40 -0700
- To: Keean Schupke <keean@fry-it.com>
- Cc: Aryeh Gregor <Simetrical+w3c@gmail.com>, Pablo Castro <Pablo.Castro@microsoft.com>, "public-webapps@w3.org" <public-webapps@w3.org>
On Fri, May 6, 2011 at 4:09 AM, Keean Schupke <keean@fry-it.com> wrote:
> On 6 May 2011 10:18, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Thu, May 5, 2011 at 11:36 PM, Keean Schupke <keean@fry-it.com> wrote:
>> > On 6 May 2011 03:00, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Wed, May 4, 2011 at 11:12 PM, Keean Schupke <keean@fry-it.com> wrote:
>> >> > On 5 May 2011 00:33, Aryeh Gregor <Simetrical+w3c@gmail.com> wrote:
>> >> >>
>> >> >> On Tue, May 3, 2011 at 7:57 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >> >> > I don't think we should do callbacks for the first version of javascript. It gets very messy since we can't rely on the script function returning stable values.
>> >> >>
>> >> >> The worst that would happen if it didn't return stable values is that sorting would return unpredictable results.
>> >> >
>> >> > Worst is an infinite loop - no return.
>> >> >
>> >> >>
>> >> >> > So the choice here really is between only supporting some form of binary sorting, or supporting a built-in set of collations. Anything else will have to wait for version 2 in my opinion.
>> >> >>
>> >> >> I think it would be a mistake to try supporting a limited set of natural-language collations. Binary collation is fine for a first version. MySQL only supported binary collation up through version 4, for instance.
>> >> >
>> >> > A good point about MySQL.
>> >> >
>> >> >>
>> >> >> On Wed, May 4, 2011 at 3:49 AM, Keean Schupke <keean@fry-it.com> wrote:
>> >> >> > I thought only the app that created the db could open it (for security reasons)... so it becomes the app's responsibility to do version control.
>> >> >> > The comparison function is not going to change by itself - someone has to go into the code and change it, and when they do that they should up the revision of the database, if that change is incompatible.
>> >> >>
>> >> >> Why should we let such a pitfall exist if we can just store the function and avoid the issue?
>> >> >
>> >> > I don't see it as a pitfall, it has the advantage of transparency.
>> >> >
>> >> >>
>> >> >> > There is exactly the same problem with object properties. If the app changes to expect a new property on all objects stored, then the app has to correctly deal with the update.
>> >> >>
>> >> >> If a requested property doesn't exist, I assume the API will fail immediately with a clear error code. It will not fail silently and mysteriously with no error code. (Again, I haven't looked at it closely, or tried to use it.)
>> >> >
>> >> > What if the new version uses the same property name for a different thing? For example in V1 'Employer' is a string name, and in V2 'Employer' is a reference to another object. You may say 'you should change the column name'? Right, that's just the same as me saying you should change the DB version number when you change the collation algorithm. It's the same thing.
>> >> > People seem to be making a big fuss about having a non-persisted collation function defined in user code, when many many things require the code to have the correct model of the data stored in the database to work properly. It seems illogical to make a special case for this function, and not do anything about all the other cases. IMHO either the database should have a stored schema, or it should not.
>> >> > If IndexedDB is going in the direction of not having a stored schema, then the designers should have the confidence in their decision to stick with it and at least produce something with a consistent approach to the problem.
>> >> >
>> >> >>
>> >> >> > 2) making things easy for the user - for me a simpler more predictable API is better for the user. Having a function stored inside the database is bad, because you cannot see what function might be stored in there...
>> >> >>
>> >> >> We could let you query the stored function.
>> >> >
>> >> > Why would you need to read it? Every time you open the database you would need to check the function is the one you expect. The code would have to contain the function so it can compare it with the one in the DB and update it if necessary. If the code contains the function there are two copies of the function, one in the database and one in the code. Which one is correct? Which one is it using? So sometimes you will write the new function to the database, and sometimes you will not? More paths to test in code coverage, more complexity. It's simpler to just always set the function when opening the database.
>> >> >
>> >> >>
>> >> >> > it might be a function from a previous version of the code and cause all sorts of strange bugs (which will only affect certain users with a certain version of the function stored in their DB).
>> >> >>
>> >> >> It will cause *much* less strange bugs than if you have one index that used two different collations, which is the alternative possibility. If the function is stored, the worst case will be that the collation function is out of date.
>> >> >> In practice, authors will mostly want to use established collation functions like UCA and won't mind if they're out of date. They'll also only very rarely have occasion to deliberately change the function.
>> >> >
>> >> > As I said, you will end up querying the function to see if it is the one you want to use, and if you do that you may as well set it every time.
>> >> > Thinking about this a bit more. If you change the collation function you need to re-sort the index to make sure it will work (and avoid those strange bugs). Storing the function in the DB enables you to compare the function and only change it when you need to, thus optimising the number of re-sorts. That is the _only_ advantage to storing the function - as you still need to check the function stored is the one you expect to guarantee your code will run properly. So with a non-persisted function we need to sort every time we open to make sure the order is correct. However, if we attach a version number to the index, we can check the version number in our code to know if we need to re-sort the index. The simplest API for this would be:
>> >> > index.setCollation(1.1, my_collation_function);
>> >> > So the version number is checked against the index. If it is the same, the supplied collation function is used without re-sorting the index. If it is different the index order is checked/re-sorted. All you have to do is remember to up the version number. Local testing before rolling out the changes should catch failure to do so.
>> >>
>> >> We have already decided that we don't want to take on the complexity that comes with supporting changing collations on existing data.
>> >> In particular it becomes very unclear what to do with data that is no longer unique under the new collation.
>> >>
>> >> >> On Wed, May 4, 2011 at 4:01 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >> >> > Browsers can certainly deal with this, and ensure that the only one suffering is the author of the buggy algorithm. However this comes at a cost in that the browser sorting algorithm can't go into infinite loops or crash even in the face of the most ridiculous comparison algorithm. In other words, the browser will likely have to use a slower sorting implementation in order to be robust.
>> >> >>
>> >> >> The browser will only run the function once every time the given field changes, and change the value used in the index if it's different from the current one. The actual sorting will still be binary, just with a user-provided key. So there's no possibility of especially bad effects if you're given a bad function. You're only running it once per value, so it's no worse than any other function that's run a bunch of times.
>> >> >>
>> >> >> We aren't talking about a sort()-style comparison function that returns -1 or 0 or 1. We're talking about a function that takes a string as input, and outputs a string to be used in the index as the key for the object in question. I guess you *could* also do it as a comparison function too -- would probably be easier to write, but also a lot easier to get badly wrong, and you'd have to do a bunch of function calls on insert or update instead of just one.
>> >> >
>> >> > A comparison function would be a lot simpler for the user to write.
>> >>
>> >> And a lot slower.
>> >> For inserting N records in the database it'll take in the order of N * log2(N) calls to the comparison function. For each call you have to pay the penalty of crossing between languages as well as rechecking all your state once you get back. You additionally have to rely on users supplying the same collation function as well as specifically signal to the API whenever
>> >>
>> >> I think ultimately we simply seem to disagree here. I think that supporting a standard set of collations is going to solve more than 80% of the use cases (which is a good rule of thumb for these things) for version 1, as well as being easier on users, and so is something we'll ultimately want to add anyway. Thus adding it now won't be painting us into a corner and it solves the majority of use cases.
>> >>
>> >> If I understand you correctly you don't think that it solves the majority of use cases and you think that it adds API which is bad and that we should never add.
>> >>
>> >> Is this a correct assessment?
>> >>
>> >> / Jonas
>> >
>> >
>> > I think it solves the majority of the use cases, but only if all browsers implement the same useful set of collations, and updates to that set are managed in a predictable / useful way across browsers in the future. This still leaves the problem that some programs may not behave as intended by the author (after updating the collations) or the collations will not be up to date (with the latest CLDR). However in general I would be happy for something like the standard Unicode sorting algorithm to be pre-installed for the user.
>> > The second point is the API. I don't think it's a bad API, but I do think it's inelegant.
>> > Passing the index a sort-order mapping function (which was my original suggestion) or a comparison function (which I think may be easier for the average programmer to write), where this comparison function may be user supplied or provided by the browser. If you want to optimise for speed, observe that every function is unique, for example:
>> > function a() {};
>> > function b() {};
>> > // a !== b
>> > So for the built-in functions there only needs to be a pre-defined unique function object, and that unique ID can be used in the C++ code to directly use a C++ implementation of sort. So if you use the standard function there would be no call overhead - you only get the overhead if you use a user-defined function. IMHO two different APIs is not a big problem, but why have two if one can do it all elegantly.
>> > So in summary, there are important concerns about managing updates to the collations going forward, then there are my personal feelings about what makes a good API. I am prepared to justify my opinion on API design, and think it's best to raise these issues, but I understand that other people may not share these views.
>>
>> Yes, we could use a comparison function and supply a set of such comparison functions from the browser and make sure to optimize the case where the comparison function is one that is supplied by the implementation.
>>
>> That takes care of the performance problem, but only if you stick to the feature set of the API that I'm proposing.
>>
>> That still leaves the fact that the collation function has to be provided every time the database is opened, which does not at all fit with how the rest of the API works. And yes, I'm aware that you don't like the way that the API works, but I think doing something in between is the worst possible solution.
>
> So there is now a schema that has to be managed.
> To be able to work with my relational library's declarative style, it must be possible to query all the stored properties. Whilst not as clean IMHO as having it stateless, this would still let me do what I need in the library layer.

Of course. As you've surely noticed, that is already the case with the APIs that are in the spec.

>> And it also doesn't handle the problem of what to do if someone does provide a different collation function between open calls to the database, which might have different ideas of which values compare equal. Even if the author only makes use of the built-in collation functions we still are left with the problem of what happens if that function changes between open calls? Or worse, what happens if two pages open the same database, but use different collations in the two pages?
>
> I would recommend using modular programming and providing a JavaScript file to include which does the open for each page.

Sure, we can recommend that for web developers. However the question I posed was about what IndexedDB implementations should do. I.e. what behavior in these scenarios would we define in the spec?

>> So all in all, compared to what Pablo is proposing, your proposal only adds support for a rare set of use cases, while not fitting with the rest of the API, and introduces a whole set of edge cases that we need to spend time defining and handling in the implementation.
>
> My proposal allows libraries to make sure that collations offered by different backends are the same. If WebSQL offers a collation order (from SQLite) that IndexedDB does not support, the library can provide an implementation.
>
>>
>> Based on that, my conclusion is that we should go with what Pablo is proposing. And I think we should do it for v1.
>>
>> / Jonas
>
> I agree with that for v1,
> with this requirement: all the stored properties are readable.

Of course. I don't believe anyone has suggested anything else.
> and with this request: the ability to supply a collation function (that is not persisted) for library authors writing libraries like relationalDB over the top of IndexedDB. On that basis, all the arguments about users writing pages are moot, as I think we need to support library authors too.

This sounds like a feature we should look at for v2 for sure.

/ Jonas
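
For readers following the performance argument in this thread, here is a minimal sketch of the two styles being compared: a sort()-style comparison function versus a function that maps each value to a sort key once, with the keys then ordered by plain binary comparison. It sorts an in-memory array rather than maintaining a database index, so the call counts only illustrate the shape of the cost. The helper names `sortWithComparator` and `sortWithKeyFunction` are made up for this illustration and are not part of IndexedDB or any proposal.

```javascript
// Hypothetical helpers for illustration only; not part of IndexedDB.

// Style 1: a user-supplied comparison function, called by the sort itself.
// We count how many times the engine has to call back into user code.
function sortWithComparator(values, compare) {
  let calls = 0;
  const sorted = [...values].sort((a, b) => {
    calls += 1;
    return compare(a, b);
  });
  return { sorted, calls };
}

// Style 2: a user-supplied key function, called exactly once per value.
// The derived keys are then ordered with plain binary (code unit) comparison,
// which needs no further calls into user code.
function sortWithKeyFunction(values, toKey) {
  let calls = 0;
  const entries = values.map((value) => {
    calls += 1;
    return { key: toKey(value), value };
  });
  entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  return { sorted: entries.map((entry) => entry.value), calls };
}

// Example: a case-insensitive ordering expressed both ways.
const names = ["banana", "Cherry", "apple", "Date", "elderberry", "Fig"];

const byComparator = sortWithComparator(names, (a, b) => {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
});

const byKey = sortWithKeyFunction(names, (s) => s.toLowerCase());

console.log(byComparator.sorted.join(", "), "comparator calls:", byComparator.calls);
console.log(byKey.sorted.join(", "), "key function calls:", byKey.calls);
```

Both styles produce the same order here. The key function runs once per value, while the number of comparator calls depends on the engine's sort algorithm and grows roughly as N * log2(N) for large inputs.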
Received on Friday, 6 May 2011 12:10:38 UTC