- From: Jonas Sicking <jonas@sicking.cc>
- Date: Fri, 18 Jan 2013 01:21:48 -0800
- To: Zoltan Kis <zoltan.kis@intel.com>
- Cc: Wayne Carr <wayne.carr@intel.com>, Christophe Dumez <christophe.dumez@intel.com>, Sakari Poussa <sakari.poussa@intel.com>, "public-sysapps@w3.org" <public-sysapps@w3.org>, EDUARDO FULLEA CARRERA <efc@tid.es>, JOSE MANUEL CANTERA FONSECA <jmcf@tid.es>, Tantek Çelik <tantek@mozilla.com>
On Jan 7, 2013 12:54 AM, "Kis, Zoltan" <zoltan.kis@intel.com> wrote: > On Fri, Jan 4, 2013 at 12:31 PM, Jonas Sicking <jonas@sicking.cc> wrote: I think there are two main issues that we are discussing in this thread: * What type of filtering to allow when querying the database. * How to build a synchronization mechanism to allow an application to store the data locally such that it can perform queries which aren't supported by the API. I think the second point we'll have an easier time to get to agreement on since I think we agree on the capability set there. So let's attack the first point separately. > > So for example I don't think that moving all filtering functionality > > into the API with the argument "it can be done faster in the > > implementation because the implementation can be written in a faster > > language" is always a good argument. That argument leads to enormous > > APIs and often less flexible ones. > > Specifying filters in the API makes possible an end-to-end data query > optimization, since you can carry everything "what you want" down to > the DB level and slice and translate it there to the specifics of your > DB implementation. If I understood right, in your case you'd bring up > the DB specifics to the app level and make optimizations in the app, > that's why you don't see advantage in it. > > BTW I don't think the filtering API as specified in Messaging and > Contacts is complex. We have implemented them a while back and they > match nicely to SQL, SPARQL and proprietary DB searches. > > Indeed. While implementations can add indexes to support some > > filtering in the API, it quickly gets to the point when too many > > indexes are needed to ensure that the API is quick in all cases. > > This issue is solved in most database implementations. This is not the case with any of the databases that I have worked with. But I admit I'm not a database expert. Please note that simply translating the filter call into SQL gives no guarantees that the query will execute with acceptable performance. All SQL engines that I've ever worked with will happily fall back to doing a full table scan if it can't find the proper indexes to use. So let's get into specifics. How would you write an implementation for, for example, a contacts API which stores 5 different properties and supports a find function which allows filtering on a single arbitrary property. I.e. no composited filters allowed, only filters which say that property X should have value Y? Which indexes would you create and what types of indexes? How about if the API supports a find function which supports composited filters on two arbitrary properties. I.e. filters that say that property X should have value Y and property A should have value B. Again, which indexes of what type would you create? Keep in mind that we have agreed that performance of the find function should be roughly proportional to the number of entries returned. Then more detail about the implementation algorithms you'd use, the better. I can't think of a way to do this with less than 25 b-tree indexes. 30 if we require that the result is sorted by first name. Twice that if we allow sorting by first or last name. I feel like we are either talking past each other, or there is some type of optimization technique that I don't know of. I actually feel like we are quite far away from each other in our proposals on this topic. Probably at least in part because I don't understand how to implement your proposal with the performance constraints that I would like to have. My proposed approach definitely puts a lot more responsibility on the application, but I think this is a necessity due to limitations in what we *can* implement with reasonable performance. And I also think that the amount of use cases you end up being able to support with more complex filtering mechanisms is actually relatively small. Which is why we'll need the sync mechanism. / Jonas
Received on Friday, 18 January 2013 09:22:45 UTC