Data models and filtering (was: Re: Contacts API draft) from Kis, Zoltan on 2012-11-10 (public-sysapps@w3.org from November 2012)

From: Kis, Zoltan <zoltan.kis@intel.com>
Date: Sat, 10 Nov 2012 16:34:26 +0200
To: Jonas Sicking <jonas@sicking.cc>
Cc: "Dumez, Christophe" <christophe.dumez@intel.com>, Tantek Çelik <tantek@mozilla.com>, EDUARDO FULLEA CARRERA <efc@tid.es>, "public-sysapps@w3.org" <public-sysapps@w3.org>, Wayne Carr <wayne.carr@intel.com>, Sakari Poussa <sakari.poussa@intel.com>, JOSE MANUEL CANTERA FONSECA <jmcf@tid.es>
Message-ID: <CANrNqUdJ9kCCgox9agqpUTorTDOZk+AmQ7aZ0gHOD-6Qkf6TyA@mail.gmail.com>
Hello,

I think this topic is worth a separate thread. The problem is not
specific to Contacts: I am facing the same issues with Call
History and Messaging.

Currently I am using the same filters as Chris in these APIs.
However, Jonas raised a lot of good points. I propose we have a
common-to-all-APIs discussion/solution for this topic. Of course the
Contacts thread can continue on its own; I hijacked the original
thread (scroll down for it) only for this particular issue. :)

A few key points from Jonas were:
- performance is a key issue, and JS-only implementations are valid use cases
- there is doubt that a general-purpose query API is performant enough
(in general) for the above use case, in order to build data models for
apps at the rate required by the app
- therefore there must be a method to cache or sync data (or a full
database) on the JS side in order to control performance-critical
operations from there
- there was a proposal that this sync be done by polling deltas from
data sources.
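
To make the delta-polling point concrete, here is a minimal sketch in
plain JavaScript. It is only an illustration under my own assumptions:
ChangeTracker, maxQueued and the shape of the change records are
hypothetical names, not part of any proposal. It also shows the "I lost
track of your changes, please do a full rescan" fallback that Jonas
mentions further down in the thread.

```javascript
// Hypothetical change log kept by the API implementation for one app.
class ChangeTracker {
  constructor(maxQueued) {
    this.maxQueued = maxQueued; // assumed limit before the log is dumped
    this.queue = [];
    this.overflowed = false;
  }

  // Called by the data source for every modification.
  record(change) {
    if (this.overflowed) return;
    if (this.queue.length >= this.maxQueued) {
      this.queue = [];          // too many changes: drop the log
      this.overflowed = true;   // and remember that a rescan is needed
      return;
    }
    this.queue.push(change);
  }

  // Called by the app: "give me all changes since the last time I asked".
  getNewChanges() {
    const result = this.overflowed
      ? { rescan: true, changes: [] }
      : { rescan: false, changes: this.queue };
    this.queue = [];
    this.overflowed = false;
    return result;
  }
}

// The app keeps its own cache and applies the deltas to it.
const tracker = new ChangeTracker(2);
const cache = new Map([["c1", { uid: "c1", name: "Ann" }]]);

tracker.record({ type: "modified", contact: { uid: "c1", name: "Anna" } });
tracker.record({ type: "added", contact: { uid: "c2", name: "Bob" } });

const delta = tracker.getNewChanges();
for (const { type, contact } of delta.changes) {
  if (type === "removed") cache.delete(contact.uid);
  else cache.set(contact.uid, contact);
}

// Three more changes exceed the assumed limit of two queued entries.
tracker.record({ type: "added", contact: { uid: "c3", name: "Cy" } });
tracker.record({ type: "added", contact: { uid: "c4", name: "Di" } });
tracker.record({ type: "added", contact: { uid: "c5", name: "Ed" } });
const lost = tracker.getNewChanges(); // asks the app for a full rescan
```

A real implementation would keep one such log per app that asked for
tracking, and would persist it across app restarts.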

I agree with these, with the following notes/additions:
- some data, e.g. anything with close-to-real-time constraints,
must be managed from the native side (or whichever side handles the
protocols) and only _exposed_ to the JS side. Call history is an
example: otherwise we risk losing incoming calls.
One could say that in these cases the middleware could maintain a
local sync cache, which is erased after syncing to the JS side, so this
can be solved. However, there may be multiple clients for the data,
which can be native or JS or both, so the requirement is that if an
implementation chooses to replicate/sync data to the JS side, it must
keep it in 2-way sync with the origin of the data (i.e. changes done on
the JS side are synced back to the native side).

- the performance bottleneck you mentioned is usually not in the
native database itself (key-value store, SQL DB, semantic DB,
distributed heterogeneous DB, etc). In most cases, the bottleneck is in
the way query results or data models are exposed to the JS side. A DB
implementation in JS will certainly not be faster per se than a native
implementation of similar design, but having it on the JS side may be
more efficient because it avoids the eventual IPC bottleneck (which is
not mandatory to have, either). So I claim this performance issue is an
implementation issue, and not an API issue.

- at least when apps are in foreground, in addition to polling, data
model updates should also be possible by events, with optional
low-pass filters controlled by the app. This is a classical
synchronization use case, and I would support both polling and
(filtered-)event-firing mechanisms in the API.

- what applications really need are data models for feeding their
views, e.g. roster/contact list with presence information, or call
history, or messaging conversation view, etc. Sources may be
heterogeneous. So, apps need to maintain 'live' objects describing
their data model, and in case of big data, a high-fps viewport over
that data. This is non-trivial to solve for the generic case. On the
other hand, just providing data source APIs and deferring the solution
to the JS side may not be enough either, since certain optimizations can
only be done across a full vertical. What is important here is the
freedom of choice for developers.

- the W3C APIs will also be used in products which primarily support
native apps for call, messaging and contacts, and which have a database
on the native side with enough optimizations that they could expose
their native data models to the JS side efficiently, so that JS apps
could access the same data for any generic purpose. Under these
conditions, generic filters from JS would work well and we should not
prevent developers from using them. This is a valid use case, too.
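
As an illustration of the "events with an app-controlled low-pass
filter" point above, here is a rough sketch in plain JavaScript. The
helper name createLowPassNotifier and the injected clock are my own
assumptions for the example; the only idea taken from the discussion is
that the app sets a maximum update frequency, with 0 meaning no limit.

```javascript
// Hypothetical helper: coalesces change notifications so that the app is
// notified at most `maxFrequency` times per second (0 means no limit).
function createLowPassNotifier(maxFrequency, deliver, now = Date.now) {
  const minIntervalMs = maxFrequency > 0 ? 1000 / maxFrequency : 0;
  let pending = [];
  let lastDelivery = -Infinity;

  function flush() {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    lastDelivery = now();
    deliver(batch);
  }

  return {
    // Called by the data source for every single change.
    notify(change) {
      pending.push(change);
      if (now() - lastDelivery >= minIntervalMs) flush();
    },
    // Called from a timer (or when polling) to drain held-back changes.
    flush,
  };
}

// Usage with a fake clock so the behaviour is easy to follow.
let clock = 0;
const batches = [];
const notifier = createLowPassNotifier(2, (b) => batches.push(b), () => clock);

notifier.notify("added:1");   // delivered immediately
clock = 100;
notifier.notify("changed:1"); // held back: under 500 ms since last batch
clock = 200;
notifier.notify("deleted:2"); // still held back
clock = 600;
notifier.notify("added:3");   // interval elapsed: sent with the held changes
```

In this sketch held-back changes only go out on the next notify() or an
explicit flush(); a real implementation would presumably also arm a
timer so that the last changes are not delayed indefinitely.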

Now the question is: could we support both use cases? (please :)

One of my drafts for CallHistory looks like this (I replace
CallHistory with DataSync for this example). It is an asynchronous
API.

interface DataSync : EventTarget {
     attribute DataEntry[] data;
     attribute DataEntry[]? added;
     attribute DataEntry[]? changed;
     attribute DOMString[]? deletedUIDs;

     // false: manual pull mode (default), true: automatic updates
     attribute boolean autoUpdate;
     // auto-update max frequency, 0 meaning no constraints
     attribute double  frequency;

     void pullSync();
     void pushSync();
     void stopSync();

     attribute FilterOptions?  filterOptions;
     attribute AbstractFilter? filter;
     attribute DataEntry[]? filtered;

     attribute EventHandler onadded;
     attribute EventHandler onchanged;
     attribute EventHandler ondeleted;
     attribute EventHandler onfinished;
     attribute EventHandler onerror;
};

How does this work in Jonas' use case (data replicated to e.g. IndexedDB):

ds = new DataSync();
ds.autoUpdate = false;
ds.added = null;
ds.changed = null;
ds.deletedUIDs = null;
ds.filter  = null; // we won't use filters now
ds.pullSync();
// events start to come for added, changed, deleted data entries since
// the last call of pullSync (note there is a critical section there)
// the user can stop at any time by calling stopSync(), which updates
// the sync window cursor (for the next pullSync)
// when sync has finished, the 'onfinished' handler is called
// note that filters could theoretically be used too, and of course
// they can be implemented on the JS side ;)

When the client has finished reading the 'added' data, it resets the
attribute to null, acknowledging the data sync on added items. Similarly
for changed items and deleted UIDs. Unless the reference is null,
sync won't touch it.
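
The acknowledge-by-null rule can be tried out with a small in-memory
stand-in. This is only a sketch under stated assumptions: MockDataSync
is a hypothetical class, it is synchronous for brevity (the draft API is
asynchronous), and the "native" side is just a plain object holding
pending deltas.

```javascript
// Hypothetical in-memory stand-in for the proposed DataSync object.
class MockDataSync {
  constructor(source) {
    this.source = source;     // pending deltas held by the "native" side
    this.added = null;
    this.changed = null;
    this.deletedUIDs = null;
    this.onfinished = null;
  }

  pullSync() {
    // Only fill attributes that the client has acknowledged (reset to null).
    for (const key of ["added", "changed", "deletedUIDs"]) {
      if (this[key] === null && this.source[key].length > 0) {
        this[key] = this.source[key];
        this.source[key] = [];
      }
    }
    if (this.onfinished) this.onfinished();
  }
}

const native = { added: [{ uid: "call-1" }], changed: [], deletedUIDs: [] };
const ds = new MockDataSync(native);

ds.pullSync();
const firstBatch = ds.added;               // holds the "call-1" entry

native.added.push({ uid: "call-2" });      // new data arrives before the ack
ds.pullSync();
const untouched = ds.added === firstBatch; // not acknowledged, so not replaced

ds.added = null;                           // acknowledge the first batch
ds.pullSync();
const secondBatch = ds.added;              // now holds the "call-2" entry
```

The point of the design is that the client, not the sync engine, decides
when a batch has been consumed, so a slow client cannot lose entries.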

When the client changes or deletes any data, it must be synced back:

ds.changed = arrayOfChangedItems;
ds.added = null;
ds.deletedUIDs = arrayOfDeletedUIDs;
ds.filter = null;
ds.pushSync();

How would this work with generic filters and automatic updates?

ds.frequency = 0;  // updates come immediately when available
ds.autoUpdate = true; // triggers auto-updates
ds.filterOptions = mySortOrder;
ds.filter = myFilter; // can be any composite or attribute or range filter
ds.pullSync();  // will trigger a query that will update the 'filtered' array
// when completed, the 'onfinished' event handler is called
// can be stopped by calling stopSync();
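
For comparison, composite, attribute and range filters are also easy to
express on the JS side over a replicated data set. The following is a
minimal sketch; the constructor names (attributeFilter, rangeFilter,
allOf) and the entry fields (direction, startTime) are hypothetical and
chosen only for this example.

```javascript
// Hypothetical filter constructors; each returns a predicate over an entry.
const attributeFilter = (name, value) => (entry) => entry[name] === value;
const rangeFilter = (name, min, max) => (entry) =>
  entry[name] >= min && entry[name] <= max;
// Composite filter: an entry matches only if every sub-filter matches.
const allOf = (...filters) => (entry) => filters.every((f) => f(entry));

// A small replicated call history (field names are assumptions).
const calls = [
  { uid: "1", direction: "missed", startTime: 100 },
  { uid: "2", direction: "dialed", startTime: 200 },
  { uid: "3", direction: "missed", startTime: 300 },
];

const myFilter = allOf(
  attributeFilter("direction", "missed"),
  rangeFilter("startTime", 150, 400)
);

// Newest first, playing the role of 'filterOptions' (sort order).
const filtered = calls
  .filter(myFilter)
  .sort((a, b) => b.startTime - a.startTime);
```

This is a linear scan, so it only makes sense for data sets small enough
to keep in memory; larger sets would need an index on the JS side.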

This was rough and quick, and there may be errors in my thinking, but
I hope that a similar design would work with both JS-only and hybrid
designs. What do you think, would this or something similar be an
acceptable-compromise design?

Best regards,
Zoltan


On Fri, Nov 9, 2012 at 10:41 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Wed, Nov 7, 2012 at 1:45 AM, Dumez, Christophe
> <christophe.dumez@intel.com> wrote:
>> On Wed, Nov 7, 2012 at 11:03 AM, EDUARDO FULLEA CARRERA <efc@tid.es> wrote:
>>> I have been comparing the two proposals on the table for Contacts API, and
>>> they happen to have more similarities than differences, which is good news
>>> for reaching a consolidated proposal. Let me comment on the main
>>> differences:
>>>
>>> -Filtering: Intel’s proposal allows applying virtually any type of filter
>>> as it allows the definition of composite filters. Note though that in order
>>> to achieve appropriate performance the runtime will need to define indexes
>>> for the different filtering operations. Having such flexible filtering
>>> implies creating a virtually unlimited set of indexes. Christophe, what
>>> performance do you expect the implementation to have, and how do you
>>> propose to solve that issue?
>>
>> Well, it is a good idea for the backend to define indexes to make the common
>> filtering operations faster. I fail to see the relation between having a
>> flexible filtering system and creating a virtually unlimited set of indexes.
>> In my opinion, just because the backend cannot optimize for all possible
>> queries does not mean that we should restrict which queries are supported by
>> the API.
>
> A concern with the current approach is that performance matters a
> whole lot to developers.
>
> For example when opening a particular part of an app which wants to
> display a contact list to the user, the application will need to query
> the contacts API and get the list of contacts before they can be
> displayed. If the result takes too long to come back, the result can
> easily be that the API simply doesn't fulfill the developer's needs.
> I.e. even if the API comes back with a "correct" result, if it takes
> too long to get that result the API can be essentially non-functioning
> from a developer's point of view.
>
> This is a very fuzzy area from a W3C API point of view. Most of the
> specs that we've developed in the WebApps WG have been developed
> without any form of guarantees regarding performance. Though in
> reality developers have come to expect certain performance
> characteristics from the implementations anyway. I believe for example
> that most implementations have made iterating a child-list of nodes an
> O(n) operation both when iterated using Node.firstChild/.nextSibling
> and when using Node.childNodes[n]. This requires quite a bit of
> implementation complexity but it's something that all developers found
> necessary to do.
>
> This becomes especially important when designing database APIs, like
> the contacts API. In a database, it's incredibly important if an
> operation is O(n), O(log n) or O(1) since you can be dealing with many
> more records than you normally deal with in a DOM.
>
> This is especially important on mobile devices where CPU and IO
> performance is often severely limited.
>
> We ran into this recently in Firefox OS where some of our apps weren't
> displaying UI to the user fast enough because the SMS and Contacts
> APIs weren't returning results fast enough.
>
> This was something that we had in mind when designing IndexedDB. It
> was specifically designed such that all operations could be
> implemented with predictable performance characteristics. I think this
> is something that we need to continue doing for all APIs which are
> essentially wrappers around databases, like Contacts API or SMS.
>
>> If you consider SQL, you can query pretty much anything even if some of
>> these queries may be slow. All we can do is define indexes in the database
>> to optimize the most frequent/common queries.
>> I think the same thing applies here. The addressbook is basically a layer on
>> top of the database and the filtering system is merely a layer on top of
>> whatever query language is being used by the backend (e.g. SQL).
>
> SQL is a good example. While SQL does have the benefit that you can
> easily get the result that you want, many times that's simply not good
> enough for developers. People end up spending large amounts of effort
> getting indexes and expressions to line up correctly such that the
> execution engine in the database gives them the performance that they
> want.
>
>> Also note that if we don't provide flexible filtering then this means that
>> the app will likely need to do post-filtering by itself on JavaScript side.
>> I believe that even if the native backend is not optimized for all queries,
>> those are still going to be faster compared to the alternative
>> (post-filtering on JS side).
>
> Why? Do you have data to back this up?
>
> Javascript is really fast these days. Fast enough that we in Firefox
> OS are moving towards implementing more and more of our APIs in
> javascript. In fact, both our SMS API and our Contacts API are
> implemented in javascript.
>
> I would actually make the opposite argument: if an implementation
> can't implement a particular filter in such a way that it gets
> inherent benefits, usually by using indexes, then I think it's better
> to rely on application logic written in javascript to accomplish the
> same thing.
>
> Providing more flexible filters as just a convenience function is
> essentially just syntactic sugar. Syntactic sugar isn't inherently
> bad, but is something that should be added with care, especially in
> the first version of the API.
>
> Another thing to keep in mind is that no matter how much syntax sugar
> we add, we are unlikely to fully support all the things that people
> need to do for even the most simple UIs. For example, the current API
> doesn't seem to make it possible to render the list of initial letters
> for which matching user names exist. Or to get the total number of
> contacts so as to render a scrollbar of the correct size. Or to
> implement a quick-search feature like "give me all the contacts
> where one of the properties contains the string X". At least not
> without doing very slow full table scans.
>
> However, rather than just whining, let me make a counter proposal.
>
> The first thing that I think we should do is to make it possible for
> applications to keep their own database which is "caching" the
> information that they need. To do this, we need to make it possible
> for the app to find out about all modifications that are done to the
> contacts database. I.e. it needs to be able to be informed about any
> contact which is added, removed or modified.
>
> There are many ways we can do this. A somewhat naive solution is to
> have system-message-like callbacks for any time a modification is done
> to the database. This callback could then provide information about
> which contact was added/removed/modified and if modified, which fields
> were changed. However the downside of this approach is that we'll be
> constantly waking up applications in the background which will result
> in a lot of battery usage.
>
> A better solution might be to introduce the ability for an app to say
> "give me a list of all changes that happened since the last time I
> asked". That way the application can on first startup scan through the
> contacts database and cache any information that it wants; it would
> then say "please start recording changes for me". On next startup it
> would simply get the list of changes since last run and update its
> cache based on that. The API implementation would then queue any
> modifications that are made and only drop those modifications once all
> applications that have said "please record" have received the record.
>
> To make it more concrete, it could look something like this:
>
> interface ContactsManager {
>   ...
>   startTrackingChanges();
>   ContactRequest getNewChanges();
>   attribute EventHandler onchangesavailable;
>   stopTrackingChanges(); // Not strictly needed, but seems like a good idea
>   ...
> };
>
> We'd have to have some way of dealing with the situation that we end
> up with a lot of changes getting queued up and the system wanting to
> dump them. An easy solution would be to allow getNewChanges to return
> something indicating "i've lost track of your changes, please do a
> full rescan". That is very likely quite good enough since it'll only
> happen for apps that aren't run very often.
>
> Another solution is to introduce a way for the system to send a system
> message indicating "i'm overflowing with changes, please request the
> change-list". This can be done in a backwards compatible way, so it is
> something we can hold off on for now if we want to keep things simple.
>
> Once we have something like this, and once we are fine with relying on
> application-side javascript code to do filters that aren't using
> special indexes, I think that greatly reduces the pressure to add a
> ton of features to the API for advanced filters. It doesn't mean that
> we shouldn't have *any* filters, but it means that we can get away
> with much simpler ones without affecting the type of applications
> people can build.
>
>>> -ContactChangeEvent: Intel's proposal allows notifying changes in multiple
>>> contacts simultaneously indicating which contacts have been deleted, created
>>> and modified. Is the system expected to wait for some time before firing an
>>> event with multiple changes or do these notifications of multiple changes
>>> apply just to really simultaneous changes (e.g. sync with an external
>>> address book).
>>
>> We were thinking about database transactions in the backend. For those, I
>> think it is a good idea to return all the changes in the same event; this is
>> better for performance than firing one event per change.
>> The idea is also to have a generic ContactChange event which handles all
>> possible Addressbook changes in one object.
>
> I'm not sure I understand the answer here. The question is: In what
> situations would the implementation be able to deliver a list of
> changes, rather than deliver the changes individually?
>
> It seems like your answer is that you'd deliver a list rather than
> individual changes whenever you have a transaction in the backend that
> changes multiple things at once. However then the question simply
> becomes: In what situations would the implementation have a
> transaction which changes multiple things at once?
>
> I.e. can this happen when another app is using the contacts API to
> modify the database? Or can this only happen if someone is modifying
> the database through means other than the contacts API?
>
> I'm not necessarily opposed to the ability to notify about several
> changes at once, but I'd like to understand better how it works.
>
>>> -getSimContacts(): Our proposal included this method to be able to
>>> retrieve the contacts from the SIM card (e.g. to copy them to the device
>>> address book). It is not the intention to deal in any other way with the
>>> SIM’s address book, so neither the rest of the methods (find, remove, etc)
>>> nor the ContactChangeEvent apply in principle to SIM contacts.
>>
>> We removed this method because it seemed too specific. The device may not
>> have a SIM card and I did not understand why we needed to handle SIM
>> contacts any differently at API level.
>> I don't have strong feelings about this one but I thought that the
>> Contact.readonly flag would allow us to achieve the same thing (Contacts can
>> be retrieved from SIM using find() but cannot be removed or saved because
>> they are marked as 'readOnly').
>
> We need this function *somewhere*. But I'm fine with it living
> somewhere other than in the Contacts API. Especially for now.
>
>>> -read-only contacts in Intel’s proposal: Christophe, could you elaborate
>>> what is the use case for that?
>>
>> In some cases, the backend is not able to edit specific contacts. It can be
>> useful for SIM contacts, for example, in case we are only able to read
>> those, not edit them.
>> It can also be useful for contacts coming from online services which may not
>> support editing.
>
> As a user, this seems extremely annoying. If something shows up in my
> address book I'd expect to be able to modify it. I suspect that if we
> add this, people will simply write apps that, when a contact is
> read-only, copy the information over to a new, editable contact
> and delete the read-only one.
>
> I don't think the API as written could work with a sim-card backend
> anyway so I'm not really worried about that use-case.
>
>>> -Additional fields in Intel's proposal: relationship, phoneticGivenName;
>>> phoneticFamilyName; favorite; ringtone
>>
>> Yes, we introduced those attributes because we had them in Tizen and they
>> are common vCard extensions. They also have practical use cases for the
>> AddressBook application.
>>
>>>
>>> -Additional fields in our proposal: genderIdentity; carrier attribute in
>>> tel field
>>
>> Yes,
>> - GenderIdentity is part of vCard 4.0 but we could not find much use for it
>> (we already have "gender" attribute).
>> - carrier is not part of the vCard standard AFAIK and seemed very specific.
>> we were not sure this should be part of the standard API. The application
>> could easily handle this by itself if needed.
>
> I'll defer to Tantek for contact field questions.
>
> FWIW, the ringtone field seems just as application specific to me as
> the carrier field.
>
> This is also another argument for introducing the ability for
> applications to keep their own app-local cache. That way they can in
> that cache add any fields that they need, without having to convince
> us or the vCard community that their field is important enough that it
> should be added to the world's contact formats.
>
> / Jonas
Received on Saturday, 10 November 2012 14:34:58 UTC