Re: Data models and filtering (was: Re: Contacts API draft)

On Sat, Nov 10, 2012 at 4:34 PM, Kis, Zoltan <zoltan.kis@intel.com> wrote:
> Hello,
>
> I think this topic is worth a separate thread. The problem is not
> specific to Contacts: I am also facing the same issues with Call
> History and Messaging.
>
> Currently I am using the same filters as Chris in these APIs.
> However, Jonas raised a lot of good points. I propose we have a
> common-to-all-APIs discussion/solution for this topic. Of course the
> Contacts thread can continue on its own; I hijacked the original
> thread (scroll down for it) only for this particular issue. :)
>
> A few key points from Jonas were:
> - performance is a key issue, and JS-only implementations are valid
> use cases
> - there is doubt that a general purpose query API is performant enough
> (in general) for the above use case, in order to build data models for
> apps at the rate required by the app
> - therefore there must be a method to cache or sync data (or a full
> database) on the JS side in order to control performance-critical
> operations from there
> - there was a proposal that this sync be done by polling deltas from
> the data sources.
>
> I agree with these, with the following notes/additions:
> - some data, e.g. anything with close-to-real-time constraints, must
> be managed from the native side (or whichever side handles the
> protocols) and only _exposed_ to the JS side. Call history is an
> example: otherwise we risk losing incoming calls.
> One could say that in these cases the middleware could maintain a
> local sync cache, which is erased after syncing to the JS side, so
> this can be solved. However, there may be multiple clients for the
> data, which can be native or JS or both, so the requirement is that
> if an implementation chooses to replicate/sync data to the JS side, it
> must keep it in 2-way sync with the origin of the data (i.e. changes
> made on the JS side are synced back to the native side).
>
> - the performance bottleneck you mentioned is usually not in the
> native database itself (key-value store, SQL DB, semantic DB,
> distributed heterogeneous DB, etc.). In most cases, the bottleneck is
> in the way query results or data models are exposed to the JS side. A
> DB implementation in JS will certainly not be faster per se than a
> native implementation of similar design, but having it on the JS side
> may be more efficient because it avoids the eventual IPC bottleneck
> (which is not mandatory to have, either). So I claim this performance
> issue is an implementation issue, and not an API issue.
>
> - at least when apps are in foreground, in addition to polling, data
> model updates should also be possible by events, with optional
> low-pass filters controlled by the app. This is a classical
> synchronization use case, and I would support both polling and
> (filtered-)event-firing mechanisms in the API.
>
> - what applications really need are data models for feeding their
> views, e.g. roster/contact list with presence information, or call
> history, or messaging conversation view, etc. Sources may be
> heterogeneous. So, apps need to maintain 'live' objects describing
> their data model, and in case of big data, a high-fps viewport over
> that data. This is non-trivial to solve for the generic case. On the
> other hand, just providing data source APIs and deferring the solution
> to the JS side may not be enough either, since certain optimizations
> can only be done across a full vertical. What is important here is
> the freedom of choice for developers.
>
> - the W3C APIs will also be used in products which primarily support
> native apps for calls, messaging and contacts and have a database on
> the native side, with enough optimizations that they could expose
> their native data models to the JS side efficiently, so that JS apps
> could access the same data for any generic purpose. Under these
> conditions, generic filters from JS would work well and we should not
> prevent developers from using them. This is a valid use case, too.
>
> Now the question is: could we support both use cases? (please :)
>
> One of my drafts for CallHistory looks like this (I replace
> CallHistory with DataSync for this example). It is an asynchronous
> API.
>
> interface DataSync : EventTarget {
>      attribute DataEntry[] data;
>      attribute DataEntry[]? added;
>      attribute DataEntry[]? changed;
>      attribute DOMString[]? deletedUIDs;
>
>      // false: manual pull mode (default); true: automatic updates
>      attribute boolean autoUpdate;
>      // maximum auto-update frequency; 0 means no constraint
>      attribute double frequency;
>
>      void pullSync();
>      void pushSync();
>      void stopSync();
>
>      attribute FilterOptions?  filterOptions;
>      attribute AbstractFilter? filter;
>      attribute DataEntry[]? filtered;
>
>      attribute EventHandler onadded;
>      attribute EventHandler onchanged;
>      attribute EventHandler ondeleted;
>      attribute EventHandler onfinished;
>      attribute EventHandler onerror;
> };
>
> How does this work in Jonas' use case (data replicated to e.g. IndexedDB):
>
> var ds = new DataSync();
> ds.autoUpdate = false;
> ds.added = null;
> ds.changed = null;
> ds.deletedUIDs = null;
> ds.filter = null; // we won't use filters now
> ds.pullSync();
> // events start to arrive for added, changed and deleted data entries
> // since the last call of pullSync() (note there is a critical
> // section there)
> // at any time the user can stop by calling stopSync(), which updates
> // the sync window cursor (for the next pullSync())
> // when the sync has finished, the 'onfinished' handler is called
> // note that filters could theoretically be used too, and of course
> // they can be implemented on the JS side ;)
>
> When the client has finished reading the 'added' data, it resets the
> attribute to null, acknowledging the data sync on added items.
> Similarly for changed items and deleted UIDs. Unless the reference is
> null, sync won't touch it.
>
> When the client changes or deletes any data, it must be synced back:
>
> ds.changed = arrayOfChangedItems;
> ds.added = null;
> ds.deletedUIDs = arrayOfDeletedUIDs;
> ds.filter = null;
> ds.pushSync();
>
> How would this work with generic filters and automatic updates?

Correction of the example below.

>
> ds.frequency = 0;  // updates come immediately when available
> ds.autoUpdate = true; // triggers auto-updates
> ds.filterOptions = mySortOrder;
> ds.filter = myFilter; // can be any composite or attribute or range filter
// setting the filter will trigger a query that will update
// the 'filtered' array
// can be stopped by setting the filter to null
> // when completed, the 'onfinished' event handler is called

Although the pullSync()/stopSync() methods could also be used, it is
clearer to reserve them for use with the sync itself.
Filtering can be kept separate; in fact it could also be in a
different interface altogether, for instance (and now with methods):

interface FilteredData : EventTarget {
     void startFiltering(AbstractFilter filter,
                         optional FilterOptions filterOptions);
     void stopFiltering();

     readonly attribute DataEntry[] filtered;
     readonly attribute DOMError error;

     attribute EventHandler onfinished;
     attribute EventHandler onerror;
};

It would work almost as above. Modifications of filtered entries can
be done via the (now cleaner) DataSync interface.
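
To make the intended call pattern concrete, here is a rough sketch of a
client driving such an object, written against an in-memory stand-in.
Everything in it beyond the member names of FilteredData is an
assumption made for illustration: MockFilteredData is a hypothetical
class, the filter is a plain predicate function (AbstractFilter is not
defined here), filterOptions carries a made-up `sortBy` field, and the
query runs synchronously, whereas the real API would be asynchronous.

```javascript
// Hypothetical in-memory stand-in for the FilteredData draft above.
// Assumptions: filter = predicate function, filterOptions.sortBy = name
// of the attribute to sort on, query executed synchronously.
class MockFilteredData {
  constructor(entries) {
    this._entries = entries; // full data set the filter runs over
    this.filtered = [];
    this.error = null;
    this.onfinished = null;
    this.onerror = null;
  }

  startFiltering(filter, filterOptions = {}) {
    let result;
    try {
      result = this._entries.filter(filter);
      const key = filterOptions.sortBy;
      if (key) {
        // Ascending sort on the requested attribute.
        result.sort((a, b) => (a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0));
      }
    } catch (e) {
      // A throwing filter is reported through 'error' and 'onerror'.
      this.error = e;
      if (this.onerror) this.onerror(e);
      return;
    }
    this.filtered = result;
    this.error = null;
    if (this.onfinished) this.onfinished();
  }

  stopFiltering() {
    // In this mock, stopping simply drops the current result set.
    this.filtered = [];
  }
}

// Usage: list missed calls, oldest first.
const calls = [
  { uid: "3", direction: "missed", startTime: 30 },
  { uid: "1", direction: "missed", startTime: 10 },
  { uid: "2", direction: "dialed", startTime: 20 },
];
const fd = new MockFilteredData(calls);
let missedUids = [];
fd.onfinished = () => {
  missedUids = fd.filtered.map((e) => e.uid);
};
fd.startFiltering((e) => e.direction === "missed", { sortBy: "startTime" });
```

The client only reads `filtered` from inside the 'onfinished' handler;
any change to the entries themselves would still go through DataSync.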

One thing I forgot to mention about DataSync: the problem of insertion
(who generates the UIDs when entries are added) can be solved by the
implementations by separating the namespaces in DataEntry, i.e.
maintaining 2 UID attributes, one local and one remote, for example:

interface DataEntry {
     attribute DOMString uid;
     attribute DOMString remoteUid;
     // and other attributes
};
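
A minimal sketch of how the two namespaces could be used on insertion,
under stated assumptions: the JS-side copy generates `uid` itself, the
origin of the data assigns `remoteUid` once the entry has been pushed,
and LocalStore with its methods is a hypothetical helper, not part of
any draft.

```javascript
// Hypothetical JS-side store illustrating the local/remote UID split.
class LocalStore {
  constructor() {
    this._nextId = 1;
    this._byUid = new Map();
  }

  // New entries get a locally generated uid; remoteUid stays null
  // until the origin of the data has assigned one.
  add(fields) {
    const entry = { ...fields, uid: `local-${this._nextId++}`, remoteUid: null };
    this._byUid.set(entry.uid, entry);
    return entry;
  }

  // Entries that still have to be pushed to the origin.
  pendingInsertions() {
    return [...this._byUid.values()].filter((e) => e.remoteUid === null);
  }

  // Record the identifier the origin assigned to a pushed entry.
  confirm(uid, remoteUid) {
    const entry = this._byUid.get(uid);
    if (!entry) throw new Error(`unknown local uid: ${uid}`);
    entry.remoteUid = remoteUid;
  }
}

const store = new LocalStore();
const draft = store.add({ number: "+3581234567" });
const pendingBefore = store.pendingInsertions().length;
// Pretend the origin answered the push with its own identifier.
store.confirm(draft.uid, "origin-42");
const pendingAfter = store.pendingInsertions().length;
```

With this split neither side has to wait for the other to mint an
identifier; the pair (uid, remoteUid) is the mapping between them.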

All interfaces using DataSync (Contacts, CallHistory, Messaging, etc.)
would inherit (implement) DataSync, and their data classes, like
CallHistoryEntry, would inherit DataEntry.

All these names are just examples, and all the above interfaces serve
only as an illustration of my points. Feel free to come up with better
interfaces.

Best regards,
Zoltan

Received on Monday, 12 November 2012 06:49:53 UTC