Re: Replacing WebSQL with a Relational Data Model. from Nathan on 2010-10-26 (public-webapps@w3.org from October to December 2010)

From: Nathan <nathan@webr3.org>
Date: Wed, 27 Oct 2010 00:20:57 +0100
To: Keean Schupke <keean@fry-it.com>
CC: public-webapps@w3.org, Jonas Sicking <jonas@sicking.cc>, Arthur Barstow <art.barstow@nokia.com>
Message-ID: <4CC76259.3030303@webr3.org>
Keean Schupke wrote:
>> Likewise, personally all data I use is either in graph or k/v format, and
>> have found (like many others), that primary speed gains come from changing
>> the underlying data model rather than trying to map forwards and backwards
>> between objects, graphs and tabular data in a relational data model, via
>> SQL.
> 
> If we see HTML5 as a replacement for native mobile apps, and desktop apps,
> then we have to look at what applications require in terms of data storage
> not websites.

They require web storage, using the web as a data tier, if we want our 
client side html applications to run anywhere, then it stands to reason 
that the data used in those applications must be accessible from 
anywhere, thus we separate the applications from storage. [1]

> SQL is very widely used, and that must tell us something.
> Nearly all our apps use it.

TBH, I think that tells us that you use SQL.

> Lets think about what is currently done on the server. Lots of server apps
> use a relational database. With HTML5 we would like to move as much of that
> work as possible to the client - so that it uses the client's CPU not the
> servers CPU.

Moving the data to the processor, rather than the processor to the data, 
is a design feature of distributed hypermedia systems [2], this is also 
one of the many benefits gained by the application of the separation of 
concerns principal to the web, which we gain from the client-server 
architecture of HTTP. By separating the user interface constrains from 
the data storage concerns, we improve the portability of the user 
interface across multiple platforms and improve scalability by 
simplifying the server components. [3]

Net result is that often you can publish structured data in a granular 
fashion on the web, with an appropriate schema, and pull it down to the 
application to process. Often this also separates the applications from 
the data (as well as the storage) which allows users to both be in full 
control of their data, and to use the same data with multiple different 
applications - see also [1].

> The HTML5 app will keep a local cache of some of the server database for
> offline use.

Indeed, caching can partially or completely eliminate some client-server 
interactions, and it improves efficiency, scalability and user perceived 
performance of the application (and the web) [4]. Caching (along with 
stateless interactions) are very important parts of successful 
application design, and the web, being HTTP cache friendly (ETags, Last 
Modified etc) together with Application Cache Manifests [5] is a very 
nice recipe.

I guess the issue of exactly how you cache data on the client side 
varies based on the application, however in all but a few cases it 
appears to me that the optimal solution is to store the data in a 
structure/format which the application actually uses. A common example 
may be storing a User Object (complete with nested objects) rather than 
a several tables each with a component part in it.

A wise man recently said to me that the most efficient system is one 
which can answer a question before it's been asked.

> For example a calendar app that keeps the next months appointments in the
> client, and allows new meetings to be created offline. There may be many
> ways you wish to seach and cross reference the information, for example you
> may have a contacts table and an appointments  table. You may use a join to
> allow the contacts phone number to be displayed in each appointment.
> Searching, sorting, grouping, all things you will want to do.

This is a very good example! You may well be interested in having a 
quick look at the "Motivating Writing" section of [6] which outlines 
just such an application, and that document together with [7] covers a 
slightly different approach to getting that contacts phone number 
displayed alongside each appointment, using Links rather than Joins.

>> However, over and above any issues of speed or this-vs-that, I strongly
>> feel that universality should be priority number one, what all of us *need*
>> is a /single/ standardized storage interface implemented everywhere
>> javascript is, and certainly in all the major browsers.
>> 
>> Keean, I think what you are proposing is great, but should be an API
>> layered on top of IndexedDB, and which uses IndexedDB - the fact is we're
>> dealing with javascript here, focus has to be on handling the structures
>> provided (object/array) and getting them in and out of a persistence layer
>> as optimally as possible.
>>
>> Thus, once we have this single universal persistence interface we can wrap
>> it up in other model or context specific APIs (such as the one you
>> describe), lessons can be learned, bugs filed, and over time speed increases
>> and optimizations will be introduced.
> 
> I am going to attempt a basic implementation on top of IndexedDB, and also
> on top of WebSQL.

Glad to hear it :) the relational API certainly sounds useful.

>> The critical point here is that we need only one standardized interface,
>> not a perfectly optimized for data-model-x one, not a uses
>> query-language-foo one, just something that we can all use to persist data
>> from javascript, and wrap in other APIs, that way any optimizations made
>> will benefit everybody - regardless of their preferred interface, data model
>> & query style.
> 
> The problem is the proposed data model is a subset of the functionality
> required. The relational model is a complete algebra (hence the term
> relationally complete). With a non-relationally complete storage API there
> will always be limitations to what it can do.

Indeed, as mentioned earlier I primarily use data in a graph/triple 
model, so neither WebSQL or IndexedDB are "perfect" for my needs, 
however IndexedDB is more generic and optimized for the structures 
provided by ECMAScript, whereas as SQL is optimized for.. well SQL and 
RDBMS - as you noted yourself (and from what I understand, do) a 
relationally complete API could be built on top of IndexedDB, as could 
an SQL query engine, a SPARQL query engine, or.. well anything you can 
think of I guess, relationally complete or not.

So, my point was that although IndexedDB is neither optimal for your 
preferred data model or mine, it does cater for us both, and everybody 
else, allowing us to get on and do our jobs, implement APIs, and build 
HTML5 client side web applications.

> Take Firefox for example, it implements IndexedDB using SQLite apparently.
> So implementing a relational API if we have to talk to IndexedDB that means
> we have to convert from the relational data model to an object model and
> then back to a relational model for SQLite. So what I would like to do is
> punch through that excess layer in the middle and have the relational API
> talk directly to SQLite in the browser implementation. How could you argue
> that having an unnecessary middle layer is a good thing?

I'd actually argue that the only necessary /is/ that middle layer, 
that's the interface, and the whole point of interfaces, especially 
standardized ones, is that you forget the implementation specifics and 
code to the interface. ECMAScript is in a sense just an interface, you 
code to the programming interface it provides and it doesn't matter 
whether the code gets interpreted and executed by Java or C++, on 
windows or mac, in opera or on the command line; the results are 
expected and the interface is known.

So no it's not optimal for specific cases, neither is the DOM, neither 
is IndexedDB, and neither is the web - but its the universality aspect 
and precisely such interfaces which make the web what it is, and why the 
more generic optimized for ECMAScript solution gets my vote. Even though 
it's not a perfect fit for my own needs.

> Of course I am not suggesting that this is more important than getting
> IndexedDB working (for me a good relational API is more important, but I can
> understand that for a lot of people IndexedDB is all they need).

It's not all they need, but it allows a platform upon which all they 
need can be built, including a good relational API :)

> What I would like to see is the relational API standardised (maybe in the
> next version of IndexedDB, maybe as a separate standard), so there is a
> relationally complete API available, that browser implementers can implement
> using a relational storage engine like SQLite directly, or can be emulated
> by an implementation on top of IndexedDB where necessary.

Sounds good, certainly having the API and an implementation there will 
have countless benefits for us all.

Best,

Nathan

[1] http://www.w3.org/DesignIssues/CloudStorage.html
[2] 
http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm#sec_5_2_1
[3] 
http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm#sec_5_1_2
[4] 
http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm#sec_5_1_4
[5] http://www.w3.org/TR/html5/offline.html
[6] http://www.w3.org/DesignIssues/ReadWriteLinkedData.html
[7] http://www.w3.org/DesignIssues/LinkedData.html
Received on Tuesday, 26 October 2010 23:22:03 UTC