Web Storage & SQL

(I originally blogged this at 
http://blog.vlad1.com/2009/04/06/html5-web-storage-and-sql/, but Hixie 
rightfully pointed out that I should post it here for discussion -- 
doing so!  Blog post is copied pretty much verbatim below, so apologies 
if it sounds more blog-y than post-y.)

There's been a lot of interest around the Web Storage spec (formerly 
part of whatwg HTML5), which exposes a SQL database to web applications 
to use for data storage, both for online and offline use.  It presents a 
simple API designed for executing SQL statements and reading result 
rows.  But there's an interesting problem with this; unlike the rest of 
the HtML5, this section defines a core piece of functionality in terms 
of an undefined chunk referenced as "SQL".

The initial implementations of Web Storage are both based on SQLite, and 
expose the dialect of SQL understood by SQLite to web content.  I'm 
actually a big fan of SQLite, and was one of the advocates for pulling 
it into the Gecko platform.  However, SQLite implements a variant of 
SQL, with a number of deviations from other SQL engines, especially in 
terms of the types of data that can be placed in columns.

Web content that is created to use database storage with SQLite as the 
backing is unlikely to work with any other backend database.  Similarly, 
if another database was chosen as a browser's backing implementation, 
web content that works with it is unlikely to work with anything else. 
This is a serious interop problem, the root of which is that there 
really isn't a useful core SQL standard.  SQL92 is generally taken as a 
base, but is often extended or altered by implementations.  Even beyond 
the parser issues (which could be resolved by defining a strict syntax 
to be used by Web Storage), the underlying implementation details will 
affect results.

So, the only option is for the Web Storage portion of the spec to state 
"do what SQLite does".  This isn't specified in sufficient detail 
anywhere to be able to reimplement it from the documents, so it would be 
even worse — "do what this exact version of SQLite does", because there 
are no guarantees that SQLite won't make any incompatible changes.  For 
example, a future SQLite 4 may introduce some changes or some new syntax 
which wouldn't be supported by earlier versions.  Thus, it requires 
every single browser developer to accept SQLite as part of their 
platform.  This may not be possible for any number of reasons, not the 
least of which is it essentially means that every web browser is on the 
hook for potential security issues within SQLite.

Instead of all of this, I think it's worth stepping back and consider 
exactly what functionality web developers actually want.  It's certainly 
much easier to say "well, server developers are used to working with 
SQL, so let's just put SQL into the client", but it's certainly not 
ideal — most people working with SQL tend to end up writing wrappers to 
map their database into a saner object API.

So, I would propose stepping back from Web Storage as written and 
looking at the core pieces that we need to bring to web developers.  I 
believe that the solution needs to have a few characteristics.  First, 
it should be able to handle large data sets efficiently; in particular, 
it should not require that the entire data set fit into memory at one 
time.  Second, it should be able to execute queries over the entire 
dataset.  Finally, it should integrate well with the web, and in 
particular with JavaScript.

With these needs in mind, I think there are other options that should be 
considered, even beyond a subset of SQL; for example, an object-oriented 
database approach might serve those needs better.  A good prototype 
example of what such a system could look like is jLINQ, which implements 
client-side querying on JavaScript objects and arrays.  As such, a basic 
implementation is simple; more complex ones can have browser support for 
efficient indexing, triggers, rapid serialization to and deserialization 
from disk, etc.  An implementation could even map all of this on top of 
an underlying SQL engine.  Another option is something like CouchDB.  I 
was also just pointed at Persevere, which looks quite cool; much in the 
same way as CouchDB, the same API could be implemented both client-side 
and server-side, for efficient online/offline switching.  An approach 
such as one of these could well serve the web better than just throwing 
a SQL dialect over the web content fence.

This is a conversation that I think is worth having, both to figure out 
what could be done about the issues with directly exposing SQL/SQLite, 
and also to step back and explore alternate approaches to getting the 
same functionality in web developers' hands.

     - Vlad

Received on Wednesday, 8 April 2009 21:52:41 UTC