Re: Detailed review of 4.12. Client-side database storage from Mihai Sucan on 2007-10-12 (public-html@w3.org from October 2007)

From: Mihai Sucan <mihai.sucan@gmail.com>
Date: Fri, 12 Oct 2007 14:04:43 +0300
To: "Ian Hickson" <ian@hixie.ch>
Cc: public-html <public-html@w3.org>
Message-ID: <op.tz2zp5stmcpsjgr0b0dp@athlon>
Hello Ian!

Thanks for the reply.

Le Fri, 12 Oct 2007 08:34:33 +0300, Ian Hickson <ian@hixie.ch> a écrit:

> On Tue, 18 Sep 2007, Mihai Sucan wrote:
>>
>> I have reviewed section 4.12. "Client-side database storage" [1]. Here
>> are my comments:
>
> Thanks for the feedback.

You are welcome. More feedback will follow.

>> 1. In section 4.12.2. "Executing SQL statements" [2]:
>>
>> "Big Issue: There are two major missing features here: One: [...]. Two:
>> there's no session-specific API, so if you have two windows open at  
>> once, you
>> can't interact with the site doing two separate sessions unless the  
>> site goes
>> out of its way to track sessions itself, detecting when new tabs are  
>> opened,
>> etc. sessionStorage[] handles it, why doesn't this? [...]"
>>
>> Given there's sessionStorage, it wouldn't really be a big deal for a web
>> application to "go out of its way to track sessions itself" - since
>> sessionStorage allows this quite nicely and easily.
>
> Fair enough.
>
> We need to work out what our story is with sessionStorage and
> globalStorage, by the way. Having both them and the SQL storage API seems
> like overkill and bloat.

Yes and no.

Yes, because you can have all your global and even session storage within  
a single sqlStorage, and vice-versa. The only fundamental difference is  
that SQL is better suited for large amounts of work with data - better  
data management and storage altogether. globalStorage/sessionStorage are  
quite often preferred in Web applications where there's no need for large  
amounts of data - only for some application preferences, customizations,  
etc.

No, because, obviously, both are useful and... dare I say, needed.

Could they be combined? Probably yes.

Here's my proposal/idea/suggestion:

Change the specification, and base all the storage, on a single unified  
approach. Let all the storage data be in SQL databases.

Keep the current definition of the client-side database storage API, which  
allows developers to use SQL databases, and such. This the "raw" access to  
all the storage of a domain.

However, to satisfy the needs and use-cases for globalStorage and  
sessionStorage, define two basic tables within a single database. Let the  
database name be the 'storage' string. Let the first table be 'global'.  
Let the second table be 'session'. Now, define the format of the two  
tables such that Web authors have the same features/capabilities as they  
are now defined for globalStorage and sessionStorage APIs (key/value  
columns, and the domain column for globalStorage).

Next, redefine the globalStorage and sessionStorage APIs to be just  
"shorthands" to access the two SQL tables from the 'storage' database.  
Define which SQL queries are automatically generated.

Something like:

window.sessionStorage.length
SELECT COUNT(key) FROM storage.session

window.sessionStorage.getKey('example')
SELECT value FROM storage.session WHERE key='example'

window.globalStorage['example.com'].getKey('robod')
SELECT value FROM storage.global WHERE key='robod' AND domain='example.com'

... which returns the value, or raises a security exception if the script  
is not allowed to access the data from the given domain.

The above examples are for getting values. For setting values,  
INSERT/UPDATE statements need to be used.

I'd say this merger solution would reduce, to a certain extent, the  
"bloat" you are talking about. This would no longer require the UA to keep  
two separate storage "engines" (one for session/globalStorage, and another  
one for SQL storage). Everything would be unified into SQL storage.  
Additionally, the easy to use and convenient API of globalStorage and  
sessionStorage is kept.

I don't think it's ideal to completely eliminate  
globalStorage/sessionStorage - requiring everyone to use SQL. Nor is it  
ideal to remove SQL storage, which is very, very useful.


>> However, I was thinking: maybe UAs could allow the CREATE TEMPORARY
>> TABLE syntax (see the MySQL documentation [3]). The temporary tables
>> would be removed when the user terminates the session.
>>
>> This should be good enough for session-related SQL needs.
>
> That could work, though I'd feel better if we didn't have to futz in the
> SQL layer at this point.

That's not really futzing in the SQL layer. :)

I was just suggesting that the CREATE TEMPORARY TABLE syntax and  
functionality to be encouraged for sessionStorage-like capabilities within  
SQL. UAs can implement this, the spec doesn't need to define it - there  
are other specs which most likely already do so.


>> 2. The section 4.12.2 needs further clarification and details. Based on
>> the following questions:
>>
>> How are the "?" chars supposed to be replaced with the given arguments?
>> What kind of escaping is supposed to be expected?
>>
>> executeSql("INSERT INTO `flowers` (`name`, `color`) VALUES (?, ?)",  
>> "lily",
>> "white");
>> executeSql("INSERT INTO `flowers` (`name`, `color`) VALUES ('?', '?')",
>> "lily", "white");
>>
>> ... both cases could be expected in the wild. What to expect from each
>> example? I suppose the first example works as intended (a new row with
>> the given arguments is added). The second example will simply add a new
>> row with the '?' value for both fields. If this is the supposed
>> behavior, then the UA has "context-awareness": I cannot do things like:
>>
>> executeSql("INSERT INTO `flowers` (`?`, `?`) VALUES ('?', '?')", "name",
>> "lily", "color", "white");
>>
>> I cannot treat executeSql in a similar fashion to the sprintf() function
>> [4] - albeit this would be a much more simpler function.
>
> I've tried to clarify this. could you let me know if that's any better?

Yes, it's better now. Since I last checked, the entire section has been  
improved.

I'd recommend to add some examples for executeSql().


>> Can binary data be stored in the database? Is the UA supposed to do
>> proper "escaping" for binary data arguments?
>
> I'm not sure I follow. Could you elaborate?

Binary data, as in images, executables, videos, etc. can be inserted into  
mySQL databases (and into other SQL databases, obviously). They provide  
several field types which can handle hundreds of megabytes, even  
gigabytes, of binary data. See BLOBs [1]

I was asking, if future UAs should allow inserting binary data into tables.

1. I want to build a Web application which allows the user to do word  
processing offline.

2. I, as a Web developer, find it ideal to store all the documents within  
the SQL storage.

3. For the moment I have some concerns:

a) How can I allow the user to "upload" files (images, videos, sounds,  
archives, etc) into his/her documents without actually uploading the files  
to my server? *Offline* Web application.

b) How do I insert all this binary data into the SQL storage? Can I have  
something like:
executeSql('INSERT INTO `myfiles` (`name`, `data`) VALUES (?,?)',  
my_file_name, my_file_data)

?

c) There are other concerns as well: can JavaScript engines handle  
variables that have several megabytes of such data? Isn't that too  
memory-intensive? Or... could we have special FileObjects which don't  
actually have the files loaded into memory, but which can be passed to SQL  
queries?

d) Once the user goes online, how can I upload the files to the server?  
Without actually reading the entire file from the SQL storage into memory.  
Again, probably such fields should be some JS SqlBlobObjects which don't  
actually contain the entire blob, but they can be passed to input  
type=file for uploading to remote servers.

e) How about streaming the binary data to the remote server with the  
network connection API?


These are issues, which in my opinion need to be addressed by the HTML 5  
spec. For now, globalStorage, sessionStorage, SQL storage, and the network  
connection APIs do not scale to big/powerful Web applications. We need to  
be able to handle large files, lots of data. Otherwise, all of these are  
just nice features into which we can save settings.

Obviously, these issues cannot be addressed only in the section about SQL  
storage - a more global approach is needed (forms, DOM, JS, network API,  
SQL API, etc). However, this is where I noticed the greatest need for such  
large amounts of storage.


>> Also, regarding the SQL standard: I cannot ask for specifically
>> requiring the implementation of the SQL standard X, Y, nor Z. However, I
>> would suggest that the HTML5 spec *recommends* one SQL standard - like
>> it recommends Ogg support for the <video> element.
>
> In due course I think we'll specifically state what needs to be
> implemented. I'd rather have some implementation experience first,  
> though.

Good then.


>> 3. The ResultSet object [5] has the length attribute [6] which is now
>> defined to tell the number of columns in the current row. This attribute
>> name is quite confusing. I initially expected it tells the number of
>> rows in the ResultSet object.
>>
>> I would recommend that the "length" attribute be renamed to numCols or
>> numFields.
>
> This is now gone altogether.

Ok then.


>> 4. I was surprised to see there's no attribute (or method) to easily and
>> quickly check the number of rows in the ResultSet object. For now, one
>> either has to use COUNT() in a SELECT query, or must iterate over all
>> the rows to count them.
>>
>> I would recommend defining a new readonly attribute "numRows" which  
>> tells the
>> number of rows in the ResultSet object.
>
> We now have rows.length.

Good.


> Thanks for your feedback,

More feedback on the updated spec:

1. Why complicate things with database versions? Any use cases?

As I see, one can have database versions completely different,  
independent, of each other. Actually, database versions are not really  
"versions". They *can* be used as versions, but they can also be used as  
... something completely different.

It's just another layer of complication. mySQL doesn't have something like  
this. They do have, if I am not mistaken, something like "journaling" ...  
which allows developers to rollback in time. Yet, when we connect to a  
database we can't pick from multiple versions, which are actually  
independent databases (as defined now by the HTML5 spec).


2. Why openDatabase() instead of executeSql('SELECT databaseName') ?



That's about all for now.



[1] http://dev.mysql.com/doc/refman/4.1/en/blob.html


-- 
Mihai Sucan
http://www.robodesign.ro
Received on Friday, 12 October 2007 11:05:05 UTC