Database section feedback from Ian Hickson on 2008-11-26 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 26 Nov 2008 10:54:41 +0000 (UTC)
To: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0811260736280.17414@hixie.dreamhostps.com>
(Note: The Local Storage section will be extracted from HTML5 at some 
point in the coming year.)

On Fri, 12 Oct 2007, Mihai Sucan wrote:
> 
> Change the specification, and base all the storage, on a single unified 
> approach. Let all the storage data be in SQL databases.
> 
> Keep the current definition of the client-side database storage API, 
> which allows developers to use SQL databases, and such. This the "raw" 
> access to all the storage of a domain.
> 
> However, to satisfy the needs and use-cases for globalStorage and 
> sessionStorage, define two basic tables within a single database. Let 
> the database name be the 'storage' string. Let the first table be 
> 'global'. Let the second table be 'session'. Now, define the format of 
> the two tables such that Web authors have the same features/capabilities 
> as they are now defined for globalStorage and sessionStorage APIs 
> (key/value columns, and the domain column for globalStorage).
> 
> Next, redefine the globalStorage and sessionStorage APIs to be just 
> "shorthands" to access the two SQL tables from the 'storage' database. 
> Define which SQL queries are automatically generated.

I think this would be an interesting idea, but I'm not sure it really 
gains us much.

Anyway, since I seem to be the only one really concerned that the bloat of 
having three storage mechanisms with slight differences, I'm going to just 
let it go.


> > > However, I was thinking: maybe UAs could allow the CREATE TEMPORARY 
> > > TABLE syntax (see the MySQL documentation [3]). The temporary tables 
> > > would be removed when the user terminates the session.
> > > 
> > > This should be good enough for session-related SQL needs.
> > 
> > That could work, though I'd feel better if we didn't have to futz in 
> > the SQL layer at this point.
> 
> That's not really futzing in the SQL layer. :)
> 
> I was just suggesting that the CREATE TEMPORARY TABLE syntax and 
> functionality to be encouraged for sessionStorage-like capabilities 
> within SQL. UAs can implement this, the spec doesn't need to define it - 
> there are other specs which most likely already do so.

Fair enough.


> Binary data, as in images, executables, videos, etc. can be inserted 
> into mySQL databases (and into other SQL databases, obviously). They 
> provide several field types which can handle hundreds of megabytes, even 
> gigabytes, of binary data. See BLOBs [1]
> 
> I was asking, if future UAs should allow inserting binary data into 
> tables.

Probably; it'll depend on what the File Upload spec ends up doing.


> 1. Why complicate things with database versions? Any use cases?

It is expected that applications may want to change their schemas. To know 
when they should do so, they need to be able to tell which version of the 
schema is currently active in an atomic fashion.


> As I see, one can have database versions completely different, 
> independent, of each other.

One database has a particular version that can be changed; you can't open 
a database except at its current version. What made you think otherwise?


> 2. Why openDatabase() instead of executeSql('SELECT databaseName') ?

Because of the versions, partly, but also because we want to allow the UA 
to initialise the database engine after knowing which database to load.


On Fri, 9 Nov 2007, Jim Jewett wrote:
> > 
> > As far as the question mark placeholders go, it's actually 
> > well-defined -- the spec says that the placeholders must be used in 
> > place of literals.
> 
> Having read this explanation, I now see that.  On first reading, I 
> assumed it was referring to (javascript string) literals that would be 
> substituted in.
> 
> Could you clarify:
> """
> 2.  Parse the first argument to the method (sqlStatement) as an SQL
> statement, with the exception that ? characters can be used in place
> of literals in the statement. [SQL]
> """

I've tried; let me know how it is. (Most of the change was actually adding 
a note a little lower down.)


> This still leaves open whether the statement has to actually be 
> PREPAREd, and whether other parameter types or host variables are 
> supported.

I'm not sure what you mean.


> It would also be good to have an example, so that people (like me) won't 
> wonder whether to use '?' instead of ?.

Yeah I'll add a whole intro section in due course.


On Tue, 11 Dec 2007, Anne van Kesteren wrote:
> 
> * The privacy section is no longer accurate as globalStorage has 
> changed.

Could you be more specific?


> * The security section should caution for malicious scripts trying to 
> fill up the user's storage space.

There's a separate "Disk Space" section.


> * Some concerns from globalStorage such as cookie resurrection also 
> apply to this feature. They should probably be referenced or explicitly 
> mentioned.

The two sections have the same security notes now.


On Mon, 18 Feb 2008 mikko.honkala@nokia.com wrote:
> 
> The current spec only notes: "Note: A future version of this
> specification will probably define the exact SQL subset required in more
> detail".
> http://tinyurl.com/23d324
> 
> This is not a good position, especially since implementations are
> starting to pop up. On a high level, we need requirements and further
> proposals for the subset that is going to get specified as a MUST level
> requirement in the spec.
> 
> We have at least three things as a starting point:
> 
> 1. Google Gears
> http://gears.google.com/
> 
> 2. WebKit Nightly Build DB Storage Implementation
> http://webkit.org/blog/126/webkit-does-html5-client-side-database-storag
> e/
> 
> 3. The SQL subset supported by SQLite
> http://www.sqlite.org/lang.html
> 
> All of these build on top of SQLite, which is in my opinion a valid 
> starting point, even on mobile devices. There are presumably differences 
> even with these three starting points.
> 
> One difference, which I am aware is whether the full-text search 
> extension is enabled or not (it is is Gears, but not to my knowledge in 
> WebKit nightly). I would very much argue that full-text search is an 
> essential requirement for HTML5 client-side database API, since it is 
> pretty heavy to implement the functionality in JavaScript. Comments?
> 
> So in minimum I propose having an issue in issue tracking system for 
> this, and would call for people who care about the client-side storage 
> to contribute to this topic.
> 
> My kick-off proposal would be to start with the number 3 : "The SQL
> subset supported by SQLite", combined with their support for full - text
> search:
> http://code.google.com/apis/gears/api_database.html#sqlite_fts
> http://www.sqlite.org/cvstrac/wiki?p=FtsTwo
> And making BEGIN/COMMIT/ROLLBACK  no-ops, since transactions are
> built-in to the API.

I agree with the above comments, in general. If someone would like to 
volunteer to write a separate spec to define this subset, that would be 
really useful. If anyone wants to do it, please, just go ahead and do it, 
there's no need to wait for approval or anything.


On Thu, 28 Feb 2008 mikko.honkala@nokia.com wrote:
> 
> maybe I am missing something but the database API does not seem to 
> define type conversion rules between JavaScript types and the database.
> 
> For query results, in 4.11.4 
> http://dev.w3.org/html5/spec/Overview.html#database
> the spec says: "Each property must have the name of the column and the
> value of the cell, as they were returned by the database.". It is
> unclear what type this value is. 
> 
> For query parameters, 4.11.3 
> http://dev.w3.org/html5/spec/Overview.html#executing
> says "Replace each ? placeholder with the value of the argument". Which
> datatypes can be used for the "value". How are they converted when
> inserted into the DB?

Ideally, I'd like this to be covered by the aforementioned Web SQL 
language spec.


On Mon, 24 Mar 2008, Cameron McCormack wrote:
> 
> I think the argument to executeSql() for the query parameters should be 
> of type sequence<any> rather than sequence<Object>.  Otherwise, 
> implementations would be required to do unnecessary conversions of 
> values such Numbers and Strings to their corresponding object types.

Fixed.


On Mon, 12 May 2008, Ben Meadowcroft wrote:
> 
> A further suggestion is allowing a form node to be passed as a parameter 
> to the executeSql function and binding the named parameters to the 
> values of form controls with the same name that are associated with the 
> passed in form node.

While that's an interesting idea, it seems somewhat esoteric. I'm not sure 
it'd be used enough to be worth it, given how complex testing such a 
feature would actually be.


On Fri, 3 Oct 2008, Victor Suba wrote:
>
> As I understand, client database transactions are to be queued and executed
> sequentially in the order that they are called.
> 
> This is a useful fact for ordering transaction code without endless chaining
> of callbacks, for instance in the following example the steps will execute
> in order 1,2,3,4:
> 
> db.transaction(
>  function(tx)
>  {
>   // step 1
>   tx.executeSql('...',[],
>    function(tx)
>    {
>     // step 2
>    });
>  });
> 
> db.transaction(
>  function(tx)
>  {
>   // step 3
>   tx.executeSql('...',[],
>    function(tx)
>    {
>     // step 4
>    });
>  });

There's no guarantee that step 3 won't execute before step 1 (or between 1 
and 2), it mostly depends on what the statements refer to and the state of 
the database. There's also no guarantee that the steps won't be rolled 
back, e.g. if whatever transaction finishes first clashes with whatever 
transaction finishes second.


> However, in the following case of nested transactions the execution is 
> perhaps not ideal, running steps in the order 1,2,5,6,3,4:
> 
> db.transaction(
>  function(tx)
>  {
>   // step 1
>   tx.executeSql('...',[],
>    function(tx)
>    {
>     // step 2
>    });
> 
>   db.transaction(
>   function(tx)
>   {
>    // step 3
>    tx.executeSql('...',[],
>     function(tx)
>     {
>      // step 4
>     });
>   });
>  });
> 
> db.transaction(
>  function(tx)
>  {
>   // step 5
>   tx.executeSql('...',[],
>    function(tx)
>    {
>     // step 6
>    });
>  });
> 
> For consideration, how about if transactions that are issued from inside 
> other transactions could be queued immediately after, to reliably 
> achieve order 1,2,3,4,5,6?

The nesting level doesn't matter, transactions are always run in parallel 
to each other. In the example above, the only order guarantees are:

   1 before 2
   1 before 3
   3 before 4
   5 before 6

But 6 could execute before 1, and 4 before 2.


On Fri, 17 Oct 2008, Philip Taylor wrote:
> Ian Hickson wrote:
> > On Fri, 21 Mar 2008, Sunava Dutta wrote:
> > > 
> > > Storage.remainingSpace
> > > 
> > > A straightforward and popular request, this API provides a script to 
> > > check the remaining persistent storage spec available to it, in 
> > > bytes. It's a very useful feature to allow pages to manage their 
> > > store better.
> > > 
> > > * <Open Issue> We currently return bytes but perhaps returning the 
> > > number of characters is more useful? We'd love to hear thoughts 
> > > here...
> > 
> > The problem with this feature is that there are a number of ways to 
> > store data, and thus no way to know exactly how much data can be 
> > stored.
> > 
> > For example, if the UA stores data in UTF-8 characters, the number of 
> > characters left to store will vary based on what characters are to be 
> > stored. Similarly, if the UA stores data in a compressed fashion, the 
> > number of bytes will vary based on how compressible the data is. [...] 
> > Thus this API really can't easily work in an interoperable fashion.
> 
> This seems like it could be a useful feature if it could be made to 
> work, so I'll try to propose the idea of a remainingSpacePercentage.

I guess that could work.

I considered adding it to Storage and Database, but at the end of the day 
the only use case is showing a chart and saying "you're almost out of 
space! Ask your browser for more!" and really it'd be better for the 
browser to know to ask for more when you're getting short on space.


> I can think of two main use cases:
> 
> (1) Indicating to the user how much space is available, like in Gmail's 
> "You are currently using 153 MB (2%) of your 7204 MB", so they know 
> whether they need to delete some of their old data.
> 
> There are four pieces of information that might be relevant: the 
> bytes(/characters/etc) used, the bytes(/etc) remaining, the total 
> bytes(/etc) available, and the percentage used. The most useful for 
> humans is the percentage - I have no idea how many bytes a typical email 
> is, so I wouldn't be helped much by "You have 618KB remaining", but if I 
> see I'm only using 38% of the space after a few months then I know I 
> don't need to worry yet.

Seems like the browser could just show that in the status bar or 
something.


> (2) Automatically cleaning up old/temporary data (e.g. caches) when 
> running out of space, to recover space for new data.
> 
> That cleanup could happen as late as possible (i.e. just as you're about 
> to store new data which doesn't fit), in which case the current setItem 
> out-of-space exception seems adequate - you can wrap setItem in a 
> function that tries to set, catches the exception, cleans up the cache 
> and then tries again.
> 
> Or it could happen at some earlier time, e.g. when the user is idle and 
> won't mind a bit of a pause while you clean up old data. That behaviour 
> could be very application-dependent: it's determined by how big the 
> caches are, how much data will be saved, how much space needs to be made 
> available, how often the cleanup process will run, etc. Or it could be 
> quite simple: if free space drops below 5%, expire old data until 
> there's none left or free space reaches 15%. I don't know what people 
> would want in practice, so I'll hope the latter is adequate.

I think that the exception is the more reliable way to do it.


> As before, but with a <bb type=managestoragequota>.

That's an interesting idea, we might want to look into doing this if the 
demand for such a feature justifies it. We'll have to see how that turns 
out once implementations are widely deployed.


> [snip other proposals]

I think your summary is very accurate. I think that not providing anything 
ends up being about as good an option as all the others, which makes it 
very attractive given its dramatically lower cost. So that's what I've 
gone with.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 26 November 2008 10:55:23 UTC