WebStorage feedback from Ian Hickson on 2009-04-29 (public-webapps@w3.org from April to June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 29 Apr 2009 08:30:04 +0000 (UTC)
To: João Eiras <joaoe@opera.com>, Anne van Kesteren <annevk@opera.com>, Olli Pettay <Olli.Pettay@helsinki.fi>, Jonas Sicking <jonas@sicking.cc>, Adam Barth <w3c@adambarth.com>
Cc: WebApps WG <public-webapps@w3.org>
Message-ID: <Pine.LNX.4.62.0904290739100.10370@hixie.dreamhostps.com>

On Tue, 7 Apr 2009, João Eiras wrote:
> 
> Please consider a typical webpage, that on first load, opens a database 
> (using openDatabase) and then creates a read-only transaction to read 
> data to initialize whatever needs initializing. If it's the first time 
> the user opens that webpage, the database that the webpage tries to 
> access effectively does not exist yet and the user agent will have to 
> create it, meaning that that database will be completely empty (no 
> objects nor data).
> 
> The processing steps for readTransaction are clear though, about calling 
> the appropriate callbacks for the queries, but it also demands, and 
> rightfully so, that a read-only transaction must not modify data. This 
> contradicts a bit the fact that the user agent would need to create the 
> new datafile to represent that database.

Such changes are implementation details; whether they are done as part of 
the openDatabase() call, as part of a transaction, or at some completely 
other time is not really relevant to the API.


> Although the specification does not make any direct reference to 
> datafiles, because that's an implementation detail, expecting an empty 
> database for a readTransaction would require the user agent either to 
> optimize this scenario to not execute any sql directly but to call the 
> error callbacks with the appropriate error codes, if the user agent knows 
> in advance that the database is empty, or, the user agent could create 
> an empty datafile which would represent our new empty database, but 
> that's a performance problem in devices with slow file IO. Even if the 
> user agent was not aware of the fact that the database is empty, it 
> would still cause all statements executed within the read-only 
> transaction on our empty database to fail.
>
> The situation is analogous to doing a first SELECT on a read-write 
> transaction. However, the select can be interleaved between other 
> DML/DDL statements which create and modify data, so there's no knowledge 
> beforehand if a SELECT is going to be executed on a read-write 
> transaction on an empty database. In this particular case, the user 
> agent would just report the normal error about querying nonexistent 
> objects.
> 
> So, I propose the readTransaction function to outright throw an 
> exception IF the database is completely empty (no objects, no data). 
> This would actually make authors' lives easier by getting the error 
> sooner, and make the situation more easily detectable. It would also 
> make implementors' lives easier by introducing this optimization on the 
> specification.

Not all statements would fail, e.g. a "SELECT 1;" statement wouldn't fail. 
For this reason, I don't think it makes sense to fail early here.

Similarly, a transaction with no statements wouldn't fail.
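
As a rough illustration of the two points above, here is a minimal sketch 
that uses Python's sqlite3 module as a stand-in for whatever SQL backend a 
user agent might use. The stand-in backend, the function name and the table 
name "notes" are assumptions made for the example; nothing here is taken 
from the specification.

```python
import sqlite3


def probe_empty_database():
    """Run three probes against a brand-new, completely empty database."""
    # isolation_level=None puts the module in autocommit mode, so the
    # BEGIN/COMMIT statements below are exactly the ones we issue.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        # 1. A statement that touches no objects succeeds.
        constant = conn.execute("SELECT 1").fetchone()[0]

        # 2. A transaction with no statements in it also succeeds.
        conn.execute("BEGIN")
        conn.execute("COMMIT")

        # 3. Only a statement naming a nonexistent object fails.
        try:
            conn.execute("SELECT * FROM notes")  # "notes" is hypothetical
            missing_table_failed = False
        except sqlite3.OperationalError:
            missing_table_failed = True

        return constant, missing_table_failed
    finally:
        conn.close()
```

With this backend, only the third probe reports an error, which is the 
ordinary "no such object" failure rather than anything specific to the 
database being empty.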


On Fri, 24 Apr 2009, Anne van Kesteren wrote:
>
> We noticed a problem while working on Web Storage. The broadcasting 
> feature allows fork bombing. I personally couldn't think of any other 
> feature that allows such a thing.
> 
> Works against Gecko:
> 
>   http://dump.testsuite.org/2009/storage/demo-001.htm
> 
> Works against WebKit:
> 
>   http://dump.testsuite.org/2009/storage/demo-001-webkit.htm
> 
> Internet Explorer seems protected against this particular demo, increasing only
> slightly in memory usage (though with a lot of I/O):
> 
>   http://dump.testsuite.org/2009/storage/demo-001-trident.htm

How is this different from making two mutations per mutation event, or 
calling postMessage() twice for each invocation of the 'message' event, or 
loading two new iframes every time an iframe's 'load' event fires?
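
The common shape of all of these cases can be sketched with a small model: 
any handler that triggers two further events per event it receives grows 
the amount of work exponentially. This is a plain Python simulation, not 
browser code; the function name, the generation cap and the fan-out 
parameter are assumptions made for the example.

```python
from collections import deque


def count_events(generations, fanout=2):
    """Count the events dispatched when each event triggers `fanout` more.

    `generations` caps the depth so that the simulation terminates; a real
    page would keep going until the user agent stepped in.
    """
    queue = deque([0])  # one initial event, at generation 0
    dispatched = 0
    while queue:
        generation = queue.popleft()
        dispatched += 1
        if generation < generations:
            # The "handler" reacts by causing `fanout` follow-up events.
            queue.extend([generation + 1] * fanout)
    return dispatched
```

With a fan-out of two, ten generations already amount to 2047 dispatched 
events, whichever event type carries them.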


> Should we really be introducing such a new feature (broadcasting) as 
> part of sessionStorage/localStorage?

It seems important for scripts to be able to detect changes.


On Fri, 24 Apr 2009, Anne van Kesteren wrote:
>
> The storage event currently seems to be dispatched synchronously. 

Fixed.


On Fri, 24 Apr 2009, Anne van Kesteren wrote:
>
> There's quite a big interoperability problem with the events.
> 
>  * Per the specification the event is to be dispatched on Window (and does not
>    bubble).
>  * In Firefox it is dispatched on body and then bubbles up to Window.
>  * In WebKit it is dispatched on body (and does not bubble).
>  * In Internet Explorer it is dispatched on document.
> 
> It would be nice if we can figure out where we want to go in the end.

Seems like dispatching it on Window is the best plan in the end. It is 
unfortunate that the early implementations vary on this, but it's probably 
my fault (the implementations came along before I sorted out the 
body/Window event mess).

Note that firing on Window means that <body onstorage=""> works, and that 
capturing event listeners on Window work. So there are at least two ways 
to catch the event in all otherwise-compliant browsers (and if they're not 
otherwise compliant, then we have bigger problems anyway).


On Mon, 27 Apr 2009, Anne van Kesteren wrote:
>
> Consider having two windows A and B where the session history of A 
> consists of A1 and in B it consists of B1 and B2. A1 and B2 are both 
> fully active. A1, B1, and B2 are all same-origin. Now every time the 
> Storage objects of A1 and B2 are updated B1 will get more and more out of 
> sync because it does not receive the storage events. That seems bad.
> 
> One solution would be to queue storage events for B1 to be dispatched 
> when it becomes active again but if a lot of storage events are 
> dispatched it could take a while before the page is functional again 
> when you go back to it. In naive implementations of such a strategy the 
> message queue could also be exhausted.
> 
> Another solution would be to introduce an event for when a document 
> changes from inactive to active so the document can reinitialize itself.

On Mon, 27 Apr 2009, Olli Pettay wrote:
> 
> You mean something like 'pageshow' event?
> https://developer.mozilla.org/En/Using_Firefox_1.5_caching#pageshow_event

On Mon, 27 Apr 2009, Jonas Sicking wrote:
> 
> We do have an event like that in firefox called 'pageshow'. However I 
> don't think we can expect authors to remember not only to listen to 
> 'storage' events, but also remember to "flush" on 'pageshow' events. It 
> basically seems to me that a design where anyone listening to just 
> 'storage' but not 'pageshow' has a bug, is a bad design.
> 
> I can think of two other solutions:
> 
> Either we simply ask implementations to not cache windows with 'storage' 
> listeners. IIRC we do something similar with pages that have listeners 
> for 'beforeunload'. Alternatively, implementations can queue 
> up 'storage' events until the implementation thinks that so many events 
> have been queued up that firing them all would lead to bad UI, at which 
> point the implementation can remove the window from the cache.

On Mon, 27 Apr 2009, João Eiras wrote:
> 
> That would be bad for perceived performance and usability.

On Mon, 27 Apr 2009, Jonas Sicking wrote:
> 
> Alternatively, we can make it such that the 'storage' event can be fired 
> with no indication of which value changed, in which case the page should 
> assume multiple changes have been made. Though this carries much of the 
> same risk as relying on a 'pageshow' event since developers are unlikely 
> to expect it.

I've gone with the "buffering" idea -- events are queued up until the 
document is active again (or the document is discarded, which can happen 
at any time at the UA's discretion).
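
A minimal model of that buffering behaviour, written as plain Python rather 
than browser code, might look like the sketch below. The class name, its 
attributes and its method names are all assumptions made for the example 
and do not correspond to anything in the specification.

```python
class BufferedDocument:
    """Toy model of a document that buffers storage events while inactive."""

    def __init__(self, name):
        self.name = name
        self.active = True
        self.discarded = False
        self.pending = []   # events held back while the document is inactive
        self.received = []  # events actually delivered to the page

    def deliver(self, event):
        if self.discarded:
            return  # a discarded document never sees anything again
        if self.active:
            self.received.append(event)
        else:
            self.pending.append(event)

    def deactivate(self):
        self.active = False

    def activate(self):
        # On becoming active again, flush the backlog in original order.
        self.active = True
        self.received.extend(self.pending)
        self.pending.clear()

    def discard(self):
        # The UA may drop the document, and its backlog, at any time.
        self.discarded = True
        self.pending.clear()
```

In the scenario with B1 and B2, B1 would be deactivated when the user 
navigates to B2, would accumulate events while A1 and B2 change the storage 
area, and would receive them in order if the user goes back to it.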


On Tue, 28 Apr 2009, Anne van Kesteren wrote:
>
> The specification currently suggests to guard against subdomains. I was 
> wondering why subdomains are called out and not different ports or even 
> completely different domains now that postMessage() is available.
> 
> Since this particular section keeps talking about domains I was 
> wondering if it has actually been updated to reflect the switch from a 
> domain-based policy to an origin-based policy for storage. It seems that 
> some of the recommendations need to be reworded.

On Tue, 28 Apr 2009, Adam Barth wrote:
>
> Yeah, this requirement doesn't make very much sense:
> 
> "User agents should guard against sites storing data in the storage 
> areas or databases of subdomains, e.g. storing up to the limit in 
> a1.example.com, a2.example.com, a3.example.com, etc, circumventing the 
> main example.com storage limit."
> 
> Someone who wants to use up a lot of storage can just register as many 
> domain names as he/she likes for $5 apiece.
> 
> I suggest removing the requirement.

I've changed it a bit, because it seems UAs are likely to still want a 
per-origin limit. But I'm not really sure what to suggest that's more 
concrete than the vague handwaving that is there now.
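
To make the concern concrete, here is a small sketch of how a purely 
per-origin limit adds up across subdomains and ports. It treats an origin 
as the (scheme, host, port) tuple; the helper names and the limit passed in 
are assumptions made for the example, not values from the specification.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) tuple for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def total_allowed(urls, per_origin_limit):
    """Total storage obtainable if each distinct origin gets its own limit."""
    distinct_origins = {origin_of(url) for url in urls}
    return len(distinct_origins) * per_origin_limit
```

Under this model a1.example.com, a2.example.com and a3.example.com each get 
a full allowance, and so does the same host on a different port, which is 
why singling out subdomains alone does not close the gap.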

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 29 April 2009 08:31:24 UTC