Re: Detailed review of 4.11. Client-side persistent storage from Mihai Sucan on 2008-01-11 (public-html@w3.org from January 2008)

From: Mihai Sucan <mihai.sucan@gmail.com>
Date: Fri, 11 Jan 2008 21:23:21 +0200
To: "Ian Hickson" <ian@hixie.ch>
Cc: public-html <public-html@w3.org>
Message-ID: <op.t4r5g7zemcpsjgr0b0dp@athlon>

Hello!

Thanks for the reply.


On Tue, 11 Dec 2007 04:03:43 +0200, Ian Hickson <ian@hixie.ch> wrote:

> (This e-mail also has some replies to e-mails on the DB API.)
>
> On Mon, 17 Sep 2007, Mihai Sucan wrote:
>>
>> 1. In section 4.11.3. "The StorageItem interface" [2], I would suggest:
>
> This section is now gone.
>
>
>> a) the StorageItem objects should also have two read-only attributes:
>> dateCreated and dateModified, as Date objects (or UNIX timestamps).
>>
>> One of the uses is some web applications might disregard/purge values
>> which are too old. Currently, one would need to store two separate items
>> for this kind of tracking. I'm not asking for a dateExpires, like
>> cookies have.
>>
>> Also, I'm thinking UAs which will implement persistent storage will
>> obviously internally save the dateCreated and dateModified values -
>> they'll use these two to automatically purge items which are too old
>> (such that the UA doesn't slow down too much, performance issues, and
>> privacy issues). Basically, I only want these two values exposed to the
>> web applications as well.
>>
>> It doesn't really make sense to leave this out of the spec. There are
>> tons of cases where timestamps are used: files and folders on
>> filesystems have the created and/or modified date as metadata,
>> databases, tables in relational databases (like mySQL) have created and
>> modified date as metadata, emails, etc.
>
> I think if people want to have this information, they should use the SQL
> API. Do you think this is acceptable?

The spec has changed quite a lot since I sent that email. In particular,
the StorageItem interface is gone, so my request for the dateModified and
dateCreated properties no longer applies.

In the current form of the spec, this is acceptable. Nonetheless, it seems
like overkill to me to require the SQL API for such trivial needs.

How about adding a getInfo(key) method to the Storage object? It could
provide details about the key, such as dateModified, dateCreated and
lastUrl (the exact URL of the last page which updated the item).
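
A minimal sketch of how such a getInfo(key) method might behave. Nothing
here is in the draft: the StorageWithInfo class, the pageUrl argument and
the injectable clock are all hypothetical, and a Map stands in for the
UA's real storage.

```javascript
// Hypothetical wrapper: neither getInfo() nor these metadata fields
// exist in the draft. A Map stands in for the UA's storage backend.
class StorageWithInfo {
  constructor(now = () => Date.now()) {
    this.items = new Map(); // key -> { value, dateCreated, dateModified, lastUrl }
    this.now = now;         // injectable clock (assumption, eases testing)
  }

  setItem(key, value, pageUrl) {
    const timestamp = this.now();
    const existing = this.items.get(key);
    this.items.set(key, {
      value: String(value),
      // dateCreated is kept from the first write; dateModified tracks the latest.
      dateCreated: existing ? existing.dateCreated : timestamp,
      dateModified: timestamp,
      lastUrl: pageUrl,
    });
  }

  getItem(key) {
    const entry = this.items.get(key);
    return entry ? entry.value : null;
  }

  // Returns metadata only, never the stored value itself.
  getInfo(key) {
    const entry = this.items.get(key);
    if (!entry) return null;
    const { dateCreated, dateModified, lastUrl } = entry;
    return { dateCreated, dateModified, lastUrl };
  }
}
```

With something like this, an application could purge stale values by
comparing getInfo(key).dateModified against the current time, without
storing a second item per key.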


>> b) the StorageItem object could also have an attribute defining lastURL:
>> the absolute URL of the last page (without any query parameters) which
>> modified the value of the object.
>>
>> This is just an idea - I don't consider this a requirement (as the above
>> one). It would be a nice feature.
>>
>> But then ... both of the suggestions above enable even more tracking -
>> privacy concerns. Maybe enable these attributes only for secure pages?
>
> Since we've removed shared access, and StorageItems in general, this is
> rather moot at this point. :-)

True, but not quite. :)

(see my answer above, regarding "acceptability")

>> c) Also a question: the storage event is defined just as a notification
>> which tells the potential listeners that the storage for the domain has
>> been modified. Why wasn't the storage event defined as a notification
>> which tells exactly what changed? As in, include the StorageItem object
>> itself as well. Would that be a security/privacy concern? It shouldn't
>> be: the scripts can access the StorageItem, anyway.
>
> The concern is that there may be a lot of changes (especially with session
> storage). I'd be interested in hearing from authors who wish to use this
> API, though. What do you envisage doing with it that may require detailed
> notifications of changes across windows?

Well, as a Web developer, I find a storage event that only tells me that
something changed to be of little use.

Consider: why would anyone register an event listener for 'storage'?
What is the use of knowing that one or more of, perhaps, hundreds of
storage items changed, without knowing which ones? I envision using the
listener *only* when I need to re-read a specific key (or several).
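
To illustrate the use case, here is a rough sketch of a notification that
names the changed key. The NotifyingStorage class and the event shape
({ key, oldValue, newValue }) are assumptions made for this example, not
something the draft under discussion defines.

```javascript
// Hypothetical: an in-memory store whose change notifications name the key.
class NotifyingStorage {
  constructor() {
    this.items = new Map();
    this.listeners = [];
  }

  addListener(listener) {
    this.listeners.push(listener);
  }

  setItem(key, value) {
    const oldValue = this.items.has(key) ? this.items.get(key) : null;
    const newValue = String(value);
    this.items.set(key, newValue);
    // Each listener learns exactly which key changed and how.
    for (const listener of this.listeners) {
      listener({ key, oldValue, newValue });
    }
  }

  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
}

const storage = new NotifyingStorage();
const reread = [];

// The listener reacts only to the one key it cares about.
storage.addListener((event) => {
  if (event.key === "draft") reread.push(event.newValue);
});

storage.setItem("theme", "dark");  // ignored by the listener
storage.setItem("draft", "Hello"); // picked up by the listener
```

Without the key in the notification, the listener would have to re-read
and compare every item it is interested in on each event.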


>> Currently, say two web applications would need to share *several*
>> StorageItem objects. If application A changes something of interest for
>> application B, then the listener within page B would have to search
>> through the list of StorageItem objects of the domain where application
>> A resides. Only the domain is known, given the "domain" attribute
>> defined within the storage event. Also, checking what was changed is
>> even harder given there's no dateModified attribute defined for
>> StorageItem objects. If performance is an issue for both applications,
>> they would have to use cross-site messaging to notify each other about
>> the specific changes. That shouldn't be needed for simple storage
>> updates - only for complex communication between two (or more)
>> applications. Cross-site messaging would also add a lot more complexity,
>> because the involved application must have their "communication
>> protocol" defined.
>
> I agree that it's suboptimal, but I don't want to make the API complex
> unless it's truly needed.

It is hard to strike a balance between what we believe is needed and what
is really needed. I don't know what more I can add to my case.

>> How are scripts supposed to work when the "disk quota is full"? That
>> should be defined in the spec.
>
> How do you mean?

I mean, a script should be able to check whether the disk quota is full,
so that the author of the document can inform the user.


>> An idea would be to have a new boolean attribute for the Storage object:
>> isWritable. This would be false when "disk quota is full", or true
>> otherwise.
>
> Typically the disk quota is never actually exactly full. e.g. the usage
> could be at 995 bytes, the quota at 1000 bytes, so adding an ASCII string
> of 4 characters could work but adding a Chinese string of 4 characters
> could fail (e.g. if the system used UTF-8).

This is a matter of detail.

The UA should provide a means of checking the 'free space' in 8-bit bytes.
It is then up to the author to work out how many characters can be
written, since the author knows the character set and encoding.
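
As a sketch of that calculation, assuming (as in Ian's example) that the
UA stores values as UTF-8 and that a free-space figure in bytes were
available: the freeBytes value and both function names below are
hypothetical, while TextEncoder is a standard API that always encodes to
UTF-8.

```javascript
// Size of a string once encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// freeBytes is a hypothetical figure; the draft exposes no such value.
function fitsInQuota(text, freeBytes) {
  return utf8ByteLength(text) <= freeBytes;
}

// Ian's numbers: usage at 995 bytes, quota at 1000 bytes.
const freeBytes = 1000 - 995;

const asciiFits = fitsInQuota("abcd", freeBytes);       // 4 bytes in UTF-8
const chineseFits = fitsInQuota("中文字符", freeBytes); // 12 bytes in UTF-8
```

Here the four ASCII characters fit in the remaining 5 bytes, while the
four Chinese characters need 3 bytes each and do not.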


>> 5. In section 4.11.8.4. "Cross-protocol and cross-port attacks" [7]:
>>
>> "Big Issue: What about if someone is able to get a server up on a port,
>> and can then send people to that URI? They could steal all the data with
>> no further interaction. How about putting the port number at the end of
>> the string being compared? (Implicitly.)"
>>
>> I strongly recommend putting the port number at the end of the string
>> being compared. My recommendation is not based only on security-related
>> concerns, but also practical concerns.
>>
>> It's very wrong to assume the same application runs on a different port,
>> on the same domain. It's obviously a different web application.
>>
>> Web developers (including me) commonly host multiple web
>> sites/applications on the same server, on varying port numbers. It would
>> be very confusing and annoying to have the same persistent storage
>> across different ports.
>>
>> The current definition of the persistent storage is completely
>> eliminating the use of port numbers - which is very wrong.
>
> This is now moot with the use of same-origin restrictions only.

Good.


>> 6. Personally I find the overall storage idea very good. However, I also
>> find it far too "liberal" - regarding security.
>>
>> Here's what I suggest, something maybe simple, yet, this is something I
>> would personally use, in many cases:
>>
>> Define a third argument for the setItem() method of the Storage object.
>> Name it "private", of boolean type. If the author sets this optional
>> argument to true, then the StorageItem object is flagged as private.
>>
>> StorageItem objects flagged as private will *only* be available to
>> scripts on the *same* domain (same origin), not on any subdomains, not
>> on higher-level domains. For example: if a script on "music.example.com"
>> creates a private StorageItem object named "myTest", other scripts on
>> the same domain will be able to read it. Yet, scripts which run on
>> "beta.music.example.com" or "example.com", will cause raising a security
>> exception if they try to read/write the "myTest" StorageItem object.
>
> I've made this "private" mode the only mode.

Good.
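
For reference, both of the last two points (ports and subdomains) come
down to a same-origin comparison. A minimal sketch using the standard URL
API; the sameOrigin helper is only an illustration, not how a UA would
implement the check.

```javascript
// Two URLs share storage only if scheme, host and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const base = "http://music.example.com/player";

const samePage = sameOrigin(base, "http://music.example.com/other");       // true
const otherPort = sameOrigin(base, "http://music.example.com:8080/");      // false
const subdomain = sameOrigin(base, "http://beta.music.example.com/");      // false
const parentDomain = sameOrigin(base, "http://example.com/");              // false
```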

> On Fri, 12 Oct 2007, Mihai Sucan wrote:
>
>> Binary data, as in images, executables, videos, etc. can be inserted
>> into mySQL databases (and into other SQL databases, obviously). They
>> provide several field types which can handle hundreds of megabytes, even
>> gigabytes, of binary data. See BLOBs [1]
>>
>> I was asking, if future UAs should allow inserting binary data into
>> tables.
>>
>> 1. I want to build a Web application which allows the user to do word
>> processing offline.
>>
>> 2. I, as a Web developer, find it ideal to store all the documents
>> within the SQL storage.
>>
>> 3. For the moment I have some concerns:
>>
>> a) How can I allow the user to "upload" files (images, videos, sounds,
>> archives, etc) into his/her documents without actually uploading the
>> files to my server? *Offline* Web application.
>
> There are plans afoot to add APIs to HTMLInputElement type=file to handle
> this. They are currently stalled on waiting for the forms task force to
> complete so that we don't go down a path that the W3C later decides is
> the wrong path.
>
>
>> b) How do I insert all this binary data into the SQL storage? Can I have
>> something like:
>> executeSql('INSERT INTO `myfiles` (`name`, `data`) VALUES (?,?)',
>> my_file_name, my_file_data)
>
> I don't believe we currently have a good solution for such binary data,
> but this will likely change in time (it depends a bit on ES4).
>
>
>> c) There are other concerns as well: can JavaScript engines handle
>> variables that have several megabytes of such data? Isn't that too
>> memory-intensive? Or... could we have special FileObjects which don't
>> actually have the files loaded into memory, but which can be passed to
>> SQL queries?
>>
>> d) Once the user goes online, how can I upload the files to the server?
>> Without actually reading the entire file from the SQL storage into
>> memory. Again, probably such fields should be some JS SqlBlobObjects
>> which don't actually contain the entire blob, but they can be passed to
>> input type=file for uploading to remote servers.
>>
>> e) How about streaming the binary data to the remote server with the
>> network connection API?
>
> We'll probably have to address these when we work on forms.

All of the above should be addressed by the time HTML5 reaches "last call
for comments", with proper solutions rather than being deferred "for the
future". I believe support for manipulating huge amounts of binary data is
highly important, given that the competition (Flash, Silverlight, etc.) is
already ahead.
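
To make points (c) to (e) a little more concrete, here is a rough sketch
of the kind of object I have in mind: a handle that is passed around and
read chunk by chunk, so that the script never needs one contiguous copy
of the data. Everything here is hypothetical (BlobHandle, chunks, upload,
the send callback), and a Uint8Array stands in for data that would really
live in SQL storage or on disk.

```javascript
// Hypothetical handle: stands in for a blob kept in SQL storage.
class BlobHandle {
  constructor(bytes) {
    this.bytes = bytes; // Uint8Array standing in for data held by the UA
  }

  get size() {
    return this.bytes.length;
  }

  // Yields successive views of at most chunkSize bytes, without copying.
  *chunks(chunkSize) {
    for (let offset = 0; offset < this.bytes.length; offset += chunkSize) {
      yield this.bytes.subarray(offset, offset + chunkSize);
    }
  }
}

// Streams the blob through a caller-supplied send(chunk) callback
// and returns the number of bytes handed over.
function upload(handle, chunkSize, send) {
  let sent = 0;
  for (const chunk of handle.chunks(chunkSize)) {
    send(chunk);
    sent += chunk.length;
  }
  return sent;
}
```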



Good luck,

-- 
Mihai Sucan
http://www.robodesign.ro
Received on Friday, 11 January 2008 19:23:33 UTC