Re: IE Team's Feedback on HTML 5.0 DOM Store from Jeff Walden on 2008-04-04 (public-html-comments@w3.org from April 2008)

From: Jeff Walden <jwalden@MIT.EDU>
Date: Fri, 04 Apr 2008 04:41:46 -0400
To: Sunava Dutta <sunavad@windows.microsoft.com>
CC: "public-html-comments@w3.org" <public-html-comments@w3.org>, Zhenbin Xu <zhenbinx@windows.microsoft.com>, Chris Wilson <Chris.Wilson@microsoft.com>, Marc Silbey <marcsil@windows.microsoft.com>, Doug Stamper <dstamper@exchange.microsoft.com>, Alex Kuang <Alex.Kuang@microsoft.com>, Eric Lawrence <ericlaw@exchange.microsoft.com>, Cyra Richardson <Cyra.Richardson@microsoft.com>, Sharath Udupa <Sharath.Udupa@microsoft.com>
Message-ID: <47F5E9CA.1040806@mit.edu>

It's been mentioned that I might need a disclaimer that this "isn't a WG response", but then again I'm not a WG member, nor do I have the time to be one, so that might be moot.  Whatever, you get the picture.

Sunava Dutta wrote:
>     Storage Checker
>     <http://msdn2.microsoft.com/en-us/library/cc197016(VS.85).aspx>
> 
>     /Storage.remainingSpace/
> 
> ·         <Open Issue> We currently return bytes but perhaps returning 
> the number of characters is more useful? We’d love to hear thoughts here…

This is just sort of yuck no matter how you split it, since ideally an implementation should be able to use UTF-8, UTF-16, UTF-32, etc. or whatever it wants as a backend storage mechanism.  Neither bytes nor characters are really satisfactory here, the former due to variable-length encodings and the latter due to the braindead UCS-2 of JavaScript.  I think there are two options.
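To make the mismatch concrete, here is a rough JavaScript sketch of how one string can have several plausible "sizes". The `measure` helper is made up for illustration, and the UTF-8 and UTF-16 byte counts only assume what a backend might store; nothing here reflects what any actual implementation does.

```javascript
// Hypothetical helper: reports several plausible "sizes" for one string.
function measure(s) {
  return {
    // UTF-16 code units: what JavaScript's String length reports.
    codeUnits: s.length,
    // Unicode code points: string iteration walks by code point.
    codePoints: Array.from(s).length,
    // Bytes if the backend happened to store UTF-8 (an assumption).
    utf8Bytes: new TextEncoder().encode(s).length,
    // Bytes if the backend happened to store UTF-16 (an assumption).
    utf16Bytes: s.length * 2
  };
}

const asciiSizes = measure("abc");
const latinSizes = measure("\u00E9");      // U+00E9, inside the BMP
const nonBmpSizes = measure("\u{1D11E}");  // U+1D11E, outside the BMP

console.log(asciiSizes);  // codeUnits 3, codePoints 3, utf8Bytes 3, utf16Bytes 6
console.log(latinSizes);  // codeUnits 1, codePoints 1, utf8Bytes 2, utf16Bytes 2
console.log(nonBmpSizes); // codeUnits 2, codePoints 1, utf8Bytes 4, utf16Bytes 4
```

Each of the four numbers changes at a different rate depending on the content, which is why a byte count or a JavaScript "character" count both end up tied to an encoding.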

The first option is for this to be the number of Unicode code points (either non-BMP or BMP code points) which can be stored.  This penalizes the amount of non-BMP content that can be stored, but memory and storage grow quickly and are usually large enough that in the long term I think the number of remaining code points that can be stored is the most usable choice.  Further, the implementation must not be allowed to optimize to allow greater storage use if, say, content is primarily ASCII rather than non-BMP (so you don't have to constantly check remainingSpace as you add more entries, just in case remainingSpace doesn't change in the obvious way due to implementation choices).  Note that the no-need-to-constantly-check requirement imposes even more problems in that you have to be able to absorb per-pair overhead invisibly (N size-2 pairs shouldn't be different from N/2 size-4 pairs), unless you expose the per-pair overhead somehow -- but in many implementations that might not even be constant!
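A toy accounting model shows the per-pair overhead problem. Everything in it is an assumption made for illustration: `totalCost` is a hypothetical function, the cost of a pair is taken to be the code points in its key plus its value, and the overhead of 3 is an arbitrary number, not a measurement of any implementation.

```javascript
// Count Unicode code points rather than UTF-16 code units.
const countCodePoints = (s) => Array.from(s).length;

// Hypothetical accounting: cost of a list of [key, value] pairs, plus an
// assumed fixed overhead per pair (0 models "absorbed invisibly").
function totalCost(pairs, perPairOverhead) {
  return pairs.reduce(
    (sum, [key, value]) =>
      sum + countCodePoints(key) + countCodePoints(value) + perPairOverhead,
    0
  );
}

// Four pairs of size 2 versus two pairs of size 4: same content volume.
const fourSmallPairs = [["a", "1"], ["b", "2"], ["c", "3"], ["d", "4"]];
const twoLargePairs = [["ab", "12"], ["cd", "34"]];

// With no visible overhead the two layouts cost the same.
console.log(totalCost(fourSmallPairs, 0), totalCost(twoLargePairs, 0)); // 8 8

// With a visible overhead of 3 per pair they diverge.
console.log(totalCost(fourSmallPairs, 3), totalCost(twoLargePairs, 3)); // 20 14
```

In the second case a script could no longer predict remainingSpace from the size of its content alone, so it would be back to checking the value after every write.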

The second option I see, based on the complications involved in precisely specifying remainingSpace's value, is to not include it.  Specifying and implementing a sanely-behaving remainingSpace whose behavior (aside from an initial value) can be exactly specified as pairs are added and removed sounds very hard to me, and if there's anything we know from the web, it's that any little cross-implementation difference in behavior will matter at some time and to someone.  I think I would tend to lean slightly toward not exposing the value on the basis that it's hard to describe it sanely, but I'm not sure I'd really argue against adding it, assuming its behavior were fully specified with respect to an implementation-defined starting value.

>     Clear All API
>     <http://msdn2.microsoft.com/en-us/library/cc288131(VS.85).aspx>
> 
>     /Storage.clear()/
> 
> An obvious benefit for the persistent store, unlike removeItem 
> <http://msdn2.microsoft.com/en-us/library/cc197047(VS.85).aspx>, this 
> API removes all key/value pairs accessible to that script without 
> requiring costly iteration over all data entries. 

I wouldn't really /mind/ if this were present, but is there a great call for this?  I would think a site wouldn't want to clear all its data all that often, in which case requiring a for-in loop isn't that huge a cost on the client developer and in the grand scheme doesn't impose a huge performance penalty.  Keeping the method, on the other hand, means more burden on the implementation to correctly implement and test the method and to check for security concerns, among other costs.  I'd punt what seems like a rare operation to the client developer to implement with for-in loops.
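For reference, a client-side clear could look roughly like the sketch below. It walks the store with the draft's length and key() members instead of for-in, only because I'm more sure of how those behave. `clearAll` and `MemoryStorage` are hypothetical names, and `MemoryStorage` is just an in-memory stand-in so the sketch runs outside a browser.

```javascript
// Minimal in-memory stand-in for a Storage object (an assumption, used
// only so this sketch can run without a browser).
class MemoryStorage {
  constructor() { this.map = new Map(); }
  get length() { return this.map.size; }
  key(index) {
    const k = Array.from(this.map.keys())[index];
    return k === undefined ? null : k;
  }
  getItem(k) { return this.map.has(k) ? this.map.get(k) : null; }
  setItem(k, v) { this.map.set(String(k), String(v)); }
  removeItem(k) { this.map.delete(k); }
}

// Hypothetical helper: removes every pair visible to the caller.
function clearAll(storage) {
  // Collect the keys first; removing while indexing would shift positions.
  const keys = [];
  for (let i = 0; i < storage.length; i++) {
    keys.push(storage.key(i));
  }
  for (const k of keys) {
    storage.removeItem(k);
  }
}

const demoStore = new MemoryStorage();
demoStore.setItem("theme", "dark");
demoStore.setItem("draft", "hello");
clearAll(demoStore);
console.log(demoStore.length); // 0
```

In a page the same function would be called on the real storage object, at a cost of one removeItem call per stored pair.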

Jeff

Received on Friday, 4 April 2008 09:40:00 UTC