RE: DOM Storage feedback from Zhenbin Xu on 2008-06-19 (public-html-comments@w3.org from June 2008)

From: Zhenbin Xu <Zhenbin.Xu@microsoft.com>
Date: Wed, 18 Jun 2008 19:01:33 -0700
To: "Ian Hickson " <IMCEAMAILTO-ian+40hixie+2Ech@windows.microsoft.com>
CC: "public-html-comments@w3.org" <public-html-comments@w3.org>, Sunava Dutta <sunavad@windows.microsoft.com>, IE8 Core AJAX SWAT Team <ieajax@microsoft.com>
Message-ID: <72F767ADE7C63540BE69CD2722A41F440E9982104E@NA-EXMSG-W601.wingroup.windeploy.ntd>
[I don't know the alias for the WHATWG mailing list; please add it when replying.]

Some comments below...

>
> -----Original Message-----
> From: Ian Hickson [mailto:ian@hixie.ch]
> Sent: Sunday, April 27, 2008 5:32 PM
> To: public-html-comments@w3.org; WHATWG Mailing List
> Subject: DOM Storage feedback
>
> On Tue, 19 Feb 2008, Ralf Stoltze wrote:
> >
> > I found three more occurrences of "global storage area" in the
> current
> > version, in 4.10.6.2 and 4.10.7.1.
>
> Fixed.
>
>
> > And a minor typo in 4.10.5: "hte" --> "the"
>
> Thanks. Fixed.
>
>
> On Thu, 13 Mar 2008, Dave Camp wrote:
> >
> > 4.10.6.1 still talks about quotas being per-domain-setting-the-value
> vs.
> > per-storage-area, this is no longer relevant.
>
> Fixed.
>
>
> On Fri, 21 Mar 2008, Sunava Dutta wrote:
> >
> > Meanwhile, we have feedback for the draft that we feel and hope will
> > contribute to the existing repository of web developer's tools. Here
> are
> > a few enhancements that we've implemented. We think these features
> are
> > good for developers and want to add them to the spec. MSDN is being
> > updated so for more details please refer to the
> > API.<http://msdn2.microsoft.com/en-us/library/cc197062(VS.85).aspx>
> >
> > Storage.remainingSpace
> >
> > A straightforward and popular request, this API provides a script to
> > check the remaining persistent storage space available to it, in bytes.
> > It's a very useful feature to allow pages to manage their store better.
> >
> > * <Open Issue> We currently return bytes but perhaps returning the
> > number of characters is more useful? We'd love to hear thoughts
> here...
>
> The problem with this feature is that there are a number of ways to
> store
> data, and thus no way to know exactly how much data can be stored.
>
> For example, if the UA stores data in UTF-8 characters, the number of
> characters left to store will vary based on what characters are to be
> stored. Similarly, if the UA stores data in a compressed fashion, the
> number of bytes will vary based on how compressible the data is.
> Furthermore, we don't want to preclude user agents from dynamically
> increasing the amount of available storage based on user actions, for
> example the UA could automatically increase the storage every time the
> user interacts with the page, or could prompt the user to increase the
> storage when it gets to 80%.
>
> Thus this API really can't easily work in an interoperable fashion.
>
>
> > Clear All API<http://msdn2.microsoft.com/en-
> us/library/cc288131(VS.85).aspx>
> > Storage.clear()
> >
> > An obvious benefit for the persistent store, unlike
> > removeItem<http://msdn2.microsoft.com/en-
> us/library/cc197047(VS.85).aspx>,
> > this API removes all key/value pairs accessible to that script
> without
> > requiring costly iteration over all data entries.
>
> Added.
>
>
> > Asynchronous model for DOM Storage:<http://msdn2.microsoft.com/en-
> us/library/cc288674(VS.85).aspx>
> > The spec calls for atomic setItem() and removeItem() with respect to
> > changes to the data storage area. This is valuable since it ensures
> that
> > data is written to disk successfully. One of the major differences
> today
> > between what we've implemented is that we went down an "asynchronous"
> > path.  This is because there are significant performance advantages
> if
> > committing to the disk can be delayed while providing the data
> instantly
> > from memory.  Our model is exposed the same as a synchronous model to
> > the web developer so there should be no differences to the developer
> who
> > is interested in programming it using no new APIs other than that
> > specified in the HTML 5.0 spec. Here are some of the reasons why the
> > asynchronous model was chosen:
> >
> > 1. It offers much better performance without sacrificing the
> persistence
> > to disk -- we will fire an event to confirm commit so web
> applications
> > can listen to the event when persistence is a concern.
> >
> > 2.  Our customer outreach leads us to believe that the DOM Storage is
> > primarily used as local cache to improve responsiveness. The
> performance
> > cost of synchronous storage in our opinion outweighs the need for
> > guaranteed persistence.
> >
> > 3.  It avoids a possible hang in the UI thread if the operation takes
> > long time and therefore makes the browsing experience more fluid.
>
> The API is designed in such a way that UAs can implement a lazy commit
> on
> the back end, but this should not be exposed to the author -- there is
> no
> reason for the author to need to know whether the data is in RAM, flash
> storage, disk storage, remote tape storage, or whatever.
>


[Zhenbin Xu]  There is clearly an advantage to exposing this concept to authors.
Just as they need to know that the network can be slow, and so should use async
XHR rather than sync XHR, they need to know that storage operations can be very
slow. A synchronous DOM Storage operation is a potential hang point.

From our testing, the async model is hundreds of times faster than a sync model
where a database transaction is utilized.

The async model we propose doesn't require changes to how authors use DOM Storage.
The APIs are the same: once an item is set, it can be used immediately. The only
difference is that we don't guarantee the data has been committed to disk at that
point. We will fire an event asynchronously once the data is really written to disk.
Most web applications won't care, since DOM Storage is used for caching purposes; if
data is not written to disk, the application simply has one less cached item the next
time it runs. [Think of an email cache scenario: it would just fetch one more email
from the server if it is not in the local cache.]

An analogy here is the file system cache: when we call fileHandler.close(),
the file may still be in the OS cache, and will be written to disk some time later.
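
To make the model above concrete, here is a minimal sketch of a write-behind
store. It is not the IE8 implementation. The class name WriteBehindStorage, the
flush() and onCommit() methods, and the isPersisted() helper are all
hypothetical, and the "disk" is just an in-memory Map standing in for the slow
backing store. Reads and writes are served from memory at once; the commit
notification only fires when the pending changes are flushed.

```typescript
// Sketch only: all names here are hypothetical, and the "disk" is a plain Map
// standing in for a real persistent store.
type CommitListener = (committedKeys: string[]) => void;

class WriteBehindStorage {
  private memory = new Map<string, string>(); // served to callers immediately
  private disk = new Map<string, string>();   // stand-in for the slow backing store
  private dirty = new Set<string>();          // keys changed since the last flush
  private listeners: CommitListener[] = [];

  setItem(key: string, value: string): void {
    this.memory.set(key, value);
    this.dirty.add(key);
  }

  getItem(key: string): string | null {
    const value = this.memory.get(key);
    return value === undefined ? null : value;
  }

  removeItem(key: string): void {
    if (this.memory.delete(key)) {
      this.dirty.add(key);
    }
  }

  // Plays the role of listening for a commit event.
  onCommit(listener: CommitListener): void {
    this.listeners.push(listener);
  }

  // A real implementation would run this later, off the UI thread.
  flush(): void {
    const keys = Array.from(this.dirty);
    for (const key of keys) {
      const value = this.memory.get(key);
      if (value === undefined) {
        this.disk.delete(key);
      } else {
        this.disk.set(key, value);
      }
    }
    this.dirty.clear();
    if (keys.length > 0) {
      for (const listener of this.listeners) {
        listener(keys);
      }
    }
  }

  // Test helper: has this key reached the backing store?
  isPersisted(key: string): boolean {
    return this.disk.has(key);
  }
}
```

An application that only caches data would call setItem() and getItem() and
ignore the commit notification; one that cares about persistence would register
a listener with onCommit() and wait for its keys to appear.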






> > Hence we created an async model. As far as the web developer is
> > concerned, the data will be available immediately from memory when the
> > onstorage event is fired, so the behavior is similar and no new APIs
> > are needed; hence it's backwards compatible with other implementations.
> > In addition, an onstoragecommit event is fired for developers who want
> > to ensure the data is persisted to disk successfully.
> >
> > Begin/Commit<http://msdn2.microsoft.com/en-
> us/library/cc197036(VS.85).aspx>
> >
> > Storage.begin()
> >
> > Storage.commit()
> >
> > It's an application level concept that web authors who want a set of
> > values to be committed at once can use. Data is either committed in
> its
> > entirety (batch commit) or not. This is especially critical given
> that
> > DOM Storage is primarily used for name/value pair operations and many
> > applications need a set of name/value pairs to define a consistent
> > state.
>
> In particular, these methods and the associated events should not be
> provided to authors. The SQL database API is already capable of handling
> this. Authors who wish to have transactions are almost certainly going to
> want other features too, such as types, database schemas, etc.
>
>
> If you insist on keeping these features, please do mark them as
> Microsoft-proprietary by prefixing them with "ms" or some such, as in,
> "msBegin()", "msCommit()", etc, so that authors can clearly see that
> these
> APIs are non-standard, and so that we don't have any conflicts with
> future
> extensions to the API.
>
>


[Zhenbin Xu]  I have to disagree that customers who want begin()/commit()
should use the SQL API.  If that argument holds, why not simply get rid of the
DOM Storage feature completely, since customers can just use the SQL API?

This request came from one of our top customers when we discussed adoption
of the DOM Storage feature. They really want to make sure their state is consistent
without going through a lot of extra effort.  For instance, to cache an email,
you would want to cache the complete headers (e.g. From, To, Subject, Sent) and body
instead of only part of them.  Without begin/commit they would have to either
concatenate them or make sure the storage writing statements are grouped together
and non-interruptible, a limitation that could be avoided with begin/commit.
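
As an illustration of the email example, here is a minimal sketch of batch
semantics. It assumes that begin() starts buffering writes and commit() applies
the whole buffer together; the thread does not spell out the exact semantics, so
treat this as one possible reading. BatchedStorage, committedKeys() and
cacheEmail() are hypothetical names, and an in-memory Map stands in for the
persistent store.

```typescript
// Sketch only: begin() and commit() mirror the names proposed in this thread;
// everything else is hypothetical and not a verified API.
class BatchedStorage {
  private committed = new Map<string, string>();
  private pending: Map<string, string> | null = null;

  // Starts a new batch, discarding any unfinished one.
  begin(): void {
    this.pending = new Map<string, string>();
  }

  setItem(key: string, value: string): void {
    // Inside a batch, writes are buffered; outside, they apply directly.
    if (this.pending !== null) {
      this.pending.set(key, value);
    } else {
      this.committed.set(key, value);
    }
  }

  getItem(key: string): string | null {
    const buffered = this.pending === null ? undefined : this.pending.get(key);
    const value = buffered !== undefined ? buffered : this.committed.get(key);
    return value === undefined ? null : value;
  }

  // Applies every buffered write together, then ends the batch.
  commit(): void {
    if (this.pending === null) {
      return;
    }
    this.pending.forEach((value, key) => {
      this.committed.set(key, value);
    });
    this.pending = null;
  }

  // What a reader of the persistent store would see.
  committedKeys(): string[] {
    return Array.from(this.committed.keys());
  }
}

// Caches one email so that either all of its fields are stored or none are.
function cacheEmail(store: BatchedStorage, id: string, mail: Record<string, string>): void {
  store.begin();
  for (const field of ["From", "To", "Subject", "Sent", "Body"]) {
    const value = mail[field];
    if (value === undefined) {
      throw new Error("missing field " + field); // commit() is never reached
    }
    store.setItem(id + "." + field, value);
  }
  store.commit();
}
```

If cacheEmail() fails part way through, the fields written so far stay in the
discarded buffer and the committed store never shows a half-cached message.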





> On Fri, 4 Apr 2008, Jeff Walden wrote:
> > >
> > >     /Storage.remainingSpace/
> >
> > This is just sort of yuck no matter how you split it, since ideally
> an
> > implementation should be able to use UTF-8, UTF-16, UTF-32, etc. or
> > whatever it wants as a backend storage mechanism.  Neither bytes nor
> > characters are really satisfactory here, the former due to
> > variable-length encodings and the latter due to the braindead UCS-2
> of
> > JavaScript.  I think there are two options.
> >
> > The first option is for this to be the number of Unicode code points
> > (either non-BMP or BMP code points) which can be stored.  This
> penalizes
> > the amount of non-BMP content that can be stored, but memory and
> storage
> > grow quickly and are usually large enough that in the long term I
> think
> > the number of remaining code points that can be stored is the most
> > usable choice.  Further, the implementation must not be allowed to
> > optimize to allow greater storage use if, say, content is primarily
> > ASCII rather than non-BMP (so you don't have to constantly check
> > remainingSpace as you add more entries, just in case remainingSpace
> > doesn't change in the obvious way due to implementation choices).
> Note
> > that the no-need-to-constantly-check requirement imposes even more
> > problems in that you have to be able to absorb per-pair overhead
> > invisibly (N size-2 pairs shouldn't be different from N/2 size-4
> pairs),
> > unless you expose the per-pair overhead somehow -- but in many
> > implementations that might not even be constant!
> >
> > The second option I see, based on the complications involved in
> > precisely specifying remainingSpace's value, is to not include it.
> > Specifying and implementing a sanely-behaving remainingSpace whose
> > behavior (aside from an initial value) can be exactly specified as
> pairs
> > are added and removed sounds very hard to me, and if there's anything
> we
> > know from the web, it's that any little cross-implementation
> difference
> > in behavior will matter at some time and to someone.  I think I would
> > tend to lean slightly toward not exposing the value on the basis that
> > it's hard to describe it sanely, but I'm not sure I'd really argue
> > against adding it, assuming its behavior were fully specified with
> > respect to an implementation-defined starting value.
>
> Since I can't see any clear way to define it, I have omitted it.
>
>
> > >     Clear All API
> > >     <http://msdn2.microsoft.com/en-us/library/cc288131(VS.85).aspx>
> > >
> > >     /Storage.clear()/
> > >
> > > An obvious benefit for the persistent store, unlike removeItem
> > > <http://msdn2.microsoft.com/en-us/library/cc197047(VS.85).aspx>,
> this API
> > > removes all key/value pairs accessible to that script without
> requiring
> > > costly iteration over all data entries.
> >
> > I wouldn't really /mind/ if this were present, but is there a great
> call
> > for this?  I would think a site wouldn't want to clear all its data
> all
> > that often, in which case requiring a for-in loop isn't that huge a
> cost
> > on the client developer and in the grand scheme doesn't impose a huge
> > performance penalty.  Keeping the method, on the other hand, means
> more
> > burden on the implementation to correctly implement and test the
> method
> > and to check for security concerns, among other costs.  I'd punt what
> > seems like a rare operation to the client developer to implement with
> > for-in loops.
>
> I think having the clear() method is useful from a debugging point of
> view, and from an atomicity point of view.
>
>
> On Fri, 4 Apr 2008, Brady Eidson wrote:
> >
> > From section 4.10.2:
> > -----
> > When the setItem() method is invoked, events are fired on other
> HTMLDocument
> > objects that can access the newly stored data, as defined in the
> sections on
> > the sessionStorage and localStorage attributes.
> >
> > The removeItem(key) method must cause the key/value pair with the
> given key to
> > be removed from the list associated with the object, if it exists. If
> no item
> > with that key exists, the method must do nothing.
> > -----
> > The SessionStorage and LocalStorage sections go on to mention only
> firing
> > events resulting from setItem().  It seems kind of weird that we fire
> > StorageEvents for one form of mutation of a Storage object, but not
> for the
> > other form.
> >
> > Shouldn't removeItem() cause a StorageEvent to fire, as well?  If
> this
> > was a purposeful omission, I'm curious as to why!  :)
>
> Fixed.
>
>
> On Tue, 8 Apr 2008, Brady Eidson wrote:
> >
> > Section 4.10 has 2 details I'd like clarification on, that I think
> will
> > probably result in changes to the spec.
> >
> > 1 - The entire section describes the StorageEvent interface, and
> > specifies where StorageEvents should be fired when setItem() is
> called
> > on any Storage object.  I asked the mailing list last week and am now
> > reiterating the question: Shouldn't this event fire for removeItem()
> as
> > well? Hixie seemed to agree in IRC that it should, but I'd love to
> "see
> > it in writing"  ;)
>
> See above.
>
>
> > 2 - There is no mention of an "onstorage" attribute one can set on
> their
> > <body> element.  With this omission, it seems that the only way to
> > actually listen for a storage event is to perform an addEventListener
> on
> > the body.  The technical docs for the IE8 beta indicate they support
> > onstorage, and I tend to think it should be an official part of the
> > spec.
>
> Added.
>
>
> On Wed, 9 Apr 2008, Brady Eidson wrote:
> >
> > From section 4.10.5:
> >
> > "The event must have its key attribute set to the name of the key in
> > question, its oldValue attribute set to the old value of the key in
> > question, or null if the key is newly added, its newValue attribute
> set
> > to the new value of the key in question, or null if the key was
> removed,
> > its uri attribute set to the address of the page whose Storage object
> > was affected, and its source attribute set to the Window object of
> hte
> > browsing content that that documents finds is in."
> >
> > Everything until the last clause makes sense.  "... and its source
> > attribute set to the Window object of hte browsing content that the
> > documents finds is in."
> >
> > Besides the obvious spelling mistake, I cannot parse the rest of that
> > sentence!  :)
>
> Fixed.
>
>
> On Thu, 10 Apr 2008, Brady Eidson wrote:
> >
> > In 4.10.5, the description of the properties on the StorageEvent
> object
> > mentions "...its newValue attribute set to the new value of the key
> in
> > question, or null if the key was removed..."
> >
> > So a web author can assume that when handling a StorageEvent whose
> > newValue property is null that the event was the result of a
> > removeItem(), and the key is no longer in the list.
> >
> > However in 4.10.2 in the discussion of setItem(), there is no mention
> > that null is not an allowed value.  Something like
> > sessionStorage.setItem("key", null) is not forbidden, therefore it is
> > allowed, and it would result in a StorageEvent with a null newValue.
> >
> > To resolve this case, I propose that we specify that the value in a
> key/value
> > pair cannot be set to null.
> > I see two clean ways to specify this:
> >
> > 1 - Throw an exception when setItem() is called with a null value.
> > 2 - Specify setItem(key, null) to have the exact same effects as
> > removeItem(key).
> >
> > I prefer #2.  Thoughts?
>
> On Fri, 11 Apr 2008, Anne van Kesteren wrote:
> >
> > Euhm, setItem() takes two strings. Therefore I'd expect null,
> undefined,
> > etc. to be stringified.
>
> On Thu, 10 Apr 2008, Brady Eidson wrote:
> >
> > Ugh... yup.  You're right.
> >
> > Nevermind!
>
> I'm not sure I understand why this is not an issue, but ok.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 19 June 2008 02:02:05 UTC