
[whatwg] Structured clone algorithm on LocalStorage

From: Jeremy Orlow <jorlow@chromium.org>
Date: Thu, 24 Sep 2009 01:45:03 -0700
Message-ID: <5dd9e5c50909240145p132064bq8b092ad7e6641ce@mail.gmail.com>
On Thu, Sep 24, 2009 at 12:20 AM, Jonas Sicking <jonas at sicking.cc> wrote:

> On Wed, Sep 23, 2009 at 10:19 PM, Darin Fisher <darin at chromium.org> wrote:
> >
> >
> > On Wed, Sep 23, 2009 at 8:10 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> >>
> >> On Wed, Sep 23, 2009 at 3:29 PM, Jeremy Orlow <jorlow at chromium.org>
> wrote:
> >> > On Wed, Sep 23, 2009 at 3:15 PM, Jonas Sicking <jonas at sicking.cc>
> wrote:
> >> >>
> >> >> On Wed, Sep 23, 2009 at 2:53 PM, Brett Cannon <brett at python.org>
> wrote:
> >> >> > On Wed, Sep 23, 2009 at 13:35, Jeremy Orlow <jorlow at chromium.org>
> >> >> > wrote:
> >> >> >> What are the use cases for wanting to store data beyond strings
> (and
> >> >> >> what
> >> >> >> can be serialized into strings) in LocalStorage?  I can't think of
> >> >> >> any
> >> >> >> that
> >> >> >> outweigh the negatives:
> >> >> >> 1)  From previous threads, I think it's fair to say that we can
> all
> >> >> >> agree
> >> >> >> that LocalStorage is a regrettable API (mainly due to its
> >> >> >> synchronous
> >> >> >> nature).  If so, it seems that making it more powerful and thus
> more
> >> >> >> attractive to developers is just asking for trouble.  After all,
> the
> >> >> >> more
> >> >> >> people use it, the more lock contention there'll be, and the more
> >> >> >> browser UI
> >> >> >> jank users will be sure to experience.  This will also be worse
> >> >> >> because
> >> >> >> it'll be easier for developers to store large objects in
> >> >> >> LocalStorage.
> >> >> >> 2)  As far as I can tell, there's nowhere else in the spec where
> >> >> >> you
> >> >> >> have
> >> >> >> to serialize structured clone(able) data to disk.  Given that
> >> >> >> LocalStorage
> >> >> >> is supposed to throw an exception if any ImageData is contained
> and
> >> >> >> since
> >> >> >> File and FileData objects are legal, it seems as though making
> >> >> >> LocalStorage
> >> >> >> handle structured clone data has a fairly high cost to
> implementors.
> >> >> >>  Not to
> >> >> >> mention that disallowing ImageData in only this one case is not
> >> >> >> intuitive.
> >> >> >> I think allowing structured clone(able) data in LocalStorage is a
> >> >> >> big
> >> >> >> mistake.  Enough so that, if SessionStorage and LocalStorage can't
> >> >> >> diverge
> >> >> >> on this issue, it'd be worth taking the power away from
> >> >> >> SessionStorage.
> >> >> >> J
> >> >> >
> >> >> > Speaking from experience, I have been using localStorage in my PhD
> >> >> > thesis work w/o any real need for structured clones (I would have
> >> >> > used
> >> >> > Web Database but it isn't widely used yet and I was not sure if it
> >> >> > was
> >> >> > going to make the cut in the end). All it took to come close to
> >> >> > simulating structured clones now was to develop my own
> compatibility
> >> >> > wrapper for localStorage (http://realstorage.googlecode.com for
> those
> >> >> > who care) and add setJSONObject() and getJSONObject() methods on
> the
> >> >> > wrapper. Works w/o issue.
> >> >>
> >> >> Actually, this seems like a prime reason *to* add structured storage
> >> >> support. Obviously string data wasn't enough for you so you had to
> >> >> write extra code in order to work around that. If structured clones
> >> >> had been natively supported you both would have had to write less
> >> >> code, and the resulting algorithms would have been faster. Faster
> >> >> since the browser can serialize/parse to/from a binary internal
> >> >> format faster than to/from JSON through the JSON serializer/parser.
> >> >
> >> > Yes, but since LocalStorage is already widely deployed, authors are
> >> > stuck
> >> > with the structured clone-less version of LocalStorage for a very
> >> > long
> >> > time.  So the only way an app can store anything that can't be
> JSONified
> >> > is
> >> > to break backwards compatibility.
> >> >
> >> >
> >> >
> >> > On Wed, Sep 23, 2009 at 3:11 PM, Jonas Sicking <jonas at sicking.cc
> > wrote:
> >> >>
> >> >> On Wed, Sep 23, 2009 at 1:35 PM, Jeremy Orlow <jorlow at chromium.org>
> >> >> wrote:
> >> >> > What are the use cases for wanting to store data beyond strings
> (and
> >> >> > what
> >> >> > can be serialized into strings) in LocalStorage?  I can't think of
> >> >> > any
> >> >> > that
> >> >> > outweigh the negatives:
> >> >> > 1)  From previous threads, I think it's fair to say that we can all
> >> >> > agree
> >> >> > that LocalStorage is a regrettable API (mainly due to its
> synchronous
> >> >> > nature).  If so, it seems that making it more powerful and thus
> more
> >> >> > attractive to developers is just asking for trouble.  After all,
> the
> >> >> > more
> >> >> > people use it, the more lock contention there'll be, and the more
> >> >> > browser UI
> >> >> > jank users will be sure to experience.  This will also be worse
> >> >> > because
> >> >> > it'll be easier for developers to store large objects in
> >> >> > LocalStorage.
> >> >> > 2)  As far as I can tell, there's nowhere else in the spec where
> you
> >> >> > have
> >> >> > to serialize structured clone(able) data to disk.  Given that
> >> >> > LocalStorage
> >> >> > is supposed to throw an exception if any ImageData is contained and
> >> >> > since
> >> >> > File and FileData objects are legal, it seems as though making
> >> >> > LocalStorage
> >> >> > handle structured clone data has a fairly high cost to
> implementors.
> >> >> >  Not to
> >> >> > mention that disallowing ImageData in only this one case is not
> >> >> > intuitive.
> >> >> > I think allowing structured clone(able) data in LocalStorage is a
> big
> >> >> > mistake.  Enough so that, if SessionStorage and LocalStorage can't
> >> >> > diverge
> >> >> > on this issue, it'd be worth taking the power away from
> >> >> > SessionStorage.
> >> >>
> >> >> Despite localStorage's unfortunate locking contention problem, it's
> >> >> become quite a popular API. It's also very successful in terms of
> >> >> browser deployment since it's available in at least the latest versions
> of
> >> >> IE, Safari, Firefox, and Chrome. Don't know about support in Opera?
> >> >
> >> > The more popular it becomes, the more it's going to hurt UA
> developers,
> >> > web
> >> > developers, and users.  I don't see why this is an argument for making
> >> > it
> >> > more powerful.
> >>
> >> How will it hurt UA developers? I think we're stuck forever to
> >> implement the locking mechanism. Adding more datatypes to the API
> >> doesn't mean that we'll have to implement it more.
> >
> >
> > multi-core is the future.  what's the opposite of fine-grained locking?
> >  it's not good ;-)
> > the implicit locking mechanism as spec'd is super lame.  implicitly
> > unlocking under
> > mysterious-to-the-developer circumstances!  how can that be a good thing?
> > storage.setItem("y",
> > function_involving_implicit_unlocking(storage.getItem("x")));
>
> I totally agree on all points. The current API has big imperfections.
> However I haven't seen any workable counter proposals so far, and I
> honestly don't believe there are any as long as our goals are:
>
> * Don't break existing users of the current implementations.
> * Don't expose race conditions to the web.
> * Don't rely on authors getting explicit locking mechanisms right.
>

I agree that there's no way to "fix" local storage without violating one of
these goals.  That's why I think we should just leave it alone and come up
with a better API.  If we keep improving LocalStorage then of course people
will continue using it!  (Which, as I mentioned, will only make the problems
worse.)  In other words, the more features we add to LocalStorage at this
point, the worse we're making the web platform as a whole.

Just to be super clear: I'm not advocating getting rid of LocalStorage as
is.  (Anymore.  :-)  I'm saying that we should leave it as is and come up
with a better storage API that all browser vendors can get behind.

> But, as imperfect as the current API is, I think the following is a
> decent way forward:
>
> * Allow pages that want the convenience of localStorage to use it. For
> multi-process browsers this will mean poor UI *for pages that use
> localStorage*. Especially when said pages hold on to localStorage for
> a long time.
> * Add alternative APIs that don't suffer from the same problems. More
> below.
>
> >> > In addition, this argument assumes that Microsoft (and other UAs) will
> >> > implement the structured clone version of LocalStorage.  Has anyone
> (or
> >> > can
> >> > anyone) from Microsoft comment on this?
> >>
> >> Given that I've never heard microsoft commit to a webstandard, ever, I
> >> doubt that we'll hear anything here. Or that the lack of hearing
> >> anything means we can draw any conclusions.
> >>
> >> > This is not a small feature to add.  Yes, it's smaller than creating a
> >> > new
> >> > storage mechanism (that everyone is willing to adopt), but I still
> think
> >> > that's what we should be looking at.  Rather than polishing a turd.
> >>
> >> I do think that localStorage is a decent API that developers will want
> >> to, and should, use. I think we should look into adding an async accessor to
> >> get a storage object so that people can use a localStorage-like API
> >> while avoiding risks of blocking. This would also allow sharing data
> >> between worker threads and the main window.
> >
> > i think the async callback to get a storage object is an improvement, but
> > i'm not sure that it addresses all of the problems.  for example, if a
> > worker
> > wants to read values from storage, compute, and then put a value into
> > storage, it would probably do all of this from the storage callback.
>  that
> > would result in holding the lock for a long time, which would lock out
> any
> > other threads, including non-worker threads.
> > the problem here is that localStorage is a pile of global variables.  we
> are
> > trying to give people global variables without giving them tools to
> > synchronize
> > access to them.  the claim i've heard is that developers are not savvy
> enough
> > to use those tools properly.  i agree that developers tend to use tools
> > without
> > fully understanding them.  ok, but then why are we giving them global
> > variables?
> > there has to be a better answer.
>
> I actually described a potential solution in the thread on worker storage.
>
> The problem you describe is a worker holding on to the storage for a
> very long (indefinite) time, thereby locking out other threads/windows
> from accessing the same storage area. This seems inevitable if we want
> to prevent race conditions while at the same time not forcing the
> complexities of locks onto web developers. The WebDatabase API suffers
> from exactly the same problem.
>

Actually, it's not the same problem.  When pages access LocalStorage
synchronously, it means that entire event loops can be blocked by one
worker.  If instead pages only have asynchronous access (as they do in the
WebDatabase API), then the only problem is that their callback will never
get called.  Such resource starvation is of course sub-optimal, but at least
it won't affect other pages in the same event loop.

> However, we can lessen the problem. By adding multiple storage areas,
> we can allow a worker to use one storage area, while allowing other
> parties to simultaneously use other storage areas. This way, if a
> worker and a window aren't sharing data at all, they never get in the
> way of each other.
>
> So a very simplistic design would be something like the following:
>
> getStorageArea(name, callback)
>
> when called will asynchronously call the callback parameter once the
> storage area named by the first parameter becomes available. The
> callback receives the storage area as an argument. We would also have
> the function
>
> getMultipleStorageAreas(names, callback)
>
> Same as above, but names is an array of strings indicating multiple
> storage areas that need to be acquired before the callback is called.
> The callback receives all the areas in an array as an argument. This
> function allows transferring data between multiple storage areas
> without risking racing.
>
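
For concreteness, here is a rough in-memory sketch of how the two calls
quoted above might behave. Only the names getStorageArea and
getMultipleStorageAreas come from the proposal. The createStorageAreas
factory, the injectable schedule hook, the explicit release argument, and
the use of a Map for each area are assumptions made so the example can run
on its own. The proposal does not say how an area is handed back.

```javascript
// Sketch only: an in-memory model of the proposed calls. The factory name,
// the `schedule` hook, and the explicit `release` argument are assumptions
// for illustration, not part of any spec.
function createStorageAreas(schedule = (fn) => setTimeout(fn, 0)) {
  const areas = new Map();  // area name -> Map of key/value pairs
  const held = new Set();   // names of areas currently handed out
  const waiting = [];       // pending requests, in arrival order

  function areaFor(name) {
    if (!areas.has(name)) areas.set(name, new Map());
    return areas.get(name);
  }

  // Grant every pending request whose areas are all free right now.
  function pump() {
    for (let i = 0; i < waiting.length; ) {
      const req = waiting[i];
      if (req.names.some((n) => held.has(n))) { i++; continue; }
      waiting.splice(i, 1);
      req.names.forEach((n) => held.add(n));
      let released = false;
      const release = () => {
        if (released) return;  // ignore a second release of the same grant
        released = true;
        req.names.forEach((n) => held.delete(n));
        pump();
      };
      // The callback is never run inline; it is deferred via the scheduler.
      schedule(() => req.callback(req.names.map(areaFor), release));
    }
  }

  function getMultipleStorageAreas(names, callback) {
    waiting.push({ names: [...names], callback });
    pump();
  }

  function getStorageArea(name, callback) {
    getMultipleStorageAreas([name], ([area], release) => callback(area, release));
  }

  return { getStorageArea, getMultipleStorageAreas };
}
```

In this sketch, two callers that ask for different area names never wait on
each other, while a caller that never calls release starves later requests
for the same area without blocking anyone's event loop.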

These are all good suggestions, but I'm not sure even that API would be
powerful enough for developers.

<thought exercise>
For example, how would you implement an offline webmail app?  Well, my first
thought is to make each email a key/value.  But we can't iterate over ranges
of keys, so that won't work.  Maybe instead we can make each key a folder
for our mail?  We could store all the mails in an array.  But that could be
a huge amount of data in each key--on the order of hundreds of megabytes for
some users.  So, to optimize, I guess we could store the emails in their own
keys and then just store large arrays of mail keys in each folder key.  This
would also solve the problem of gmail-like "labels" (i.e. a many-to-one
relationship between folders/labels and emails).  Oh yeah, and we'd need to
have one key that's a list of all the folders.

Great.  But now we want to allow offline searching of our emails.  Crap.
 Well we can efficiently search arrays with a binary search.  But that takes
a while to update.  Well, we can implement a balanced tree for each index
and then store it in keys.  And have one key that has a list of all our
index keys.  This seems reasonably efficient--as long as the trees don't get
too big.  Even if the implementation is pretty slick, once they get to a
certain size, just loading one key into memory could take a while.
 Especially since the entire time we're updating/reading/whatever our keys,
we're either starving other tabs from accessing the data or blocking their
event loop.
</thought exercise >
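
The key layout from the first half of the exercise can be sketched roughly
as follows. The key prefixes ("mail:", "folder:", "folders"), the helper
names, and the Map-backed stand-in for a string-only Storage object are all
made up for illustration. Values are kept as JSON strings because that is
all the current API stores.

```javascript
// Stand-in for a string-only Storage object (assumption: same getItem and
// setItem shape as localStorage, backed by a Map so it runs anywhere).
const store = {
  data: new Map(),
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; },
  setItem(key, value) { this.data.set(key, String(value)); },
};

const readJSON = (key, fallback) => {
  const raw = store.getItem(key);
  return raw === null ? fallback : JSON.parse(raw);
};
const writeJSON = (key, value) => store.setItem(key, JSON.stringify(value));

// Hypothetical key scheme:
//   "mail:<id>"      -> one email
//   "folder:<label>" -> array of mail ids carrying that label
//   "folders"        -> array of all folder/label names
function saveMail(mail, labels) {
  writeJSON("mail:" + mail.id, mail);
  const folders = readJSON("folders", []);
  for (const label of labels) {
    const ids = readJSON("folder:" + label, []);
    if (!ids.includes(mail.id)) ids.push(mail.id);
    writeJSON("folder:" + label, ids);  // rewrites the whole id array
    if (!folders.includes(label)) folders.push(label);
  }
  writeJSON("folders", folders);
}

function listFolder(label) {
  return readJSON("folder:" + label, []).map((id) => readJSON("mail:" + id, null));
}
```

Note that every change to a folder parses and rewrites its entire id array,
which is the cost the exercise worries about once those arrays get large.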

So yes: I do think that you can create at least a simple web mail client.
 And, as long as the JavaScript engine is fast enough to handle large,
complex data structures (without running out of memory!), I suppose you
probably could build just about anything on top of it.

But realistically, it seems that (at a minimum) we need to add a way to
iterate over ranges of keys and something with multiple storage areas (as
you suggested).  Honestly, I can't think of anything else that seems super
important right now.  (Though I know the gmail guys would say they need full
text search.  :-)
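
To illustrate why range iteration matters: with only length, key(i), and
getItem available, a range query has to visit every key and sort the
matches itself. The sketch below assumes a Map-backed stand-in named
MemoryStorage and a hypothetical helper named scanRange. Neither is part of
any API.

```javascript
// Minimal stand-in exposing the Storage members used below
// (length, key(i), getItem, setItem); backed by a Map.
class MemoryStorage {
  constructor() { this.map = new Map(); }
  get length() { return this.map.size; }
  key(i) {
    const k = [...this.map.keys()][i];
    return k === undefined ? null : k;
  }
  getItem(k) { return this.map.has(k) ? this.map.get(k) : null; }
  setItem(k, v) { this.map.set(k, String(v)); }
}

// Hypothetical helper: returns [key, value] pairs with lo <= key < hi,
// sorted by key. Every key is visited because the API offers no ordered
// or ranged access.
function scanRange(storage, lo, hi) {
  const hits = [];
  for (let i = 0; i < storage.length; i++) {
    const k = storage.key(i);
    if (k >= lo && k < hi) hits.push([k, storage.getItem(k)]);
  }
  return hits.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}
```

A native range iterator could answer the same query without touching
unrelated keys.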

If it'd be helpful, I could maybe ask some developers here at Google how
they'd like to use LocalStorage to get some more concrete use cases.


> There are several problems with this, such as the names being sort of
> crappy, and that getting storage areas as an array isn't very friendly.
> However you get the basic idea.



> We don't even need to use Storage objects for this. In fact, I hope
> mozilla will in a not too distant future come up with an alternative
> proposal to the WebDatabase SQL API. Something like this might fit
> into such a proposal as I think that'll have multiple separate storage
> areas anyway.
>

Any chance you could give us a preview of how you envision this API looking?
Received on Thursday, 24 September 2009 01:45:03 UTC
