Re: Fixing appcache: a proposal to get us started from Alec Flett on 2013-03-27 (public-webapps@w3.org from January to March 2013)

From: Alec Flett <alecflett@chromium.org>
Date: Tue, 26 Mar 2013 19:40:18 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Jake Archibald <jaffathecake@gmail.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <CAHWpXeZv0K3u7n6Djz7da+4XOfU5AMAFTJadVcGBVs8x9sFECA@mail.gmail.com>
> This is a tricky problem indeed.
>
> The current appcache actually has the behavior that you're advocating,
> but that's something that a lot of developers have complained about. In
> fact, that's the second biggest complaint that I've heard, trailing only
> the confusing "master entries" behavior.
>
>
I personally think the problem with this particular aspect of the existing
appcache is that it's so incredibly hard to clear the cache and go online
during development - i.e. once you're offline you have to jump through
hoops to get back online. A secondary issue is that even once you as a
developer get used to dealing with that, if your users somehow get stuck in
an offline state because of a bug, there is/was no real way to "repair"
them other than telling them to clear their cache.


>
> On the other hand, it creates the "load twice to get latest version"
> behavior that a lot of developers dislike. I.e. when a user opens a
> website they end up getting the previous version of the website and
> have to reload to get the new version.
>
I think that if there is a programmatic API that is available early on,
then at least starting offline gives the developer the option of going
online if they so choose - and it could be done even before the onload
handler if they want to avoid flashing the old/deprecated page in the
browser. If you require hitting the network first, then I can't think of
how you'd write programmatic hooks to bypass that.

I personally think that no matter how expressive the declarative syntax is,
developers are always going to need to work around it. "Expiration" or
"staleness" is simply too complex to capture with an absolute or relative
date or time: caching policy in apps can depend on things that extend
beyond your caching syntax. I mean, imagine a caching policy that depends
on it being before or after sunset in your locale.
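
To make the sunset example concrete, here is a minimal sketch of such a
policy written as code rather than as manifest syntax. The function name,
the return values and the sunsetHour parameter are all hypothetical, and
working out the real sunset time for a locale is left to the caller.

```javascript
// Hypothetical policy: use the cache after sunset, hit the network before it.
// sunsetHour (0-23, local time) is supplied by the caller; this sketch does
// not try to compute sunset for a locale.
function cachePolicy(now, sunsetHour) {
  return now.getHours() >= sunsetHour ? "use-cache" : "hit-network";
}

// Example: 8pm local time with sunset assumed at 7pm.
console.log(cachePolicy(new Date(2013, 2, 26, 20, 0), 19));
```

The point is only that a one-line predicate like this has no obvious
equivalent in an "expiration"-style declarative field.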


>
> If you have other ideas for how we can solve this then I'd love to
> hear it. If we need to add more knobs to allow authors to choose which
> policies they want to use then I have no problem with that. It would
> be particularly interesting to hear what policies people are planning
> on implementing using NavigationController to see if we can enable
> those.
>
>
A more complex, real example: at my last company we had a "systemwide
horizon" expiration policy that we implemented with a caching proxy. Imagine
this: a very large interconnected data set where individuals spent their
time editing data in small regions of the corpus. The goal was that if you
made an edit, then everything YOU saw would be consistent with that edit. It
was perfectly reasonable to have other users see "stale" versions of any
page - a poor man's (i.e. startup with only a few application servers)
eventually-consistent solution.

The way this worked: if any individual user made changes to a particular
dataset that affected a page, they would get a cookie set on their client
saying "you have made changes through time T", and all future pages that
they visited had to be newer than time T. When the browser hit the
proxy with an If-Modified-Since, the proxy would look at the cookie and say
"Hmm.. I have a stale version of this page at time T-6, I'd better
regenerate it" or "I have a version of the page at time T+2, so I can give
this to you".
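
The proxy's decision described above can be sketched roughly as follows.
This is not the original proxy code: the function name, argument names and
return values are hypothetical, and treating a copy generated exactly at
time T as fresh is an assumption.

```javascript
// Hypothetical sketch of the proxy-side "horizon" check.
// cachedGeneratedAt: when the proxy's cached copy was generated (null if none).
// userHorizon: the "changes through time T" value from the cookie (null if absent).
function proxyDecision(cachedGeneratedAt, userHorizon) {
  if (cachedGeneratedAt === null) return "regenerate"; // nothing cached yet
  if (userHorizon === null) return "serve-cached";     // user has made no edits
  // Assumption: a copy generated exactly at T counts as fresh.
  return cachedGeneratedAt >= userHorizon ? "serve-cached" : "regenerate";
}

const T = 1000;
console.log(proxyDecision(T - 6, T)); // "regenerate": copy predates the edit
console.log(proxyDecision(T + 2, T)); // "serve-cached": copy is newer than the edit
```

Users without the cookie never force a regeneration, which is what lets
everyone else keep seeing "stale" pages cheaply.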

To make this work we had to set max-age=0, essentially bypassing the
user's entire browser cache for every page, even if the server mostly
responded with a 304. (So the proxy server sitting in our colo in Santa
Clara functioned as your browser's cache, because that was the place we
could programmatically write a policy.)

That really sucked for performance though, so we increased max-age to maybe
30 seconds, and put a generated <script> in the head that included the time
the page was generated, and then compared the cookie to the embedded time.
If the cookie was higher, then we knew the page was served stale (by our
definition of stale) from the browser cache, so we forced a refresh. Since
this was all in the <head>, the page didn't even flicker.

Something like:
<head>
  <script>
    // generated in-page by the template engine
    var lastWriteTime = 1292830;
    // boilerplate cache policy; extractLWT reads the last-write time
    // out of the cookie string
    if (lastWriteTime < extractLWT(document.cookie)) location.reload();
  </script>
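
The extractLWT helper is not shown in the snippet. A minimal sketch of what
it could look like follows; the cookie name "lwt" and the choice to return 0
when the cookie is missing or malformed are assumptions, not details from
the original system.

```javascript
// Hypothetical helper: read the user's last-write time from a cookie string
// such as document.cookie. The cookie name "lwt" is an assumption.
function extractLWT(cookieString) {
  for (const pair of cookieString.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip fragments that are not name=value
    const name = pair.slice(0, eq).trim();
    if (name === "lwt") {
      const value = Number(pair.slice(eq + 1).trim());
      return Number.isFinite(value) ? value : 0;
    }
  }
  return 0; // no cookie: the user has made no edits, so no page is stale
}

console.log(extractLWT("sid=abc; lwt=1292900")); // 1292900
```

Returning 0 when the cookie is absent means the in-page comparison never
triggers a reload for users who have not edited anything.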

But of course the problem there is that it ONLY works on HTML - all other
resources had to have a different policy.


With a NavigationController model (or some other programmatic model) you
can write arbitrary logic to deal with these kinds of cases. I'm not
saying the declarative model doesn't fix 80% of the issues, but rather that
you still need a programmatic model in addition.



> >> If the user is offline, or if we checked for update for the appcache
> >> within the last 5 minutes, we use the index.html from the appcache
> >> without hitting the network first. If index.html uses index.js or
> >> index.css, those will be immediately loaded from the cache.
> >
> > Is the opposite true? If index.html is loaded from the network will
> > index.js ever come from the cache? (current appcache says no)
>
> To start simple I'm so far proposing that the answer is "no". What's
> the plan on the NavigationController side?
>
> >> Whenever we check for updates for an appcache with the above manifest
> >> we do an if-modified-since/if-none-match for the manifest. We then do
> >> an update check for any resource requested by the manifest. I.e. even
> >> if the manifest hasn't changed we still do an update check for each
> >> resource linked to by the manifest. If any resources were added since
> >> the previous manifest those are obviously simply downloaded. If any
> >> resources were removed from the manifest those are discarded. As an
> >> optimization the UA can start doing update checks on the same set of
> >> URLs that the previous version of the manifest contained.
> >
> > Don't get the last line, does "the same set of URLs that the previous
> > version of the manifest contained" mean "URLs that are in the current
> > version, and also the previous version"? Is this suggesting that obeying
> > HTTP cache headers is an optional optimisation?
>
> Sorry, I might have derailed this by mentioning the performance
> optimization.
>
> The basic idea is that even if the manifest hasn't changed, we still
> do an update check on all resources. I suspect that both for the
> update check of the manifest itself as well as the resources we still
> honor the normal http caching semantics, except that caching
> heuristics are disabled. (I think that means that the UA should not
> estimate expiration times based on last-modification headers. Are
> there other heuristics commonly done?)
>
> As an optimization which may or may not be a good idea the UA could do
> the following:
>
> To avoid latency, when the UA wants to check an appcache for updates,
> it can do if-modified-since requests for the manifest *and* the
> resources at the same time. Since the manifest hasn't yet been
> downloaded, this would mean that we only know the contents of the
> "old" manifest.
>
> This might mean that the UA does an if-modified-since request for a
> resource which doesn't appear in the new manifest. This shouldn't
> break anything, it'll just mean that we do an unnecessary download of
> a resource.
>
> I don't think this is something we should rathole too much on. It's
> not something that would be a normative requirement in the spec. At
> the most an informative note about a possible optimization. It might
> not even be a good idea if bandwidth is a bigger concern than latency.
> And if the server supports SPDY it's probably not needed at all.
>
> >> In order to further cut down on the number of network requests, we'd
> >> also enable providing last-modified dates or etags directly in the
> >> manifest:
> >>
> >> {
> >>   "expiration": 300,
> >>   "cache": [{ "url": "index.html", "etag": "84ba9f" },
> >>             { "url": "index.js",
> >>               "last-modified": "Wed, 1 May 2013 04:58:08 GMT" },
> >>             "index.css"]
> >> }
> >
> >
> > Adding these into the manifest would require some kind of automation, if
> > there's a level of automation couldn't it just change the url to
> > index.84ba9f.js and have rewriting or static generation take care of the
> > rest?
>
> That's probably true, but you'd still need to include a last-modified
> date or an etag in the manifest in order to avoid the UA checking if
> the existing url has been updated, no?
>
> I.e. simply doing
>
> {
>   "version": "15.3",
>   "expiration": 300,
>   "cache": ["index.html",
>             "index.js",
>             "index.css"]
> }
>
> would result in if-modified-since requests to index.html/js/css
> whenever the version is updated.
>
>
> >> {
> >>   "expiration": 300,
> >>   "cache": ["index.html", "index.js", "index.css"],
> >>   "cookie-vary": "uid"
> >> }
> >>
> >> This would mean that even if the user is offline and navigates to
> >> index.html, if the value of the "uid" cookie is different from when
> >> the appcache was last updated, the appcache would not be returned. A
> >> UA could even use the value of the "uid" cookie as an additional key
> >> in its appcache registry and thus support keeping appcaches for
> >> different users on the same device.
> >
> >
> > If I have an expiration of a day, and log out as one user, log in with
> > another, then get on a plane, none of my content would be available
> > to me?
>
> Yes. That's the purpose of the cookie-vary feature. Feedback from
> developers has been that often user-specific data ends up in the
> appcache and showing that information to another user could be both a
> privacy as well as a user-confusion issue.
>
> >> The actual AppCache object has the following API:
> >
> > I'm worried the API here loses the simplicity that this proposal is
> > supposed to have over the scripted solution, where you have to learn a
> > manifest format and how it interacts with items added via its scripting
> > API, whereas the navigation controller only has a scripting API.
>
> This is a good point. What features do you suggest we remove?
>
> However note that removing scripting features here and having people
> use NavigationController doesn't mean that people end up doing less
> scripting. It just means that they have to do it a different way.
>
> Note that a goal here is that the NavigationController API and the
> AppCache API are as aligned as possible. Ideally even using the same
> API. This is not something that I've started looking at yet due to
> wanting to get an initial draft out.
>
> But I'm hoping that we can start that work very soon.
>
> > Is there anything here that couldn't be done with the
> > NavigationController & a library? I'm not suggesting that's reason for
> > it not to exist, just wondering if it's offering anything unique.
>
> It's a goal that all of the features in this proposal can be
> implemented using NavigationController. That's the "layering" that
> Yehuda and Alex Russell like to talk about. So I hope the answer is
> "no" :)
>
> If there are, and those features are good, we should probably look at
> adding them to the NavigationController.
>
> / Jonas
>
>
Received on Wednesday, 27 March 2013 02:41:07 UTC