Re: Fixing appcache: a proposal to get us started from Jonas Sicking on 2013-03-29 (public-webapps@w3.org from January to March 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 29 Mar 2013 12:53:16 -0700
To: Alec Flett <alecflett@chromium.org>
Cc: Jake Archibald <jaffathecake@gmail.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <CA+c2ei96+PVAtosDgOo+3sbx40Q2ro6U6Fs1Dt0KONaDWGvUCw@mail.gmail.com>
On Tue, Mar 26, 2013 at 7:40 PM, Alec Flett <alecflett@chromium.org> wrote:
>
>> This is a tricky problem indeed.
>>
>> The current appcache actually has the behavior that you're advocating,
>> but that's something that a lot of developers have complained about. In
>> fact, that's the second biggest complaint that I've heard, trailing
>> only the confusing "master entries" behavior.
>>
>
> I personally think the problem with this particular aspect of the existing
> appcache is that it's so incredibly hard to clear the cache and go online
> during development - i.e. once you're offline you have to jump through hoops
> to get back online. A secondary issue was that once you as a developer got
> used to dealing with that, if your users somehow get stuck in an offline
> state because of a bug, there is/was no real way to "repair" them other than
> telling them to clear their cache.
>
>> On the other hand, it creates the "load twice to get latest version"
>> behavior that a lot of developers dislike. I.e. when a user opens a
>> website they end up getting the previous version of the website and
>> have to reload to get the new version.
>>
> I think that if there is a programmatic API that is available early-on, then
> at least starting in offline gives the developer the option of going online
> if they so choose - and it could be done even before the onload handler if
> they want to avoid flashing the old/deprecated page in the browser. If you
> require hitting the network first, then I can't think of how you'd write
> programmatic hooks to bypass that.

You might be right that offline-first behavior is the right behavior
in the majority of cases.

Though, based on feedback from talking to people, I'm fairly sure that
online-first behavior is something that at least *some* developers
want.

But maybe we could make offline-first the default and allow people to
opt in to online-first?

Do note, though, that with the proposal as it stands today it is fairly
easy to get the offline-first behavior. Simply set the expiration time
to a "really long time" and then use the explicit API for doing update
checking against the network as needed.

However I'm definitely willing to try changing things around such that
offline-first is more of a default.

How about this new proposal:

{
  "cache": [...]
}

This would use a "default" expiration time of one day. Whether or not
the cache has expired, we always use the offline resources first.
However, if the cache has expired, we also do an update check in the
background after the page has loaded.

{
  "expiration": 300,
  "cache": [...]
}

This still has the semantics of always using offline-first. The only
difference is that the cache expires after just 5 minutes, at which
point we would do a background update check once the page has loaded.

{
  "dont-use-expired": true,
  "expiration": 3600,
  "cache": [...]
}

This manifest would hit the network if the user is online and the
cache has expired, i.e. if it's been more than an hour since we last
checked for an update.
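
To make the three cases above concrete, here is a rough sketch of the
decision they imply. The function name decideLoad, the shape of its
return value, and the seconds-based arguments are invented for
illustration and are not part of the proposal; only the manifest keys
("expiration", "dont-use-expired") and the one-day default come from
the text above. It also assumes that no background update check is
attempted while the user is offline, which the proposal does not spell
out.

```javascript
// Sketch only: decideLoad and its return shape are hypothetical.
const DEFAULT_EXPIRATION_SECONDS = 24 * 60 * 60; // "default" expiration of one day

function decideLoad(manifest, secondsSinceLastCheck, isOnline) {
  const expiration = manifest.expiration ?? DEFAULT_EXPIRATION_SECONDS;
  const expired = secondsSinceLastCheck > expiration;

  // Opt-in online-first: go to the network only when the cache has
  // expired AND the user is online.
  if (manifest["dont-use-expired"] === true && expired && isOnline) {
    return { source: "network", backgroundUpdateCheck: false };
  }

  // Default offline-first: always serve cached resources; if the cache
  // has expired, check for an update after the page has loaded.
  // Assumption: the background check is skipped while offline.
  return { source: "cache", backgroundUpdateCheck: expired && isOnline };
}
```

With the third manifest, a load 4000 seconds after the last check would
go to the network when online and fall back to the cache when offline.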

> I personally think that no matter how expressive the declarative syntax is,
> developers are always going to need to work around it - "expiration" or
> "staleness" is simply too complex to just give an absolute or relative date
> or time  - caching policy in apps can simply depend on things that extend
> beyond your caching syntax - I mean imagine a caching policy that depends on
> it being before or after sunset in your locale.

I definitely agree that having a programmatic API is needed! That's
the intent of the checkForUpdate() and download() functions.

If we don't have those functions I think very few websites would be
able to use the manifest at all and would be forced to use the
NavigationController instead.

>> If you have other ideas for how we can solve this then I'd love to
>> hear it. If we need to add more knobs to allow authors to choose which
>> policies they want to use then I have no problem with that. It would
>> be particularly interesting to hear what policies people are planning
>> on implementing using NavigationController to see if we can enable
>> those.
>>
>
> A more complex, real example: at my last company we had a "systemwide horizon"
> expiration policy that we implemented with a caching proxy. Imagine this: a
> very large interconnected data set where individuals spent their time
> editing data in small regions of the corpus. The goal was if you made an
> edit, then everything YOU saw would be consistent with that edit. It was
> perfectly reasonable to have other users see "stale" versions of any page -
> a poor man's (i.e. startup with only a few application servers)
> eventually-consistent solution.
>
> The way this worked, if any individual user made changes to a particular
> dataset that affected a page, they would get a cookie set on their client
> saying "you have made changes through time T" and all future pages that they
> visited had to be newer than time T. When the browser would hit the proxy
> with an If-Modified-Since, the proxy would look at the cookie and say "Hmm..
> I have a stale version of this page at time T-6, I'd better regenerate it"
> or "I have a version of the page at time T+2, so I can give this to you" -
>
> To make this work we had to set max-age=0, essentially bypassing the entire
> user's browser cache for every page, even if the server mostly responded
> with a 304. (so the proxy server sitting in our colo in Santa Clara
> functioned as your browser's cache because that was the place we could
> programmatically write a policy)
>
> That really sucked for performance though, so we increased max-age to maybe
> 30 seconds, and put a generated <script> in the head that included the time
> the page was generated, and then compared the cookie to the embedded time.
> If the cookie was higher, then we knew the page was served stale (by our
> definition of stale) from the browser cache so we forced a refresh. Since
> this was all in the <head>, the page didn't even flicker.
>
> Something like
> <head>
>   <script>
>     // generated in-page by the template engine
>     var lastWriteTime = 1292830;
>     // boilerplate cache policy
>     if (lastWriteTime < extractLWT(document.cookie)) location.reload();
>   </script>
>
> But of course the problem there is that ONLY works on HTML - all other
> resources had to have a different policy.
>
> With a NavigationController model (or some other programmatic model) you can
> write arbitrary logic to deal with these kinds of cases. I'm not saying the
> declarative model doesn't fix 80% of the issues, but more that you still
> need a programmatic model in addition.

I totally agree. We always planned for having a worker fallback to
handle the more complex policies. So it was really awesome to see that
Google was already working on such a worker-based API.

I wonder if you can get close to the behavior that you are describing
above by calling checkForUpdate() and download() after any user edit?
That way the user immediately gets a new version with the edits he/she
created.
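
A minimal sketch of that idea follows. The proposal only names
checkForUpdate() and download(); that they live on some cache object,
return promises, and that checkForUpdate() resolves to a boolean are
all assumptions made here for illustration, as are the names
refreshAfterEdit and saveEdit.

```javascript
// Sketch only. `cache` stands in for whatever object would expose the
// proposal's checkForUpdate() and download(); the promise-returning
// signatures and the boolean result are assumptions.
async function refreshAfterEdit(cache, saveEdit) {
  // Hypothetical callback that sends the user's edit to the server.
  await saveEdit();

  // Ask the network whether a newer version exists than the cached one.
  const hasUpdate = await cache.checkForUpdate();
  if (hasUpdate) {
    // Pull the new version into the offline cache so the next load
    // already reflects the edit.
    await cache.download();
  }
  return hasUpdate;
}
```

This only covers resources listed in the manifest; anything fancier,
like comparing against a per-user timestamp, would still need the
worker-based approach.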

/ Jonas
Received on Friday, 29 March 2013 19:54:14 UTC