Re: Fixing appcache: a proposal to get us started

On Tue, Mar 26, 2013 at 3:21 AM, Jake Archibald <jaffathecake@gmail.com> wrote:
> On 26 March 2013 07:02, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> {
>>   "expiration": 300,
>>   "cache": ["index.html", "index.js", "index.css"]
>> }
>>
>> If the user navigates to index.html, the following happens:
>>
>> If the user is online and we haven't checked for update for the
>> appcache in the last 5 minutes (300 seconds) we simply ignore the
>> cache and load index.html and any resources it links to from the
>> network.
>
>
> How is "online" defined here? What if we're online but the page 404s, what
> if we're online but DNS to that domain fails etc etc.
>
> Offline-first isn't intuitive, I agree. I did some hacking with FALLBACK to
> get online-first behaviour in Lanyrd and quickly learned why it's a bad
> idea, offline performance was dire as the device had to fail to connect
> before it'd show me cached data. I'd rather stick with offline-first but
> present it in a way that's expected.

This is a tricky problem indeed.

The current appcache actually has the behavior that you're advocating,
but that's something that a lot of developers have complained about. In
fact, that's the second biggest complaint that I've heard, trailing
only the confusing "master entries" behavior.

On one hand I totally agree that hitting the network means
potentially doing a terribly slow load over a 2G network connection
which ends up failing half-way through. There is simply no way to
ensure a good user experience when you decide to hit the network
first.

On the other hand, it creates the "load twice to get latest version"
behavior that a lot of developers dislike. I.e. when a user opens a
website they end up getting the previous version of the website and
have to reload to get the new version.

Developers have expressed two strong concerns about this. One is that
when website X announces that they've launched a new version of the
website, a user who goes to X will still see the old version. Another
is that if a website rolls out a security fix, they don't want people
running a cached copy of the exploitable version.

I don't know how to fully reconcile these two conflicting
requirements. I.e. "don't hit a potentially slow network if there's a
cached copy" and "if the user is online, don't run an old version".

The "expiration" property is the best solution I've been able to come
up with so far. A website that considers it important to have users
always run the latest version can set the number low which prevents
running an old version if possible. A website that prefers to optimize
for performance can set a high number.
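
A rough sketch of that policy, purely illustrative. The names
(chooseSource, CacheState) are made up for this example and are not
part of the proposal, and how "online" gets determined is left open:

```typescript
// Hypothetical names throughout; nothing here is a proposed API.
type Source = "network" | "cache";

interface CacheState {
  online: boolean;               // however the UA ends up defining "online"
  secondsSinceLastCheck: number; // time since the last appcache update check
}

// Decide where a navigation to a cached page is served from.
function chooseSource(expiration: number, state: CacheState): Source {
  // Offline: the cached copy is all we have.
  if (!state.online) return "cache";
  // Online, and the last update check is older than "expiration":
  // ignore the cache and go to the network. Otherwise use the cache.
  return state.secondsSinceLastCheck > expiration ? "network" : "cache";
}
```

With "expiration": 300, a check made 60 seconds ago means the cache is
used; a check made 10 minutes ago means the network is hit first.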

This doesn't seem like a problem that's specific to this appcache
proposal. A website using the NavigationController API will have to
make exactly the same types of decisions as far as I can tell. It's
just that it'll get to make them using explicit JS code rather than by
adjusting properties in a manifest.

If you have other ideas for how we can solve this then I'd love to
hear them. If we need to add more knobs to allow authors to choose which
policies they want to use then I have no problem with that. It would
be particularly interesting to hear what policies people are planning
on implementing using NavigationController to see if we can enable
those.

>> If the user is offline, or if we checked for update for the appcache
>> within the last 5 minutes, we use the index.html from the appcache
>> without hitting the network first. If index.html uses index.js or
>> index.css, those will be immediately loaded from the cache.
>
> Is the opposite true? If index.html is loaded from the network will index.js
> ever come from the cache? (current appcache says no)

To start simple I'm so far proposing that the answer is "no". What's
the plan on the NavigationController side?

>> Whenever we check for updates for an appcache with the above manifest
>> we do an if-modified-since/if-none-match for the manifest. We then do
>> an update check for any resource requested by the manifest. I.e. even
>> if the manifest hasn't changed we still do an update check for each
>> resource linked to by the manifest. If any resources were added since
>> the previous manifest those are obviously simply downloaded. If any
>> resources were removed from the manifest those are discarded. As an
>> optimization the UA can start doing update checks on the same set of
>> URLs that the previous version of the manifest contained.
>
> Don't get the last line, does "the same set of URLs that the previous
> version of the manifest contained" mean "URLs that are in the current
> version, and also the previous version"? Is this suggesting that obeying
> HTTP cache headers is an optional optimisation?

Sorry, I might have derailed this by mentioning the performance optimization.

The basic idea is that even if the manifest hasn't changed, we still
do an update check on all resources. I suspect that both for the
update check of the manifest itself as well as the resources we still
honor the normal HTTP caching semantics, except that caching
heuristics are disabled. (I think that means that the UA should not
estimate expiration times based on last-modification headers. Are
there other heuristics commonly done?)
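
One way to read "heuristics disabled", as a simplified sketch. The
Validators shape and the function name are invented for illustration,
and details such as the Age header are ignored. The point is only that
a missing explicit lifetime means "revalidate" rather than "guess from
Last-Modified":

```typescript
// Hypothetical, simplified header bag; values are assumed already parsed.
interface Validators {
  maxAgeSeconds?: number;  // from Cache-Control: max-age
  expiresAtMs?: number;    // from Expires, as epoch milliseconds
  lastModifiedMs?: number; // from Last-Modified; deliberately unused below
}

// True if the stored response may be reused without revalidation.
function isFreshWithoutHeuristics(
  h: Validators,
  storedAtMs: number,
  nowMs: number
): boolean {
  const ageSeconds = (nowMs - storedAtMs) / 1000;
  if (h.maxAgeSeconds !== undefined) return ageSeconds < h.maxAgeSeconds;
  if (h.expiresAtMs !== undefined) return nowMs < h.expiresAtMs;
  // No explicit lifetime. A regular HTTP cache might estimate one from
  // Last-Modified; with heuristics disabled we always revalidate.
  return false;
}
```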

As an optimization which may or may not be a good idea the UA could do
the following:

To avoid latency, when the UA wants to check an appcache for updates,
it can do if-modified-since requests for the manifest *and* the
resources at the same time. Since the manifest hasn't yet been
downloaded, this would mean that we only know the contents of the
"old" manifest.

This might mean that the UA does an if-modified-since request for a
resource which doesn't appear in the new manifest. This shouldn't
break anything, it'll just mean that we do an unnecessary download of
a resource.
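
To illustrate the bookkeeping once the new manifest arrives, here is a
small sketch. It assumes manifests reduced to plain lists of URLs, and
the names (reconcile, SpeculativeResult) are hypothetical:

```typescript
// Hypothetical helper; assumes each manifest is reduced to a list of URLs.
interface SpeculativeResult {
  stillNeeded: string[]; // speculative checks that turned out to be useful
  wasted: string[];      // checked, but no longer listed in the new manifest
  missing: string[];     // newly added URLs that still need a download
}

// Compare the URLs checked speculatively (old manifest) with the new manifest.
function reconcile(oldUrls: string[], newUrls: string[]): SpeculativeResult {
  return {
    stillNeeded: oldUrls.filter((u) => newUrls.indexOf(u) !== -1),
    wasted: oldUrls.filter((u) => newUrls.indexOf(u) === -1),
    missing: newUrls.filter((u) => oldUrls.indexOf(u) === -1),
  };
}
```

The "wasted" list is the cost of the optimization: requests made for
resources that the new manifest no longer mentions.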

I don't think this is something we should rathole too much on. It's
not something that would be a normative requirement in the spec. At
the most an informative note about a possible optimization. It might
not even be a good idea if bandwidth is a bigger concern than latency.
And if the server supports SPDY it's probably not needed at all.

>> In order to further cut down on the number of network requests, we'd
>> also enable providing last-modified dates or etags directly in the
>> manifest:
>>
>> {
>>   "expiration": 300,
>>   "cache": [{ "url": "index.html", "etag": "84ba9f" },
>>             { "url": "index.js",
>>               "last-modified": "Wed, 1 May 2013 04:58:08 GMT" },
>>             "index.css"]
>> }
>
>
> Adding these into the manifest would require some kind of automation, if
> there's a level of automation couldn't it just change the url to
> index.84ba9f.js and have rewriting or static generation take care of the
> rest?

That's probably true, but you'd still need to include a last-modified
date or an etag in the manifest in order to avoid the UA checking if
the existing url has been updated, no?

I.e. simply doing

{
  "version": "15.3",
  "expiration": 300,
  "cache": ["index.html",
            "index.js",
            "index.css"]
}

would result in if-modified-since requests to index.html/js/css
whenever the version is updated.
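
A sketch of why the validators matter, under the assumption that a UA
would skip the request whenever the manifest's etag or last-modified
value matches what it already has stored. The types and the function
name are invented for this example:

```typescript
// A manifest entry is either a bare URL or an object with optional validators.
type Entry = string | { url: string; etag?: string; "last-modified"?: string };

// What the UA is assumed to remember about each stored resource.
interface Cached {
  etag?: string;
  lastModified?: string;
}

// Returns the URLs the UA would still have to check over the network.
function urlsNeedingCheck(
  entries: Entry[],
  cached: { [url: string]: Cached }
): string[] {
  const result: string[] = [];
  for (const entry of entries) {
    const url = typeof entry === "string" ? entry : entry.url;
    const have = cached[url];
    if (typeof entry !== "string" && have !== undefined) {
      // Manifest validator matches the stored copy: no request needed.
      if (entry.etag !== undefined && entry.etag === have.etag) continue;
      const lm = entry["last-modified"];
      if (lm !== undefined && lm === have.lastModified) continue;
    }
    // Bare URL, unknown resource, or mismatching validator.
    result.push(url);
  }
  return result;
}
```

With bare URLs only, every entry ends up in the returned list, which is
the if-modified-since traffic described above.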


>> {
>>   "expiration": 300,
>>   "cache": ["index.html", "index.js", "index.css"],
>>   "cookie-vary": "uid"
>> }
>>
>> This would mean that even if the user is offline and navigates to
>> index.html, if the value of the "uid" cookie is different from when
>> the appcache was last updated, the appcache would not be returned. A
>> UA could even use the value of the "uid" cookie as an additional key
>> in its appcache registry and thus support keeping appcaches for
>> different users on the same device.
>
>
> If I have an expiration of a day, and log out as one user log in with
> another, then get on a plane, none of my content would be available to me?

Yes. That's the purpose of the cookie-vary feature. Feedback from
developers has been that often user-specific data ends up in the
appcache and showing that information to another user could be both a
privacy and a user-confusion issue.
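
As a sketch of the "additional key" idea, assuming a simple in-memory
registry. The registry shape and function names are hypothetical:

```typescript
// Hypothetical registry keyed by manifest URL plus the vary cookie's value.
interface StoredCache {
  resources: string[];
}

function registryKey(manifestUrl: string, cookieValue: string | undefined): string {
  // JSON.stringify keeps the key unambiguous even if values contain separators.
  return JSON.stringify([manifestUrl, cookieValue === undefined ? null : cookieValue]);
}

// Returns the appcache stored for this manifest and cookie value, if any.
function lookup(
  registry: { [key: string]: StoredCache },
  manifestUrl: string,
  cookieValue: string | undefined
): StoredCache | undefined {
  return registry[registryKey(manifestUrl, cookieValue)];
}
```

A cache stored while "uid" was one user's value is simply not found
when the cookie holds another user's value, online or offline.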

>> The actual AppCache object has the following API:
>
> I'm worried the API here loses the simplicity that this proposal is supposed
> to have over the scripted solution, where you have to learn a manifest
> format and how it interacts with items added via its scripting API, whereas
> the navigation controller only has a scripting API.

This is a good point. What features do you suggest we remove?

However note that removing scripting features here and having people
use NavigationController doesn't mean that people end up doing less
scripting. It just means that they have to do it a different way.

Note that a goal here is that the NavigationController API and the
AppCache API are as aligned as possible. Ideally even using the same
API. This is not something that I've started looking at yet due to
wanting to get an initial draft out.

But I'm hoping that we can start that work very soon.

> Is there anything here that couldn't be done with the NavigationController &
> a library? I'm not suggesting that's reason for it not to exist, just
> wondering if it's offering anything unique.

It's a goal that all of the features in this proposal can be
implemented using NavigationController. That's the "layering" that
Yehuda and Alex Russell like to talk about. So I hope the answer is
"no" :)

If there are such features, and they are good, we should probably look
at adding them to the NavigationController.

/ Jonas

Received on Wednesday, 27 March 2013 00:15:45 UTC