- From: Maciej Stachowiak <mjs@apple.com>
- Date: Thu, 20 Sep 2007 01:13:20 -0700
My commentary below. Overall, I think the basic model is fairly sound. But I do think some improvements could be made.

On Sep 6, 2007, at 5:46 PM, Ian Hickson wrote:

> Ok, new proposal:
>
> There's a concept of an application cache. An application cache is a group of resources, the group being identified by a URI (which typically happens to resolve to a manifest). Resources in a cache are either top-level or not; top-level resources are those that are HTML or XML and when parsed with scripting disabled have <html application="..."> with the value of the attribute pointing to the same URI as identifies the cache.
>
> When you visit a page you first check to see if you have that page in a cache as a known top-level page.

Is there any need to treat "top-level" resources differently? If the user directly loads a PNG, JPG or for that matter PDF that's part of an offline manifest, I think it makes sense to serve it from the app cache. It seems like it would simplify the model a bit for the offline cache to treat all items in the manifest as part of the application when visited directly.

The only problem here is the potential inconsistency if an HTML or XML document doesn't have the <html application="..."> declaration at the top, but is still cited in some other app's manifest. Then it would be treated as part of the application if an app page citing that manifest was visited first, but not if it wasn't. I think this is ok though and may even be a desirable behavior. For instance, you might not want a single flickr photo page to be an app by itself, but you'd still want it to be treated as part of the app domain for someone who had visited the main application page.

> If you do, skip the next two paragraphs; the 'new cache' flag is set to false.
>
> If you don't, you fetch the URL. If it has no application="" attribute, then do whatever the normal thing to do is. Ignore the rest of this.
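
A minimal sketch of the visit logic quoted above, together with the attribute handling in the paragraph that follows it. Every name here is hypothetical: `AppCache`, `Visit`, `begin_visit`, and a `fetch_application_attr` callback standing in for "fetch the page and parse it with scripting disabled". It also assumes "same-origin safe" means same scheme, host and port as the page, and that a relative attribute value is resolved against the page URL, so an empty value names the page itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set
from urllib.parse import urljoin, urlsplit


@dataclass
class AppCache:
    # Hypothetical model of an application cache: identified by a URI, and
    # remembering which resources are "known top-level" pages for it.
    identifier: str
    known_top_level: Set[str] = field(default_factory=set)


@dataclass
class Visit:
    cache: Optional[AppCache]  # None: load the page the normal way
    new_cache: bool            # the proposal's 'new cache' flag


def same_origin(a: str, b: str) -> bool:
    # Assumed meaning of "same-origin safe"; default ports are not normalised.
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


def begin_visit(url: str, caches: Dict[str, AppCache],
                fetch_application_attr: Callable[[str], Optional[str]]) -> Visit:
    # 1. Is this page already a known top-level page of some cache?
    for cache in caches.values():
        if url in cache.known_top_level:
            return Visit(cache, new_cache=False)
    # 2. If not, fetch it and read <html application="..."> (None if absent).
    attr = fetch_application_attr(url)
    if attr is None:
        return Visit(None, new_cache=False)
    identifier = urljoin(url, attr)  # "" resolves to the page itself
    if not same_origin(url, identifier):
        return Visit(None, new_cache=False)  # pretend the attribute wasn't there
    # 3. Find or create the cache and record the page as a known top-level;
    #    per the proposal, 'new cache' is true on this path either way.
    cache = caches.setdefault(identifier, AppCache(identifier))
    cache.known_top_level.add(url)
    return Visit(cache, new_cache=True)
```

Dropping the top-level distinction, as suggested above, would amount to checking membership in the whole cache in step 1 rather than in `known_top_level`.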

> The presence of the attribute indicates that it's expecting an application cache to apply. The presence is detected at parse time, and must be present on the first <html> start tag before any other start tags. Check that the attribute's value is same-origin safe. If it isn't, pretend the attribute wasn't there (and ignore the rest of this). Otherwise, check to see if you already have a cache for the given URI. If you don't, create a new cache identified by the given URI. In any case, save this resource to the identified cache, as a known top-level page for that cache. Then, act as if you had known about the cache when you started (next step), except with the 'new cache' flag set to true.
>
> Load the page from the cache and display it.

I assume any resource that's not found in the cache can be loaded normally (it would have to be if this is a brand new cache). Actually, I'm not sure "from the cache" makes sense here given the next sentence.

> Any resources that the page tries to fetch using GETs that aren't XMLHttpRequest'ed must be taken from the cache, if available.

Is it really the right thing for XMLHttpRequest to bypass reading from the cache? It makes sense to me that they wouldn't be implicitly stored in the cache, but I don't think the data you get for a URI should depend on whether you used XMLHttpRequest or loaded it in a frame. To be fair, I'm not sure why you'd want to do an XHR for a resource that then gets served from the offline cache. But I'm also not sure why you'd list an item in your manifest that you then wanted to load with XHR. So it seems simpler to omit this slight complication.

> When they aren't, the resources must be fetched then stored in the cache.

If there is an explicit manifest, it seems wrong to store things in the cache that aren't in the manifest but are part of this page.
That means you get the union of the manifest and things the page loads, which will make offline behavior hard to debug I think. It would be better to fetch the manifest (possibly getting it from the existing application cache, if any) before proceeding. Then you'd know which of the resources loaded as part of this page belong in the cache up front. That would affect the following steps.

> Once the UA is ready to do so the UA must go on to the next steps. UAs may do this immediately, or may wait for the original page load to complete, or may delay it up to a UA-defined minimum delay.
>
> If 'new cache' is true, and the cache identifier URI is the same as the URI that was just downloaded and put in the cache: Do nothing.
>
> If 'new cache' is true, and the cache identifier URI is different from the URI that was just downloaded: Fetch the resource identified by that URI. Store it in the cache. If it's a manifest and it parses correctly, download all the URIs given in that manifest and add them to the cache. If any are HTML files which, when parsed with scripting disabled, trigger the application="" handling and have a value that points to the same URI as the one identifying this application cache, then mark them as known top-levels for this cache.

There would be no need to parse the resources if there were no distinction drawn between top-level and other resources in the cache.

> If 'new cache' is false: Create a new cache. Fetch the resource with the URI of the cache identifier. If it's a manifest, and it has changed from what's in the last cache, and it parses correctly, download all the URIs in that manifest and add them to the new cache.

I would suggest going a little beyond the http caching rules. I propose that if the manifest is unchanged (as defined below), the UA doesn't need to download anything.
This makes it possible to give the manifest a fairly short http expiration, so that checks for updates are relatively frequent, but make the checks themselves extremely cheap. This would require some modifiable version field in the manifest to let it change when the contents of a referenced resource have changed, but the set of resources hasn't.

A UA may consider the manifest "unchanged" if any of the following conditions applies:

- If the http freshness lifetime of either the copy in the offline cache or the copy in the normal browser cache has not expired
- If a conditional request relative to a copy in either the offline cache or the browser cache (via If-Modified-Since or If-None-Match) gives a 304 Not Modified response
- For non-http protocols, if it appears unmodified using whatever caching scheme is appropriate to the protocol

But if none of these applies, the UA should not compare the actual manifest data; it should assume the manifest has changed and refetch the resources (possibly using a cache).

Note that if the manifest is generated dynamically server-side, it can be made to appear changed whenever any resource it points to has changed, while still saving a lot of needless http traffic through ETags.

Another reason to check manifest freshness before proceeding with a page load is that it would give the app some way of knowing that it is about to upgrade. Then it could choose to display custom upgrading UI instead of proceeding with a normal load of all its resources. In this case though, it would need an event when the upgrade finishes successfully but also one when it fails.

> If the manifest has an upgrader entry, use that as the upgrader as described below. Otherwise, if it's not a manifest but an HTML/XML file, and it has changed from what's in the last cache, use that as the upgrader as described below.
> If it's a manifest that misparsed, or if it's another kind of file, then act as if the URI just pointed to the top level page being loaded (and use that as the upgrader as described below). If the newly updated cache doesn't contain the current top-level page, then fetch that too.

I think it would be preferable if a value that isn't either the empty string or a reference to a valid manifest were treated as if the application attribute was unset. The rules above make it too easy to mistakenly think you are using a manifest when actually you are using implicit application mode, in a way that may not readily show up in offline testing. Plus, getting rid of the ability to define an application via an HTML file other than the current one removes the need for the hidden background browsing context thing, which seems like a whole mess of needless implementation complexity.

> When a file is fetched by the main page loading in a background browsing context, the loads are conditional loads, so that files that haven't changed since the previous update are directly copied from the old cache.

This should (of course) apply to loads of resources cited in a manifest and the manifest itself, as suggested above.

> If the newly updated cache's copy of the top level page being shown is no longer categorised as a "known top-level" for this cache (e.g. because it doesn't have an <html application> attribute any more) then inform the user, e.g. an infobar saying something like "This application may no longer be available. (( View new page in a new window )) (( Delete application from cache )) (( Keep application in cache and check for updates later )) [x]". The first of these buttons would just show the background browsing context in the foreground. The second would delete the webapp cache and reload the page from the normal cache, and the third would just not do anything special. Don't run the upgrader in this case.

Not distinguishing "top-level" resources would remove the need to present such potentially confusing UI to the user. (A page with an implicit manifest, i.e. pointing to itself as the cache, could simply cease to get special caching if a version is loaded that doesn't have <html application=""> set.)

> If any of the files being updated in the new cache are 4xx or 5xx, or fail for some other reason (e.g. DNS errors, user went offline), then the UA should alert the user to this fact somehow (infobar maybe) -- "An error occurred while updating the application. (( View details )) [x]" -- and then wait a few minutes (or longer if it can tell it'll fail again) before trying again.

I think this is inappropriate. The offline model should work with intermittent connections or in captive wifi networks, and showing this kind of error to the user seems unhelpful. What's wrong with just using the complete old version and trying the update again later?

> Upgrader:
> Create a hidden browsing context.
> Load the upgrader in it.

I don't like this whole upgrader idea. Parsing HTML and CSS and executing JavaScript seems like an inefficient way to do an app update. I think it is reasonable to require a manifest file for multipage apps, and writing an HTML/CSS/JS upgrader that can cover all pages of a multipage app does not seem significantly easier than creating a manifest file. The implicit manifest idea seems handy as a quick way to handle one-page apps but it does not seem reasonable for the multipage case, and this would obviate the need for an upgrader.

> Just before onload, fire an 'upgrading' event to every instance of a top-level page using a cache with the same identifier.

Whether or not there are upgraders though, I think events should dispatch when a manifest-based upgrade either completes or fails (and perhaps also when the upgrade starts).

> The event has a handle to the Window object of the hidden browsing context.
> After every 'upgrading' event has been fired, the 'load' event must be fired on the upgrader.
> After that happens, if any of the aforementioned instances are still using old versions of the cache, then the user agent may inform the user they can reload to update.

I think it would be preferable to let the apps upgrade themselves instead. They could choose to use location.reload() if they are not holding any interesting state, or they could offer to save the user's state before doing this, or they could make some alternate call requesting that all new resource loads for this instance come from the freshly upgraded cache, which would let the app perform an upgrade manually while preserving current state, if feasible.

> The Upgrader can do such things as updating the database schema between versions, and when there are multiple instances running, it allows them to negotiate who will do that work instead of it happening several times.

I would suggest instead that the instance that triggered the upgrade be given a special event so that it can do this and could optionally present in-page UI while doing so. This seems simpler than adding a hidden browsing context. Changing the schema may require pausing other instances, however, if there is no way to lock the database.

> Modal alerts (window.alert, .prompt, etc) in the background page can either raise an exception, be ignored, drop a message to a console, or possibly display a message over the top of the foreground app's browsing context.

To avoid such complexities it would be better to avoid the idea of a hidden upgrader. And in-page UI could be more tasteful than prompts or alerts.

> The manifest format has:
> a list of URIs.
> optionally a place to have an opaque string which can be changed arbitrarily (this gives authors a way to change the manifest when they want things to be refetched).
> optionally a URI for an upgrader (HTML file).

I'd skip the upgrader part.
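
A minimal sketch of the manifest quoted above with the upgrader entry left out. The proposal doesn't fix a syntax, so this assumes a hypothetical line-based one: `#` comments, a `version:` line carrying the opaque string, and one URI per remaining line. `manifest_unchanged` restates, for http only, the "unchanged" test proposed earlier in this message; `Manifest`, `parse_manifest` and `manifest_unchanged` are illustrative names, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Manifest:
    uris: List[str] = field(default_factory=list)
    version: Optional[str] = None  # opaque; only the fact that it changes matters


def parse_manifest(text: str) -> Manifest:
    # Hypothetical syntax: blank lines and '#' lines are ignored, a
    # 'version:' line holds the opaque string, every other line is a URI.
    manifest = Manifest()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("version:"):
            manifest.version = line[len("version:"):].strip()
        else:
            manifest.uris.append(line)
    return manifest


def manifest_unchanged(fresh_in_app_cache: bool, fresh_in_browser_cache: bool,
                       revalidation_status: Optional[int]) -> bool:
    # "Unchanged" if either cached copy is still within its http freshness
    # lifetime, or a conditional GET (If-Modified-Since / If-None-Match)
    # against either copy returned 304. Anything else counts as "changed";
    # the manifest bytes themselves are never diffed.
    return fresh_in_app_cache or fresh_in_browser_cache or revalidation_status == 304
```

Under this scheme, bumping the `version:` line is what changes the manifest's bytes (and so its validators) when a referenced resource changes but the list of URIs doesn't.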

I would also consider adding optional per-resource versions, where the UA may assume that if a resource's version number is unchanged it doesn't have to fetch that resource at all (not even conditionally) as part of an upgrade, to make the supercaching effect even more super; but perhaps that's overkill.

> We provide an API that can add files to the cache, and that can be queried to determine if we are in upgrader mode or not, and that can swap in a new cache without reloading the page, during the 'upgrading' event.

Other API I'd suggest:

1) Request an immediate attempt at upgrade, notwithstanding apparent freshness of the manifest. This could be used to force an upgrade in "oops" situations where the manifest has a long expiration but a buggy version of the app is accidentally shipped and the server gives an error to ask the app to update immediately.

2) A way to send messages to other app instances - this way, an instance performing a database schema update could ask other instances to hold off on database access, or similarly for an instance doing a sync of data from the network to the local database.

3) An API to explicitly remove resources from the cache.

I'm not sure if an API to introspect what is currently in the cache is needed. I can't think of a use case off hand. But both the Google Gears LocalServer API and the Mozilla offline API have this.

> (If a particular URI is in an application cache as a known top-level, but later is fetched and found to be a known top-level for another application, e.g. because two other pages both fetch that page in their manifest and the server returns pages with different application="" links for those two apps, then if the page is visited directly, it uses the app cache of the last cache to have found it as a top-level. This causes problems if visiting the page directly would return yet another cache identifier, as then you could only see that page if you'd never seen the others.
> I'm not clear about what to do about that.)
>
> Maybe we should check for updates more often than just when the top-level page is loaded. e.g. we could do it on a timer, or on every cache hit when online.

I don't think an already-loaded running instance should trigger a cache update implicitly, only if it explicitly asks. So I'd advise against these.

See also my other email about offline fallback pages. These should be specified in the manifest.

A la the Google Gears API, I also think a feature is needed to do something useful with <input type="file"> when offline, to save a resource for later upload to the server. Preferably this should not require round-tripping the data through an ECMAScript string or number array, or it will be too inefficient to work for large files.

I also don't see how apps that require login will be able to work offline. Do you need to make sure to check the appropriate "remember me on this computer" checkbox (perhaps not desirable for the security-conscious, and not available on all apps in any case)? Do you get to access the app when offline without having to go through login at all (which seems like a security issue)?

Regards,
Maciej
Received on Thursday, 20 September 2007 01:13:20 UTC