[Bug 17974] New: appcache: Add an API to make appcache support caching specific URLs dynamically

https://www.w3.org/Bugs/Public/show_bug.cgi?id=17974

           Summary: appcache: Add an API to make appcache support caching
                    specific URLs dynamically
           Product: HTML WG
           Version: unspecified
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P3
         Component: other Hixie drafts (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: contributor@whatwg.org
         QAContact: contributor@whatwg.org
                CC: ian@hixie.ch, mike@w3.org, annevk@annevk.nl,
                    public-webapps@w3.org, adrianba@microsoft.com,
                    michaeln@google.com, kennyluck@w3.org,
                    phihag@phihag.de, hbambas@mozilla.com,
                    thesis@pigsel.net


This was cloned from bug 14364 as part of operation convergence.
Originally filed: 2011-10-03 14:00:00 +0000
Original reporter: Louis-R <louisremi@mozilla.com>

================================================================================
 #0   Louis-R                                         2011-10-03 14:00:00 +0000 
--------------------------------------------------------------------------------
A simple way to make the appcache dynamic would be to allow data URIs as
manifests, so that scripts can request that new resources be cached without
server round-trips.

This is of course not an ideal solution to make the appcache dynamic, but it is
one that is easy to implement and to get out the door quickly.
================================================================================
 #1   Ian 'Hixie' Hickson                             2011-10-03 22:35:53 +0000 
--------------------------------------------------------------------------------
We're not going to add sub-optimal solutions just so we can get something out
one year earlier, when the Web is going to last decades. :-)

What we need here is a clear understanding of the use cases and requirements.
What are the cases where you're wishing you could add URLs to the appcache
dynamically?
================================================================================
 #2   Philipp Hagemeister                             2011-10-07 12:35:38 +0000 
--------------------------------------------------------------------------------
Wouldn't that allow anyone to hijack a website forever?

1. Attacker temporarily gains control over the content of http://example.com/ ,
and writes

<html manifest="data:text/cache-manifest;base64,Q0FDSEUgTUFOSUZFU1QK">
example.com defaced!
</html>

2. User visits http://example.com/, puts the page in appcache.

3. Rightful owner of example.com regains control (or domain ownership changes
if the domain was hijacked, ...).

4. User visits http://example.com/, still sees defacement.

How can the rightful owner of example.com ever serve the user anything?


On the other hand, locking the content (and scripts) of a website forever could
also provide benefits to a carefully-engineered project. JavaScript on the page
could somehow download the new version, cryptographically verify it (beyond
SSL, which may be compromised by .gov actors, like google.com in Iran
recently), and only then update to the new version.
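
For reference, the base64 payload in step 1 can be decoded to see what manifest
the defaced page would carry. The following is only a sketch: the helper names
decodeBase64 and manifestFromDataUrl are made up for illustration, and the
decoder is written out by hand so that it does not rely on atob() or Buffer
being available.

```typescript
// Minimal base64 decoder (standard alphabet, ASCII output only).
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function decodeBase64(input: string): string {
  let bits = 0;      // accumulator holding bits not yet emitted
  let bitCount = 0;  // number of valid bits in the accumulator
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "=") break;                 // padding marks the end
    const value = B64.indexOf(ch);
    if (value < 0) continue;               // skip whitespace or stray characters
    bits = (bits << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1;         // keep only the leftover bits
    }
  }
  return out;
}

// Extracts the payload of a base64 "data:" URL; returns null for anything else.
function manifestFromDataUrl(url: string): string | null {
  const marker = ";base64,";
  const at = url.indexOf(marker);
  if (url.indexOf("data:") !== 0 || at < 0) return null;
  return decodeBase64(url.slice(at + marker.length));
}

// The manifest attribute value from step 1 above.
const manifest = manifestFromDataUrl(
  "data:text/cache-manifest;base64,Q0FDSEUgTUFOSUZFU1QK"
);
// manifest is "CACHE MANIFEST\n": a valid manifest with no entries.
```

The payload is just the "CACHE MANIFEST" header line. Because the manifest is
embedded in the page itself, the server has no separate manifest URL whose
content it could change to trigger an update.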
================================================================================
 #3   Ian 'Hixie' Hickson                             2011-10-21 22:43:39 +0000 
--------------------------------------------------------------------------------
Yeah we're definitely not using data: for this.


Status: Did Not Understand Request
Change Description: no spec change
Rationale: What are the use cases for making appcache dynamic? (I'm not saying
there aren't any, I just need to know what they are to design the solution for
them.)
================================================================================
 #4   Louis-R                                         2011-10-24 19:52:37 +0000 
--------------------------------------------------------------------------------
Granted, using data: isn't the best option.

I've written an extensive blog post about the use cases for a dynamic appcache:
http://www.louisremi.com/2011/10/07/offline-web-applications-were-not-there-yet/

tl;dr: if you build an RSS reader with a checkbox to make articles available
offline, it's easy to store/delete the text content of the article at will
using localStorage or IndexedDB, but it's impossible to store/delete associated
images (and sounds/videos). You could dynamically generate a cache manifest for
all "offline enabled" articles, but the client would have to re-download all
resources every time the manifest is updated, as you know. (And you can't store
images as data URIs, since they come from different origins.)
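
The cost difference described above can be put in rough numbers. This is a toy
model, not browser behaviour: both function names and all URLs are invented for
illustration, and it simply takes the paragraph's premise that a changed
manifest causes every listed resource to be fetched again.

```typescript
// URLs fetched when the cache is driven by a regenerated manifest: under the
// premise above, a changed manifest means an update pass over every entry.
function fetchesAfterManifestChange(newEntries: string[]): string[] {
  return newEntries.slice();
}

// URLs fetched if individual entries could be added instead: only those that
// are not already cached.
function fetchesAfterIncrementalAdd(cached: string[], wanted: string[]): string[] {
  return wanted.filter((url) => cached.indexOf(url) < 0);
}

// Two article images are already offline; the user ticks a third article.
const alreadyCached = [
  "http://images.example/article1.png",
  "http://images.example/article2.png",
];
const wantedNow = alreadyCached.concat(["http://media.example/article3.ogg"]);

const viaManifest = fetchesAfterManifestChange(wantedNow);                 // 3 URLs
const viaDynamicApi = fetchesAfterIncrementalAdd(alreadyCached, wantedNow); // 1 URL
```

The gap grows with the number of articles already stored, which is the
motivation for the add/remove style of API discussed next.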

Mozilla implemented a simple "OfflineResourceList" API which solves that
problem by enhancing applicationCache with "add()" and "remove()" methods.
This is the kind of solution I am looking for, although "add" is a confusing
name, since it should be able to update a particular resource too.

There is a risk that this API could cause confusion amongst web developers.
Should they use a cache manifest or abandon it completely in favor of the JS
API? I believe the cache manifest should be advocated to be used for the
application structure+presentation+logic (HTML, CSS, JS), while the dynamic API
should be used for the application *content* (media, XML, JSON).
================================================================================
 #5   Ian 'Hixie' Hickson                             2011-10-25 02:26:46 +0000 
--------------------------------------------------------------------------------
Thanks, will investigate.
================================================================================
 #6   Ian 'Hixie' Hickson                             2011-10-27 00:15:14 +0000 
--------------------------------------------------------------------------------
So the problem is that you write an application that, while online, downloads a
bunch of data from the server, and this data includes references to
cross-origin images, and you want to make sure that those immediately get
cached too, so that when the user later goes offline and tries to use that
data, the browser won't otherwise be able to show the images?

You can work around that today using the FALLBACK section, no? (List the
foreign image sites as fallback namespaces that fall back to a "broken image"
icon, say, and then when you fetch all the data from your server, quickly also
create <img> elements for all those foreign images. They'll then be cached.)

Still, I could see how that wouldn't be satisfactory. So for this use case,
we'd need an API to add a URL to the cache manually, an API to remove a URL
from the cache manually, and an API to list all the files that have been added
manually? That seems easy enough to support.
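
The three operations listed above could look roughly like this. It is only a
sketch: ManualCacheEntries and InMemoryManualEntries are hypothetical names,
not a browser interface, and the in-memory version records URLs without
fetching anything.

```typescript
// Hypothetical shape of the three operations; not a real browser interface.
interface ManualCacheEntries {
  add(url: string): void;        // cache this URL manually
  remove(url: string): boolean;  // true if the URL had been added
  list(): string[];              // every manually added URL
}

// In-memory stand-in. A real implementation would also fetch and store the
// resource on add(); that part is deliberately omitted here.
class InMemoryManualEntries implements ManualCacheEntries {
  private urls = new Set<string>();

  add(url: string): void {
    this.urls.add(url);          // adding the same URL twice is a no-op
  }

  remove(url: string): boolean {
    return this.urls.delete(url);
  }

  list(): string[] {
    return Array.from(this.urls).sort();
  }
}

// Example: an RSS reader toggling offline availability of article media.
const manual = new InMemoryManualEntries();
manual.add("http://images.example/photo.png");
manual.add("http://media.example/clip.ogg");
manual.remove("http://images.example/photo.png");
const remaining = manual.list();
```

The list() operation is what lets the application offer an "uncheck to free
space" view without keeping its own parallel record of what was added.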
================================================================================
 #7   Ian 'Hixie' Hickson                             2011-11-03 16:03:26 +0000 
--------------------------------------------------------------------------------
Status: Partially Accepted
Change Description: none yet
Rationale: The use case described in comment 6 seems reasonable. I have marked
this LATER so that we can look at this once browsers have caught up with what
we've specified so far.
================================================================================
 #8   Simon Pieters                                   2011-11-04 06:16:57 +0000 
--------------------------------------------------------------------------------
I believe this has already happened.
================================================================================
 #9   Ian 'Hixie' Hickson                             2011-11-04 17:08:04 +0000 
--------------------------------------------------------------------------------
I didn't mean just with appcache.

Do I take it from your comment that there is implementation interest in adding
this now?
================================================================================
 #10  Anne                                            2011-11-15 12:18:52 +0000 
--------------------------------------------------------------------------------
It seems both developers and implementors want this, yes.
================================================================================
 #11  michaeln@google.com                             2011-11-15 22:48:10 +0000 
--------------------------------------------------------------------------------
I think this request makes sense and would be a great convenience, but it is
not the most pressing issue to resolve.

Tweaking the model for loading pages from, associating pages with, and
updating caches so that it works for a wider variety of use cases is more of a
priority (imo). I'd like to see that get in better shape prior to mixing in
support for ad-hoc resources.
================================================================================
 #12  Ian 'Hixie' Hickson                             2012-05-03 18:12:24 +0000 
--------------------------------------------------------------------------------
An idea I was kicking around would be to instead have just a way to declare a
JS file as being a local interceptor, and then have that JS file be
automatically launched in a worker thread, and then every network request gets
proxied through that worker in some well-defined manner. The worker could then
either say "do whatever you would normally do for that URL", or "redirect to
this URL and try again", or "here's the data for that URL".

That would allow authors to implement the above add/remove functionality
themselves just by pushing the data into a blob store (Filesystem API,
IndexedDB), which would be just a few lines of code, while also allowing much
more flexible approaches.

Any opinions?
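
To make the proposal concrete, here is one way the three possible answers and
the add/remove functionality on top of them might be modelled. Everything in it
is an assumption for illustration: the names InterceptDecision, StoredResource
and LocalInterceptor are invented, no worker or network is involved, and a Map
stands in for the blob store so that the sketch stays synchronous.

```typescript
// The three outcomes named in the comment, as a discriminated union.
type InterceptDecision =
  | { kind: "passthrough" }                                  // do the normal thing
  | { kind: "redirect"; url: string }                        // redirect and try again
  | { kind: "respond"; contentType: string; body: string };  // here's the data

interface StoredResource {
  contentType: string;
  body: string;
}

class LocalInterceptor {
  private store = new Map<string, StoredResource>();  // stand-in for a blob store
  private redirects = new Map<string, string>();

  // "add": remember data so later requests for the URL are answered locally.
  put(url: string, resource: StoredResource): void {
    this.store.set(url, resource);
  }

  // "remove": forget the data; later requests fall through to the network.
  drop(url: string): boolean {
    return this.store.delete(url);
  }

  redirect(from: string, to: string): void {
    this.redirects.set(from, to);
  }

  // Would be called once per request by the (hypothetical) host environment.
  decide(url: string): InterceptDecision {
    const hit = this.store.get(url);
    if (hit !== undefined) {
      return { kind: "respond", contentType: hit.contentType, body: hit.body };
    }
    const target = this.redirects.get(url);
    if (target !== undefined) {
      return { kind: "redirect", url: target };
    }
    return { kind: "passthrough" };
  }
}

// Example: store one foreign image, then remove it again.
const interceptor = new LocalInterceptor();
interceptor.put("http://images.example/a.png", { contentType: "image/png", body: "PNGDATA" });
const whileStored = interceptor.decide("http://images.example/a.png");  // kind: "respond"
interceptor.drop("http://images.example/a.png");
const afterDrop = interceptor.decide("http://images.example/a.png");    // kind: "passthrough"
```

In this model, add and remove reduce to put() and drop() on the store, which is
the "few lines of code" the comment has in mind; the decision logic itself is
left entirely to the author.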
================================================================================
 #13  Philipp Hagemeister                             2012-05-03 21:05:53 +0000 
--------------------------------------------------------------------------------
The JavaScript redirector sounds fantastic, but it sounds complicated to
implement in the current state.

Wouldn't it be way simpler to just load a defined fallback HTML document? For
example, given the following appcache:

CACHE MANIFEST
ALIAS:
/x.html /serve-file.html
/files/* /serve-file.html
# serve-file.html is automatically included in the appcache

The request to /files/test.html would just render serve-file.html, but under
the original (window.)location (just like FALLBACK does). In fact, ALIAS would
be exactly like a FALLBACK entry that always fails to load. Additionally, the *
placeholder would allow marking a whole set of URLs as belonging to the
manifest.

On review, this seems very easy to implement, for both user agent and web
application authors.

As a downside, it doesn't allow embedding of non-HTML resources like images. It
does allow downloads via window.location.replace(dataUri). To me, that doesn't
seem like a big deal since any dynamically generated page should be using data
URIs for dynamically generated images/scripts/styles in the first place.
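
A rough sketch of how the ALIAS lookup proposed above might resolve a request.
The ALIAS section is only a proposal in this comment, so the matching rules
here are assumptions: a trailing "*" matches any suffix, any other pattern must
match exactly, and the first matching line wins. The names AliasEntry,
parseAliasSection and resolveAlias are invented for illustration.

```typescript
// One line of the proposed ALIAS section: a pattern, then the page to render.
interface AliasEntry {
  pattern: string;
  target: string;
}

// Parses lines such as "/files/* /serve-file.html"; skips blanks and comments.
function parseAliasSection(text: string): AliasEntry[] {
  const entries: AliasEntry[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const parts = line.split(/\s+/);
    if (parts.length === 2) {
      entries.push({ pattern: parts[0], target: parts[1] });
    }
  }
  return entries;
}

// Returns the document to render for a path, or null if no alias applies.
function resolveAlias(path: string, entries: AliasEntry[]): string | null {
  for (const entry of entries) {
    const p = entry.pattern;
    const isWildcard = p.charAt(p.length - 1) === "*";
    const matches = isWildcard
      ? path.indexOf(p.slice(0, -1)) === 0   // prefix match up to the "*"
      : path === p;                          // exact match
    if (matches) return entry.target;
  }
  return null;
}

// The ALIAS lines from the example manifest above.
const aliasEntries = parseAliasSection(
  "/x.html /serve-file.html\n" +
  "/files/* /serve-file.html\n" +
  "# serve-file.html is automatically included in the appcache"
);
const rendered = resolveAlias("/files/test.html", aliasEntries);
```

Here rendered is "/serve-file.html", while a path such as "/index.html" yields
null and would be handled as if no ALIAS section existed.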
================================================================================
 #14  Ian 'Hixie' Hickson                             2012-05-04 18:10:01 +0000 
--------------------------------------------------------------------------------
The idea would be to render pages, images, etc. from data in IndexedDB, not
just to hardcode aliases. (This is in the context of wanting to add and remove
URLs from the appcache, which would be easily implementable using a worker as
described above.)
================================================================================
 #15  michaeln@google.com                             2012-05-04 22:54:11 +0000 
--------------------------------------------------------------------------------
> Wouldn't it be way simpler to just load a defined fallback HTML document? For
> example, given the following appcache:
> 
> CACHE MANIFEST
> ALIAS:
> /x.html /serve-file.html
> /files/* /serve-file.html
> # serve-file.html is automatically included in the appcache

Chromium's appcache actually has a feature that's very close to what's
described here, with a slightly different syntax. The URL in the first column
is considered a namespace prefix, just like entries in the FALLBACK section.

CHROMIUM-INTERCEPT:
/Bugs/Public/show_bug.cgi?id= return /Bugs/Public/bug_shower_page.html

http://code.google.com/p/chromium/issues/detail?id=101565
http://codereview.chromium.org/8396013/
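
The namespace-prefix matching described above might be sketched like this. It
is not Chromium's implementation: the names InterceptRule, parseInterceptLine
and interceptTarget are invented, and the rule that the longest matching
namespace wins when several apply is an assumption.

```typescript
interface InterceptRule {
  namespace: string;  // URL prefix from the first column
  target: string;     // page to return instead
}

// Parses "<namespace> return <target>" lines in the syntax shown above.
function parseInterceptLine(line: string): InterceptRule | null {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 3 || parts[1] !== "return") return null;
  return { namespace: parts[0], target: parts[2] };
}

// Assumption: when several namespaces are prefixes of the path, the longest wins.
function interceptTarget(path: string, rules: InterceptRule[]): string | null {
  let best: InterceptRule | null = null;
  for (const rule of rules) {
    if (path.indexOf(rule.namespace) !== 0) continue;  // not a prefix match
    if (best === null || rule.namespace.length > best.namespace.length) {
      best = rule;
    }
  }
  return best === null ? null : best.target;
}

const parsedRule = parseInterceptLine(
  "/Bugs/Public/show_bug.cgi?id= return /Bugs/Public/bug_shower_page.html"
);
const interceptRules: InterceptRule[] = parsedRule === null ? [] : [parsedRule];
const shown = interceptTarget("/Bugs/Public/show_bug.cgi?id=17974", interceptRules);
```

With the single rule from the example, any bug URL under that prefix resolves
to /Bugs/Public/bug_shower_page.html, and other paths resolve to null.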

I don't think this addresses what this particular W3C issue is about.
================================================================================


Received on Wednesday, 18 July 2012 07:26:46 UTC