- From: Daniel Weck <daniel.weck@gmail.com>
- Date: Wed, 6 Jan 2016 12:43:27 +0000
- To: Ivan Herman <ivan@w3.org>
- Cc: Brady Duga <duga@google.com>, Dave Cramer <Dave.Cramer@hbgusa.com>, Leonard Rosenthol <lrosenth@adobe.com>, Nick Ruffilo <nickruffilo@gmail.com>, Tzviya Siegman <tsiegman@wiley.com>, Charles LaPierre <charlesl@benetech.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Ivan, for security reasons: HTTPS is required, as well as a URL "scope" within the *same* domain / origin as the Service Worker script (by default, this is the location of the SW script itself, but it can be configured to a different path on the server). In other words, a SW script can only intercept (and therefore respond to) URL requests that conform to these restrictions.

To illustrate this principle, here is a basic Service Workers usage example (the script caches resources as they are requested, allowing subsequent fast cache fetches instead of "real" HTTPS connections):

1) The web browser opens chapter1.html (e.g. https://server.com/pwp1/contents/chapter1.html). To simplify, let's assume that there is an active Service Worker for this page with a top-level scope (the SW script being located at https://server.com/service_worker.js).

2) The web browser processes <img src="../images/logo.png" />

3) The web browser resolves the image's relative path against the HTML document's base href, resulting in e.g. https://server.com/pwp1/images/logo.png (note that base@href could potentially be overridden in the HTML head).

4) Because the image URL is within the registered Service Worker scope, the SW script intercepts the image request via its "fetch" event listener, fetches and caches the image file if necessary (or updates the cache with a fresh resource), and generates the appropriate response (binary payload, content type, etc.).

5) The web browser receives logo.png from the cache instead of from the actual HTTPS location.

As you can see, this is a basic "on-demand" processing flow: no attempt is made to proactively cache resources that have not yet been requested by the web browser (i.e. chapters of the PWP that have not been accessed yet). Jake Archibald's "ebook demo" *reader* process goes one step further from a design standpoint, by preemptively caching all the resource URLs from a zip file (i.e. a publication archive previously created by a separate *publisher* process).
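For the record, the flow above can be sketched in a few lines of Service Worker code. This is only an illustrative sketch (the cache name "pwp-cache-v1" and the isInScope helper are my own inventions, not part of the spec or of any of the demos discussed here):

```javascript
// Returns true when a request URL falls within a Service Worker's
// registered scope: same origin, and a path under the scope path.
// (Helper name and shape are illustrative, not part of the SW spec.)
function isInScope(scopeUrl, requestUrl) {
  const scope = new URL(scopeUrl);
  const req = new URL(requestUrl);
  return req.origin === scope.origin && req.pathname.startsWith(scope.pathname);
}

// Cache-as-you-go "fetch" handler (steps 4 and 5 above). Guarded so the
// pure helper can also run outside a Service Worker context.
if (typeof self !== "undefined" && "caches" in self) {
  self.addEventListener("fetch", (event) => {
    event.respondWith(
      caches.open("pwp-cache-v1").then((cache) =>
        cache.match(event.request).then(
          (cached) =>
            cached ||
            fetch(event.request).then((response) => {
              // Cache a copy for subsequent fast fetches.
              cache.put(event.request, response.clone());
              return response;
            })
        )
      )
    );
  });
}
```

Note that the browser performs the scope check itself before dispatching the "fetch" event; the helper just makes the restriction explicit.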
This way, the HTTPS URLs that are requested when reading the publication chapters are totally *non-existent* on the server, yet the responses are resolved by fetching actual content from the cache. See: https://github.com/jakearchibald/ebook-demo/blob/gh-pages/reader-site/sw.js

Jake's *publisher* process is also implemented with Service Workers (although it could alternatively be pure server-side code); its goal is to intercept a particular URL syntax (i.e. a 'fetch' event listener on the "/download-publication" path) in order to build a zip archive response that contains the *entire* publication, as defined in "pub-manifest.json" (i.e. a list of resource URLs, any files that are not deemed "external" to the publication): https://github.com/jakearchibald/ebook-demo/blob/gh-pages/publisher-site/sw.js

The Readium Service Workers experiment does not use an intermediary browser cache to fetch all resources at once from within a zipped EPUB archive. Instead, resource requests are intercepted as they occur, and content is extracted / inflated on demand. The common denominator with Jake's experiment is that publication resource URLs do not actually map to existing files on the server: they just reference the same HTTPS domain as the SW script itself (within the permitted scope), and the Service Worker takes care of building the corresponding payloads (either from the browser cache, or directly from the EPUB archive). In both cases, some URL syntax "trickery" (a path convention) is used to map a full URL request to a resource within the exploded cache, or the zipped EPUB.

I hope this helps clarify possible SW usages (of which there are many).
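To make the path-convention "trickery" concrete, here is a minimal sketch of the mapping step. The "/pwp1/" prefix and the function name are assumptions for illustration only; the actual conventions in Jake's and Readium's code differ:

```javascript
// Maps a virtual publication URL (no corresponding file on the server)
// to a resource key inside the exploded cache or zipped EPUB archive.
// Returns null when the URL is outside the publication's path prefix.
// (The "/pwp1/" convention and function name are illustrative assumptions.)
function resourceKeyForUrl(requestUrl, publicationPrefix) {
  const path = new URL(requestUrl).pathname;
  if (!path.startsWith(publicationPrefix)) return null;
  // e.g. "/pwp1/contents/chapter1.html" -> "contents/chapter1.html"
  return path.slice(publicationPrefix.length);
}
```

Inside a "fetch" event listener, the returned key would then be looked up in the cache (Jake's reader) or used to extract / inflate the corresponding zip entry (Readium).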
Dan

On Wed, Jan 6, 2016 at 10:28 AM, Ivan Herman <ivan@w3.org> wrote:
> Hi Daniel,
>
>> On 6 Jan 2016, at 11:16, Daniel Weck <daniel.weck@gmail.com> wrote:
>>
>> Hi Brady,
>> Service Workers can intercept resource requests via "fetch" event listeners, as long as the URLs originate from within the permitted scope (which is itself an HTTPS URL). So in fact, intercepting requests to "external" resources is not possible (i.e. a different domain, or even just a URL path outside of the registered scope). Note that the "fetch" *API* (not the event type) can of course be used to programmatically emit requests to resources hosted on different domains (via HTTP CORS, just like XMLHttpRequest), and this can indeed be used to populate a cache, or to build a PWP / EPUB zipped package based on some predefined manifest (i.e. a list of well-identified publication resources).
>
> Just for my understanding: does it mean that, for a specific PWP, the (SW-based) RS has to 'register' a number of domains or URLs in its scope in order to be able to catch the requests and cache the content? If so, then in practice we are close to the idea that a GET to a PWP should return (some form of) a manifest with the resources the PWP contains, which should then be "registered" by the RS.
>
> What bothers me a bit is that [1] talks about *a* 'scope URL'. Does it mean that, by default, the URLs that are used by a PWP should all be under the same, fixed scope, and we must have a built-in redirection mechanism to provide access to external resources (using the fetch API)?
>
> This does have to shape our thinking, if this is all true.
>
> Thanks
>
> Ivan
>
> [1] http://www.w3.org/TR/service-workers/#dfn-scope-url
>
>> References:
>>
>> http://www.w3.org/TR/service-workers/#dfn-scope-url
>> https://github.com/jakearchibald/ebook-demo/blob/gh-pages/publisher-site/sw.js
>> https://github.com/jakearchibald/ebook-demo/blob/gh-pages/reader-site/sw.js
>>
>> Regards,
>> Daniel
>>
>> On Tue, Jan 5, 2016 at 4:39 PM, Brady Duga <duga@google.com> wrote:
>>> One thing to note regarding service workers - while they can be used to cache in this simple case of an image on a different server, I don't think they could be used in a more complicated case where resources identify other resources. So, if you make a page of your publication http://louvre.com/monalisa.html, which in turn references http://louvre.com/monalisa.jpg, I don't think it is possible to cache the image. Though, I am not an expert on service workers, so my understanding could be flawed.
>>>
>>> On Tue, Jan 5, 2016 at 7:44 AM, Ivan Herman <ivan@w3.org> wrote:
>>>>
>>>> I think the goal should be somewhere in the middle. I agree that the definition of PWP should be, as much as possible, implementation agnostic, but I agree with Dave that saying "we don't care" is also not appropriate.
>>>>
>>>> We may have to define a PWP Processor in the abstract sense: what a processor is supposed to do to answer different use cases, what its functionalities are, that sort of thing. We may not define it in a normative way in the sense of some formal language or terminology, but we have to understand what can, cannot, should, or should not be done with a PWP. And it is certainly important to know whether the realization of such a PWP processor is possible with today's technologies, what is PWP specific and what can be reused off the shelf, etc.
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On 5 Jan 2016, at 16:24, Cramer, Dave <Dave.Cramer@hbgusa.com> wrote:
>>>>
>>>> On Jan 5, 2016, at 9:41 AM, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>
>>>> Nick – the specifics of how an RS chooses (or not) to cache are out of scope for PWP. They may make sense for some sort of format-specific work (e.g. best practices for PWP with EPUB) but we don’t care about it here.
>>>>
>>>> Remember – PWP is format/packaging and implementation agnostic. (We seemed to all agree to that pre-holidays.)
>>>>
>>>>
>>>> The fact that an existing web technology can solve a critical use case for PWP is on-topic in my opinion, and learning about such things can only help our work. Such technologies may not be a part of the documents we produce, but saying "we don't care about it here" I think sends the wrong message.
>>>>
>>>> Dave
>>>>
>>>> This may contain confidential material. If you are not an intended recipient, please notify the sender, delete immediately, and understand that no disclosure or reliance on the information herein is permitted. Hachette Book Group may monitor email to and from our network.
>>>>
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 6 January 2016 12:44:17 UTC