Re: Packaging on the Web from Alex Russell on 2014-02-03 (www-tag@w3.org from February 2014)

From: Alex Russell <slightlyoff@google.com>
Date: Sun, 2 Feb 2014 17:12:26 -0800
To: Jeni Tennison <jeni@jenitennison.com>
Cc: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <CANr5HFW1odhE4BCyHfUrAcGfq+56zET4+KssbTbCPK9c=7NmQQ@mail.gmail.com>
On Sun, Feb 2, 2014 at 9:36 AM, Jeni Tennison <jeni@jenitennison.com> wrote:

> Alex,
>
> > First, thanks for capturing what seems to be broad consensus
> > on the packaging format (multipart MIME). Seems great!
>
> I tried to capture the rationale for the multipart type for packaging. The
> one massive disadvantage as far as I’m concerned is the necessity for the
> boundary parameter in the content type.


It seems a new content-type is needed for security anyhow, no?


> A new type that had the same syntax as a multipart type but had a
> sniffable boundary (ie started with --boundary) might be better than using
> a multipart/* content type.


ISTM that we have a chance to repair that if we wish. New tools will be
needed to create packages of this type in any case.
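To make the "sniffable boundary" idea concrete, here is a minimal Python sketch of a parser that takes the boundary from the package's first line rather than from a Content-Type parameter. The framing (CRLF line endings, a blank line between a part's headers and its body, a closing `--boundary--` line) is an assumption borrowed from multipart/* in RFC 2046. Nothing in this thread specifies the format, and `parse_sniffed_package` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Part = Tuple[Dict[str, str], bytes]


def parse_sniffed_package(data: bytes) -> List[Part]:
    """Split a package into (headers, body) parts, sniffing the boundary
    from the first line instead of reading a Content-Type parameter.
    Assumes multipart/*-style framing (RFC 2046); not a specified format."""
    first_line = data.split(b"\r\n", 1)[0]
    if not first_line.startswith(b"--") or len(first_line) <= 2:
        raise ValueError("package must start with a --boundary line")
    # As in multipart/*, the CRLF before "--boundary" belongs to the delimiter.
    delimiter = b"\r\n" + first_line

    parts: List[Part] = []
    # Prefix CRLF so the first delimiter matches like all the others;
    # chunk 0 is then the (empty) preamble and is skipped.
    for chunk in (b"\r\n" + data).split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter "--boundary--"
        if not chunk.startswith(b"\r\n"):
            raise ValueError("malformed delimiter line")
        head, blank, body = chunk[2:].partition(b"\r\n\r\n")
        if not blank:
            raise ValueError("part is missing the blank line after its headers")
        headers: Dict[str, str] = {}
        for line in head.split(b"\r\n"):
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, body))
    return parts
```

As with multipart/*, this only works if the boundary string never occurs inside a part's body, so whatever tool builds the package still has to choose the boundary carefully.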


> > I'm intrigued by the way you're handling base URL resolution for relative
> > URLs. Do you imagine that base URL metadata will be required inside
> > packages? And if you move a package off-origin, but it is CORS-fetched,
> > does that enable a third party to "front" for a second-party origin? How
> > does the serving URL get matched/merged with the embedded base
> > URL? And if the base URL metadata isn't required, what happens?
>
> Good questions. I wasn’t imagining the base URL would be required inside
> packages, but would be taken as the location from which the package was
> fetched.
>

I see. I think I got confused by the phrase:

  "Content from the cache will run with a base URL supplied within the package."


This, then, would be the location from which the package was fetched?


> Since the Content-Location URLs have to be absolute-path-relative or
> path-relative (ie can’t contain a domain name), you can’t get content from
> one origin pretending to be from another origin. Obviously that means if
> you host a package you have to be careful about what it contains, but
> that’s true of practically any web content.


Makes a lot more sense. Thanks!
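A rough sketch of the resolution rule Jeni describes, using only Python's `urllib.parse`: each part's Content-Location is resolved against the URL the package was fetched from, and anything carrying a scheme or authority is rejected. `resolve_member_url` is a hypothetical helper, and comparing (scheme, netloc) only approximates a real origin check.

```python
from urllib.parse import urljoin, urlsplit


def resolve_member_url(package_url: str, content_location: str) -> str:
    """Resolve a part's Content-Location against the URL the package came from.

    Only absolute-path-relative ("/images/icon.png") and path-relative
    ("images/icon.png") references are accepted, so a package can only
    describe resources on the origin it was fetched from.
    """
    loc = urlsplit(content_location)
    if loc.scheme or loc.netloc:
        # Rejects "https://other.example/x" and "//other.example/x" alike.
        raise ValueError(f"Content-Location must not name an origin: {content_location!r}")
    resolved = urljoin(package_url, content_location)
    base, target = urlsplit(package_url), urlsplit(resolved)
    # Approximate same-origin check: scheme plus host[:port].
    if (base.scheme, base.netloc) != (target.scheme, target.netloc):
        raise ValueError(f"{content_location!r} resolved off-origin: {resolved!r}")
    return resolved
```

Under this rule, a package fetched from https://cdn.example/site.pack can only describe URLs on cdn.example, which is why hosting it elsewhere doesn't let it front for another origin.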


> > I'm curious about the use of fragments. Yehuda covered this pretty
> > thoroughly in the constraints he previously outlined when we
> > went over this in Boston:
> >
> > https://gist.github.com/wycats/220039304b053b3eedd0
> >
> > Fragments aren't sent to the server and so don't have any meaningful
> > server-based fallback or service-worker polyfill story. That
> > seems pretty fundamental. Is there something in the URL format proposal
> > that I'm missing?
>
> I’m not sure; it depends on what you’re curious about. My assumption is that,
> for backwards compatibility with clients that don’t understand packages,
> the files in a package would all be accessible from the server directly as
> well as through the package. In other words, if a package at
> `/package.pack` contains `/index.html` and `/images/icon.png` then
> `/index.html` and `/images/icon.png` will also be available directly on the
> server.
>

I take it you're trying to avoid a world where I ever write something like:

  <img src="/a.pack#file=/icon.webp" ...>

And instead would recommend that webdevs write:

  <link rel="package" href="/a.pack">
  <!-- ... -->
  <img src="/icon.webp" ...>

Is that right?

If so, I think there are interactions with browser optimizations to
consider. It's common for browser engines to "pre-scan" sections of
document streams before parsing them, so they can start requesting the
resources those sections contain early. This is a big win when parsing
might be held up by <script> elements, which could call document.write().

A UA that has a request out for a.pack and then sees a pre-scan request for
icon.webp isn't going to have much information about whether or how it
should fetch icon.webp. The choices are (not exhaustively):

   - Wait on a.pack and hope it contains icon.webp (we could mandate an
   up-front metadata section in the package format to mitigate this).
   If it doesn't, send the request. Given how unlikely it is that all of a
   page's resources will be in packages, even for pages that use them, this
   is a fraught option.
   - Pre-flight the request for icon.webp and race the two requests to see
   which comes back first. It's unclear what we've won in this scenario,
   even for pages that are fully packaged.
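The trade-off in the list above can be sketched as a decision function, assuming the package format did mandate an up-front metadata section listing its members. The names (`prescan_decision`, `Action`) and the shape of that metadata (a set of member paths per in-flight package, or `None` until it arrives) are illustrative assumptions, not part of any proposal.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Action(Enum):
    WAIT_FOR_PACKAGE = "wait"   # a received manifest lists the URL
    FETCH_DIRECTLY = "fetch"    # all manifests received, none lists the URL
    UNDECIDED = "undecided"     # some manifest still pending: wait or race?


def prescan_decision(url_path: str,
                     outstanding: Dict[str, Optional[Set[str]]]) -> Action:
    """Decide what to do with a subresource URL found by the pre-scanner.

    `outstanding` maps each in-flight package URL to the member paths listed
    in its (assumed) up-front metadata section, or None if that section has
    not arrived yet.
    """
    pending = False
    for members in outstanding.values():
        if members is None:
            pending = True
        elif url_path in members:
            return Action.WAIT_FOR_PACKAGE
    return Action.UNDECIDED if pending else Action.FETCH_DIRECTLY
```

The `UNDECIDED` case is exactly where the UA has to choose between waiting and racing; up-front metadata shrinks that window but doesn't remove it.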

> Given that setup, people who want to point to `#section` within
> `/index.html` can do so with the plain URL `/index.html#section`. There are
> no good reasons (so far as I can see) to create a link to something within
> the package itself when you can point directly to the thing you want to
> point to. Can you think of any?
>

Until now I (and perhaps others) have been laboring under the assumption
that URLs pointing into packages would be usable for content *in general*,
and that the fallback solution would be a time-limited stop-gap. Your
design turns that on its head.

One nice thing about packages-as-URLs is that membership is clear. The
pre-scanner can clearly know if a URL *should* be in a package for which
there is an outstanding request.
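With packages-as-URLs, the pre-scanner can derive membership from the URL alone. A minimal sketch using only `urllib.parse`, treating the `#file=` fragment from the `<img>` example above as a hypothetical syntax (it isn't specified anywhere in this thread); `split_package_url` and `pending_package_for` are illustrative names.

```python
from typing import Optional, Set, Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_package_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (package_url, member_path) for a hypothetical
    "/a.pack#file=/member" URL, or None if the URL doesn't point into a package."""
    parts = urlsplit(url)
    members = parse_qs(parts.fragment).get("file")
    if not members:
        return None  # plain URL, or an ordinary fragment like "#section"
    package_url = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    return package_url, members[0]


def pending_package_for(url: str, in_flight: Set[str]) -> Optional[str]:
    """Which outstanding package request (if any) will satisfy this URL."""
    split = split_package_url(url)
    if split is not None and split[0] in in_flight:
        return split[0]
    return None
```

Because the fragment never reaches the server, a UA that doesn't understand packages would simply request `/a.pack` for such a URL, which is the fallback problem raised earlier in the thread.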

> The polyfill story is that when `/index.html` was fetched, its content
> would be scanned and the package located. This would be used to populate
> the cache, using the Content-Location headers to work out the relevant URLs
> that were covered off by the package. On later fetches, the cache would be
> used for any requests to pages within it. Can you point me to the bit of
> that that doesn’t work?
>
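A toy model of the polyfill flow Jeni describes, in Python rather than an actual Service Worker: scan the fetched HTML for a package link, populate a cache keyed by the resolved Content-Location of each part, and serve later requests from that cache before falling back to the network. `rel="package"` is taken from the earlier example and is hypothetical, as are `find_package_links` and `PackageCache`; the network is stubbed as a plain callable.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlsplit


class _PackageLinkFinder(HTMLParser):
    """Collects href values of <link rel="package"> elements (hypothetical rel)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if tag == "link" and "package" in rels and href:
            self.hrefs.append(href)


def find_package_links(html: str, page_url: str) -> List[str]:
    """Return absolute URLs of packages referenced from a fetched page."""
    finder = _PackageLinkFinder()
    finder.feed(html)
    finder.close()
    return [urljoin(page_url, href) for href in finder.hrefs]


class PackageCache:
    """Cache populated from package parts, consulted before the network."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def add_package(self, package_url: str, parts: Iterable[Tuple[str, bytes]]) -> None:
        """Store each (Content-Location, body) pair under its resolved URL."""
        for location, body in parts:
            loc = urlsplit(location)
            if loc.scheme or loc.netloc:
                continue  # skip parts that try to name another origin
            self._entries[urljoin(package_url, location)] = body

    def fetch(self, url: str, network: Callable[[str], bytes]) -> bytes:
        """Serve `url` from the cache if a package covered it, else hit the network."""
        key = urldefrag(url).url  # fragments never take part in the lookup
        cached = self._entries.get(key)
        return cached if cached is not None else network(key)
```

Note that the page carrying the `<link>` still has to be fetched on its own before the package can be found, which is the self-containment question that follows.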

Now that I understand the intent more fully, I think it does work. One
concern is that packages can't be self-contained: I don't know how to put
an "entire site" inside a package. I can see how to package the resources
referenced from some HTML, but not how to serve the HTML itself. Perhaps
a "root document" concept, as Herbert suggested, could work. Am I missing
something obvious there?
Received on Monday, 3 February 2014 01:13:24 UTC