Re: "Packing on the Web" -- performance use cases / implications from Ilya Grigorik on 2015-01-14 (public-web-perf@w3.org from January 2015)

From: Ilya Grigorik <igrigorik@google.com>
Date: Wed, 14 Jan 2015 14:19:34 -0800
To: Alex Russell <slightlyoff@google.com>
Cc: Mark Nottingham <mnotting@akamai.com>, Yoav Weiss <yoav@yoav.ws>, public-web-perf <public-web-perf@w3.org>
Message-ID: <CADXXVKp0-RXM5pV3ETVs9hTXWs1KG_0tz8MbS-YsijZmAtDCTw@mail.gmail.com>
On Tue, Jan 13, 2015 at 3:35 PM, Alex Russell <slightlyoff@google.com>
wrote:

> On Tue, Jan 13, 2015 at 2:18 PM, Ilya Grigorik <igrigorik@google.com>
> wrote:
>
>> On Wed, Jan 7, 2015 at 8:25 AM, Mark Nottingham <mnotting@akamai.com>
>>  wrote:
>>
>>> This doc:
>>>   http://w3ctag.github.io/packaging-on-the-web/
>>> says a number of things about how a Web packaging format could
>>> improve Web performance; e.g., for cache population, bundling packages to
>>> distribute to servers, etc.
>>>
>>
>> tl;dr: I think it's introducing perf anti-patterns and is going against
>> the general direction we want developers to head. Transport optimization
>> should be left at the transport layer and we already have much better
>> (available today!) solutions for this.
>>
>
> I'm going to leave comments inline below, but I think your read of this is
> far too harsh, forecloses meaningful opportunities for developers and UAs,
> and in general isn't trying to be as collaborative as I think those of us
> who have worked on the design would hope for.
>

Apologies if it came across as overly negative. Mark asked for perf-related
feedback and that's what I'm trying to provide, much of which I've shared
previously in other threads and chats. I do think there are interesting use
cases here that are worth resolving, but I'm just not convinced that a new
package streaming format is the right approach: lots of potential pitfalls,
duplicated functionality, etc. My comments shouldn't rule out use cases
which are not perf sensitive, but I do think it's worth considering the
perf implications for cases where it may end up being (ab)used.


>> ---- some notes as I'm reading through the latest draft:
>>
>> (a) It's not clear to me how packages are updated after the initial
>> fetch. In 2.1.1. you download the .pack with a CSS file but then request
>> the CSS independently later... But what about the .pack? Wouldn't the
>> browser revalidate it, detect that the package has changed (since CSS has
>> been updated), and be forced to download the entire bundle once over? Now
>> we have duplicate downloads on top of unnecessary fetches.
>>
>
> The presence of the package file is a hint. It's designed to be compatible
> with legacy UAs which may issue requests for each resource, which the UA is
> *absolutely allowed to do in this case*. It can implement whatever
> heuristic or fetch is best.
>

That doesn't address my question though. How does my app rev the package
and take advantage of granular downloads, without incurring unnecessary
fetches and duplicate bytes? I'm with you on heuristics; I guess I'm
asking for some documented examples of how this could/should work:

a) disregard packages: what we have today, i.e. granular downloads and
caching, but some queuing limitations with http/1.
b) always fetch packages: you incur unnecessary bytes and fetches whenever
a single resource is updated.
c) combine packages and granular updates: how would this work? Wouldn't
you always incur unnecessary and/or duplicate downloads?
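
To make the trade-off between (a), (b) and (c) concrete, here is a rough
byte-count sketch. The resource names and sizes are invented, header
overhead is ignored, and case (c) models only one possible reading (the UA
refetches the changed resource and also revalidates and refetches the
package); the draft leaves the actual behaviour to UA heuristics.

```python
# Toy model: bytes transferred when some resources change after the
# initial fetch. Names and sizes are hypothetical.
RESOURCES = {"app.js": 120_000, "style.css": 30_000, "logo.png": 50_000}


def bytes_granular(changed):
    """(a) Ignore packages: refetch only the resources that changed."""
    return sum(RESOURCES[name] for name in changed)


def bytes_package_only(changed):
    """(b) Always fetch the package: any change forces a full download."""
    return sum(RESOURCES.values()) if changed else 0


def bytes_package_plus_granular(changed):
    """(c) Assumed worst case: granular refetch AND a package refetch."""
    return bytes_granular(changed) + bytes_package_only(changed)


changed = {"style.css"}
print(bytes_granular(changed))               # 30000
print(bytes_package_only(changed))           # 200000
print(bytes_package_plus_granular(changed))  # 230000
```

Under these assumptions a one-file CSS update costs 30 KB with granular
fetches, 200 KB with package-only fetches, and 230 KB if both happen.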

>> In general, all bundling strategies suffer from one huge flaw: a single
>> byte update in any of its subresources forces a full fetch of the entire
>> file.
>>
> Assuming, as you mistakenly have, that fetching the package is the only
> way to address the resource.
>

I didn't assume that it is. I understand that the proposed method is
"backwards compatible" and that the UA can request individual resources
when they need updating, but this takes us back to the previous point: is
this only useful for the initial fetch? I'd love to see a good walkthrough
of how the initial fetch + granular update cycle would work here.


>> (b) Packages introduce another HoL bottleneck: spec talks about ordering
>> recommendations, but there is still a strict ordering during delivery (e.g.
>> if the package is not a static resource then a single slow resource blocks
>> delivery of all resources behind it).
>>
>
> Is the critique -- seriously -- that doing dumb things is dumb?
>

I'm questioning why we would be enabling features that have all of the
highlighted pitfalls, while we have an existing solution that doesn't
suffer from the same issues. That, and I'm wondering if we can meet the
desired use cases without introducing these gotchas -- e.g. do we need the
streaming package at all, vs. some form of manifest-like thing that defers
fetching optimizations to the transport layer?


>> (c) Packages break granular prioritization:
>>
>
> Only assuming that your server doesn't do something smarter.
>
> One of the great things about these packages is that they can *cooperate* with
> HTTP/2: you can pre-fill caches with granular resources and entirely avoid
> serving packages to clients that are savvy to them.
>

Can you elaborate on the full end-to-end flow of how this would work:
initial package fetch for prefill, followed by...?

Would the UA unpack all the resources from a package into individual cache
entries? Does it retain the package file itself? What's the process for
revalidating a package? Or is that a moot question given that everything is
unpacked and the package itself is not retained? But then, how does the UA
know when to refetch the package?

As an aside: cache prefill is definitely an interesting use case and comes
with lots of gotchas... With http/2 we have the push strategy, and the
client has the ability to disable it entirely; opt out of specific pushed
resources (send a RST_STREAM on any pushed stream - e.g. when the resource
is already in cache); and control how much is pushed (via the initial flow
control window)... because we had a lot of concerns over servers pushing a
lot of unnecessary content and eating up users' bandwidth/data. With
packages the UA can only make a binary decision of fetch or no fetch,
which is a lot less flexible.
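
A toy model of that difference, as a sketch only: this is not a real
http/2 stack. `enable_push` stands in for SETTINGS_ENABLE_PUSH, a "reset"
stands in for RST_STREAM on a pushed stream, and `window` is a deliberately
simplified stand-in for the flow control window (real flow control is per
stream and per connection, and a stalled stream resumes once the client
grants more credit). All names, URLs and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PushedResource:
    url: str
    size: int  # bytes


def handle_pushes(pushes, cached_urls, enable_push=True, window=65_535):
    """Per-resource decisions a client can make about pushed streams."""
    result = {"accepted": [], "reset": [], "stalled": []}
    if not enable_push:
        return result  # client disabled push: nothing is sent
    budget = window
    for p in pushes:
        if p.url in cached_urls:
            result["reset"].append(p.url)     # already cached: reset it
        elif p.size <= budget:
            budget -= p.size
            result["accepted"].append(p.url)
        else:
            result["stalled"].append(p.url)   # waits for more window
    return result


def handle_package(resources, fetch):
    """With a package the only choice is all of it or none of it."""
    return [r.url for r in resources] if fetch else []


pushes = [
    PushedResource("/app.js", 40_000),
    PushedResource("/style.css", 10_000),
    PushedResource("/hero.jpg", 90_000),
]
print(handle_pushes(pushes, cached_urls={"/style.css"}))
print(handle_package(pushes, fetch=True))
```

In this example the client takes /app.js, resets /style.css because it is
cached, and holds /hero.jpg back until it chooses to open the window; the
package path either transfers all three or none.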


> Your server can even consume packages as an ordered set of resources to
> prioritize the sending of (and respond with no-op packages to clients for
> which the package wouldn't be useful).
>

Does this offer anything extra over simply delivering individual resources
with granular caching and prioritization available in http/2?

From what I can tell, the primary feature is that the client doesn't
necessarily know all the resources it may need to download... For
which we have two solutions: http/2 push, or we teach the client to learn
what those resource URIs are and initiate the requests from the client
(regardless of http version).
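
The second option could look roughly like the sketch below. The manifest
format (a map of URL to validator, here an ETag-like string) is purely
hypothetical and not something the draft or any spec defines; the point is
only that the client can work out which requests to initiate itself.

```python
def urls_to_request(manifest, cached_validators):
    """Return the URLs the client should fetch.

    manifest:          {url: validator} advertised by the site (hypothetical)
    cached_validators: {url: validator} for entries already in the cache
    """
    return [
        url
        for url, validator in manifest.items()
        if cached_validators.get(url) != validator  # missing or stale
    ]


manifest = {"/app.js": "v2", "/style.css": "v7", "/logo.png": "v1"}
cached = {"/app.js": "v2", "/style.css": "v6"}
print(urls_to_request(manifest, cached))  # ['/style.css', '/logo.png']
```

Each URL in the result is then an ordinary request, so per-resource
caching and prioritization keep working regardless of http version.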

ig
Received on Wednesday, 14 January 2015 22:20:41 UTC