[whatwg] HTML resource packages from Justin Lebar on 2010-08-10 (public-whatwg-archive@w3.org from August 2010)

From: Justin Lebar <justin.lebar@gmail.com>
Date: Mon, 9 Aug 2010 22:44:18 -0700
Message-ID: <AANLkTimW_b_12MuEOTyT-fPWhZntHmcy5MFczsHB-_gz@mail.gmail.com>
The files I used for the rough benchmarks are available in a tarball
at [1].  Live pages are at [2] and [3].

[1] http://people.mozilla.org/~jlebar/respkg/test/benchmark_files.tgz
[2] http://people.mozilla.org/~jlebar/respkg/test/test-pkg.html
[3] http://people.mozilla.org/~jlebar/respkg/test/test-nopkg.html

-Justin

On Mon, Aug 9, 2010 at 1:40 PM, Justin Lebar <justin.lebar at gmail.com> wrote:
>> Can you provide the content of the page which you used in your whitepaper?
>> (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>
> I'll post this to the bug when I get home tonight.  But your comments
> are astute -- the page I used is a pretty bad benchmark for a variety
> of reasons.  It sounds like you probably could hack up a much better
> one.
>
>>    a) Looks like pages were loaded exactly once, as per your notes?  How
>> hard is it to run the tests long enough to get to a 95% confidence interval?
>
> Since I was running on a simulated network with no random parameters
> (e.g. no packet loss), there was very little variance in load time
> across runs.
>
>>    d) What did you do about subdomains in the test?  I assume your test
>> loaded from one subdomain?
>
> That's correct.
>
>> I'm betting time-to-paint goes through the roof with resource bundles:-)
>
> It does right now because we don't support incremental extraction,
> which is why I didn't bother measuring time-to-paint.  The hope is
> that with incremental extraction, we won't take too much of a hit.
>
> -Justin
>
> On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe <mike at belshe.com> wrote:
>> Justin -
>> Can you provide the content of the page which you used in your whitepaper?
>> (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>> I have a few concerns about the benchmark:
>>    a) Looks like pages were loaded exactly once, as per your notes?  How
>> hard is it to run the tests long enough to get to a 95% confidence interval?
>>    b) As you note in the report, slow start will kill you.  I've verified
>> this so many times it makes me sick.  If you try more combinations, I
>> believe you'll see this.
>>    c) The 1.3MB of subresources in a single bundle seems unrealistic to me.
>> On one hand you say that it's similar to CNN, but note that CNN has
>> JS/CSS/images, not just thumbnails like your test.  Further, note that CNN
>> pulls these resources from multiple domains; combining them into one domain
>> may work, but certainly makes the test content very different from CNN.  So
>> the claim that it is somehow representative seems incorrect.  For more
>> accurate data on what websites look like,
>> see http://code.google.com/speed/articles/web-metrics.html
>>    d) What did you do about subdomains in the test?  I assume your test
>> loaded from one subdomain?
>>    e) There is more to a browser than page-load-time.  Time-to-first-paint
>> is critical as well.  For instance, in WebKit and Chrome, we have specific
>> heuristics which optimize for time-to-render instead of total page load.
>> CNN is always cited as a "bad page", but it's really not - it just has a
>> lot of content, both below and above the fold.  When the user can interact
>> with the page successfully, the user is happy.  In other words, I know I can
>> make webkit's PLT much faster by removing a couple of throttles.  But I also
>> know that doing so worsens the user experience by delaying the time to first
>> paint.  So - is it possible to measure both times?  I'm betting
>> time-to-paint goes through the roof with resource bundles:-)
>> If you provide the content, I'll try to run some tests.  It will take a few
>> days.
>> Mike
>>
>> On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar <justin.lebar at gmail.com> wrote:
>>>
>>> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
>>> wrote:
>>> > If UAs can assume that files with the same path
>>> > are the same regardless of whether they came from a resource package
>>> > or which, and they have all but a couple of the files cached, they
>>> > could request those directly instead of from the resource package,
>>> > even if a resource package is specified.
>>>
>>> These kinds of heuristics are far beyond the scope of resource
>>> packages as we're planning to implement them.  Again, I think this
>>> type of behavior is the domain of a large change to the networking
>>> stack, such as SPDY, not a small hack like resource packages.
>>>
>>> -Justin
>>>
>>> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
>>> wrote:
>>> > On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar <justin.lebar at gmail.com>
>>> > wrote:
>>> >> I think this is a fair point.  But I'd suggest we consider the
>>> >> following:
>>> >>
>>> >> * It might be confusing for resources from a resource package to show
>>> >> up on a page which doesn't "opt-in" to resource packages in general or
>>> >> to that specific resource package.
>>> >
>>> > Only if the resource package contains a different file from the real
>>> > one.  I suggest we treat this as a pathological case and accept that
>>> > it will be broken and confusing -- or at least we consider how many
>>> > extra optimizations we could make if we did accept that, before
>>> > deciding whether the extra performance is worth the confusion.
>>> >
>>> >> * There's no easy way to opt out of this behavior.  That is, if I
>>> >> explicitly *don't* want to load content cached from a resource
>>> >> package, I have to name that content differently.
>>> >
>>> > Why would you want that, if the files are the same anyway?
>>> >
>>> >> * The avatars-on-a-forum use case is less convincing the more I think
>>> >> about it.  Certainly you'd want each page which displays many avatars
>>> >> to package up all the avatars into a single package.  So you wouldn't
>>> >> benefit from the suggested caching changes on those pages.
>>> >
>>> > I don't see why not.  If UAs can assume that files with the same path
>>> > are the same regardless of whether they came from a resource package
>>> > or which, and they have all but a couple of the files cached, they
>>> > could request those directly instead of from the resource package,
>>> > even if a resource package is specified.  So if twenty different
>>> > people post on the page, and you've been browsing for a while and have
>>> > eighteen of their avatars (this will be common, a handful of people
>>> > tend to account for most posts in a given forum):
>>> >
>>> > 1) With no resource packages, you fetch two separate avatars (but on
>>> > earlier page views you suffered).
>>> >
>>> > 2) With resource packages as you suggest, you fetch a whole resource
>>> > package, 90% of which you don't need.  In fact, you have to fetch a
>>> > resource package even if you have 100% of the avatars on the page!  No
>>> > two pages will be likely to have the same resource package, so you
>>> > can't share cache at all.
>>> >
>>> > 3) With resource packages as I suggest, you fetch only two separate
>>> > avatars, *and* you got the benefits of resource packages on earlier
>>> > pages.  The UA gets to guess whether using resource packages would be
>>> > a win on a case-by-case basis, so in particular, it should be able to
>>> > perform strictly better than either (1) or (2), given decent
>>> > heuristics.  E.g., the heuristic "fetch the resource package if I need
>>> > at least two files, fetch the file if I only need one" will perform
>>> > better than either (1) or (2) in any reasonable circumstance.
>>> >
>>> > I think this sort of situation will be fairly common.  Has anyone
>>> > looked at a bunch of different types of web pages and done a breakdown
>>> > of how many assets they have, and how they're reused across pages?  If
>>> > we're talking about assets that are used only on one page (image
>>> > search) or all pages (logos, shared scripts), your approach works
>>> > fine, but not if they're used on a random mix of pages.  I think a lot
>>> > of files will wind up being used on only particular subsets of pages.
>>> >
>>> >> In general, I think we need something like SPDY to really address the
>>> >> problem of duplicated downloads.  I don't think resource packages can
>>> >> fix it with any caching policy.
>>> >
>>> > Certainly there are limits to what resource packages can do, but we
>>> > can wind up closer to the limits or farther from them depending on the
>>> > implementation details.
>>> >
>>
>>
>
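
For reference, the example heuristic quoted above from Aryeh Gregor
("fetch the resource package if I need at least two files, fetch the
file if I only need one") can be sketched roughly as below.  This is
only an illustration, not anything specified for resource packages or
implemented in a browser: the function name plan_fetch, the threshold
parameter, and the avatar file names are all hypothetical, and the cache
is modeled as a plain collection of paths under the assumption that
files with the same path are identical.

```python
def plan_fetch(needed, cached, threshold=2):
    """Decide what to request for one page view.

    needed    -- paths of the files the page references (hypothetical)
    cached    -- paths already in the UA's cache (hypothetical)
    threshold -- minimum number of missing files at which the whole
                 package is fetched instead of individual files
    """
    # Files on the page that are not yet in the cache.
    missing = sorted(set(needed) - set(cached))
    if not missing:
        # Everything is cached: no request at all.
        return ("nothing", [])
    if len(missing) >= threshold:
        # Enough files are missing that one package request is assumed
        # to be cheaper than several individual requests.
        return ("package", missing)
    # Only a few files are missing: request them directly.
    return ("files", missing)


# Twenty avatars on a page, as in the forum example in the thread.
avatars = [f"avatar{i:02d}.png" for i in range(20)]

print(plan_fetch(avatars, avatars))       # all twenty cached
print(plan_fetch(avatars, avatars[:19]))  # one avatar missing
print(plan_fetch(avatars, avatars[:18]))  # two avatars missing
```

With the literal threshold of two, the last call chooses the package
even though only two avatars are missing; fetching those two avatars
individually, as in scenario (3) of the quoted message, would need a
higher threshold or a size-based estimate, which the message leaves to
the UA's heuristics.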
Received on Monday, 9 August 2010 22:44:18 UTC