
[whatwg] HTML resource packages

From: Mike Belshe <mike@belshe.com>
Date: Mon, 9 Aug 2010 13:30:19 -0700
Message-ID: <AANLkTi=1Y0E3mw5n2WNmqAb7QRuigf8DOzFrc1qtP2pg@mail.gmail.com>
Justin -

Can you provide the content of the page which you used in your whitepaper
(https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)?

I have a few concerns about the benchmark:
   a) It looks like each page was loaded exactly once, per your notes.  How
hard would it be to run the tests long enough to get a 95% confidence interval?
   b) As you note in the report, TCP slow start will kill you.  I've verified
this so many times it makes me sick.  If you try more combinations, I
believe you'll see this.
   c) The 1.3MB of subresources in a single bundle seems unrealistic to me.
On one hand you say that it's similar to CNN, but note that CNN has
JS/CSS/images, not just thumbnails like your test.  Further, note that CNN
pulls these resources from multiple domains; combining them into one domain
may work, but it certainly makes the test content very different from CNN.  So
the claim that it is representative seems incorrect.  For more
accurate data on what websites look like, see
http://code.google.com/speed/articles/web-metrics.html
   d) What did you do about subdomains in the test?  I assume your test
loaded from one subdomain?
   e) There is more to a browser than page-load-time (PLT).  Time-to-first-paint
is critical as well.  For instance, in WebKit and Chrome, we have specific
heuristics which optimize for time-to-render instead of total page load.
CNN is always cited as a "bad page", but it's really not - it just has a
lot of content, both below and above the fold.  When the user can interact
with the page successfully, the user is happy.  In other words, I know I can
make WebKit's PLT much faster by removing a couple of throttles.  But I also
know that doing so worsens the user experience by delaying the time to first
paint.  So - is it possible to measure both times?  I'm betting
time-to-paint goes through the roof with resource bundles :-)
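
To make point (a) concrete, here is a minimal sketch of how repeated
page-load timings could be turned into a 95% confidence interval.  It
assumes the timings are in milliseconds and uses a normal approximation,
which is only reasonable with a fair number of runs (a t-distribution
would be more appropriate for a handful).  The function names are
hypothetical and not part of any existing test harness.

```python
import math
import statistics

# z-score for a two-sided 95% interval under a normal approximation (~1.96).
Z_95 = statistics.NormalDist().inv_cdf(0.975)


def confidence_interval_95(samples_ms):
    """Return (mean, half_width) of a 95% CI for the mean load time."""
    n = len(samples_ms)
    if n < 2:
        raise ValueError("need at least two page loads to estimate variance")
    mean = statistics.fmean(samples_ms)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(samples_ms) / math.sqrt(n)
    return mean, Z_95 * stderr


def runs_needed(samples_ms, target_half_width_ms):
    """Estimate how many runs bring the CI half-width under the target."""
    sd = statistics.stdev(samples_ms)
    return math.ceil((Z_95 * sd / target_half_width_ms) ** 2)
```

A pilot of a few loads fed into runs_needed() gives a rough idea of how
long the benchmark has to run before the intervals for two
configurations stop overlapping.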

If you provide the content, I'll try to run some tests.  It will take a few
days.

Mike


On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar <justin.lebar at gmail.com> wrote:

> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
> wrote:
> > If UAs can assume that files with the same path
> > are the same regardless of whether they came from a resource package
> > or which, and they have all but a couple of the files cached, they
> > could request those directly instead of from the resource package,
> > even if a resource package is specified.
>
> These kinds of heuristics are far beyond the scope of resource
> packages as we're planning to implement them.  Again, I think this
> type of behavior is the domain of a large change to the networking
> stack, such as SPDY, not a small hack like resource packages.
>
> -Justin
>
> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
> wrote:
> > On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar <justin.lebar at gmail.com> wrote:
> >> I think this is a fair point.  But I'd suggest we consider the following:
> >>
> >> * It might be confusing for resources from a resource package to show
> >> up on a page which doesn't "opt-in" to resource packages in general or
> >> to that specific resource package.
> >
> > Only if the resource package contains a different file from the real
> > one.  I suggest we treat this as a pathological case and accept that
> > it will be broken and confusing -- or at least we consider how many
> > extra optimizations we could make if we did accept that, before
> > deciding whether the extra performance is worth the confusion.
> >
> >> * There's no easy way to opt out of this behavior.  That is, if I
> >> explicitly *don't* want to load content cached from a resource
> >> package, I have to name that content differently.
> >
> > Why would you want that, if the files are the same anyway?
> >
> >> * The avatars-on-a-forum use case is less convincing the more I think
> >> about it.  Certainly you'd want each page which displays many avatars
> >> to package up all the avatars into a single package.  So you wouldn't
> >> benefit from the suggested caching changes on those pages.
> >
> > I don't see why not.  If UAs can assume that files with the same path
> > are the same regardless of whether they came from a resource package
> > or which, and they have all but a couple of the files cached, they
> > could request those directly instead of from the resource package,
> > even if a resource package is specified.  So if twenty different
> > people post on the page, and you've been browsing for a while and have
> > eighteen of their avatars (this will be common, a handful of people
> > tend to account for most posts in a given forum):
> >
> > 1) With no resource packages, you fetch two separate avatars (but on
> > earlier page views you suffered).
> >
> > 2) With resource packages as you suggest, you fetch a whole resource
> > package, 90% of which you don't need.  In fact, you have to fetch a
> > resource package even if you have 100% of the avatars on the page!  No
> > two pages will be likely to have the same resource package, so you
> > can't share cache at all.
> >
> > 3) With resource packages as I suggest, you fetch only two separate
> > avatars, *and* you got the benefits of resource packages on earlier
> > pages.  The UA gets to guess whether using resource packages would be
> > a win on a case-by-case basis, so in particular, it should be able to
> > perform strictly better than either (1) or (2), given decent
> > heuristics.  E.g., the heuristic "fetch the resource package if I need
> > at least two files, fetch the file if I only need one" will perform
> > better than either (1) or (2) in any reasonable circumstance.
> >
> > I think this sort of situation will be fairly common.  Has anyone
> > looked at a bunch of different types of web pages and done a breakdown
> > of how many assets they have, and how they're reused across pages?  If
> > we're talking about assets that are used only on one page (image
> > search) or all pages (logos, shared scripts), your approach works
> > fine, but not if they're used on a random mix of pages.  I think a lot
> > of files will wind up being used on only particular subsets of pages.
> >
> >> In general, I think we need something like SPDY to really address the
> >> problem of duplicated downloads.  I don't think resource packages can
> >> fix it with any caching policy.
> >
> > Certainly there are limits to what resource packages can do, but we
> > can wind up closer to the limits or farther from them depending on the
> > implementation details.
> >
>
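
For reference, the heuristic Aryeh describes in the quoted text above
("fetch the resource package if I need at least two files, fetch the
file if I only need one") could be sketched roughly as follows.  This is
only an illustration: the function name, the arguments, and the idea of
modelling the cache as a set of paths are assumptions, and it relies on
his premise that a file with a given path is identical whether or not it
came from a package.

```python
def plan_fetches(needed, cached, package_contents, threshold=2):
    """Decide whether to fetch the resource package or individual files.

    needed           -- paths the page references
    cached           -- set of paths already in the UA's cache
    package_contents -- set of paths the advertised package contains
    threshold        -- minimum number of missing packaged files that
                        justifies fetching the package (2 in the quoted
                        heuristic; a tunable guess)

    Returns (fetch_package, individual_paths).
    """
    missing = [path for path in needed if path not in cached]
    packaged = [path for path in missing if path in package_contents]
    fetch_package = len(packaged) >= threshold
    if fetch_package:
        # The package covers its own files; request only the rest directly.
        individual = [p for p in missing if p not in package_contents]
    else:
        individual = missing
    return fetch_package, individual
```

With everything cached, this requests nothing at all, which is the case
where a UA that always fetches the package does worst.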
Received on Monday, 9 August 2010 13:30:19 UTC