- From: Kyle Simpson <getify@gmail.com>
- Date: Wed, 23 Mar 2011 12:59:04 -0500
- To: "Nic Jansma" <Nic.Jansma@microsoft.com>, <public-web-perf@w3.org>
> 1) Resources that are already in the browser's disk cache, for example,
> from loading the page yesterday: *would* be included in the RT arrays.
> Examples id="4" and id="5" below show this.
Agreed, these definitely need to be included. Will there be some flag that
indicates where the resource came from ("cache", "network", etc.)? I think
there definitely should be.
> 2) Resources that are referred to multiple times in the same page ("1a",
> "1b", "1c"): Our current thoughts are that these resources *should not*
> be included, as my understanding is that all current browsers optimize
> this case and do not initiate network requests for duplicate resource URIs
> (eg, for "1b" and "1c", the browser wouldn't get 1.jpg again).
> Additionally, these types of resources are not shown in browsers' Net
> panels, or from a network sniffer.
Actually, I think this assumption is not entirely correct. I have a
JavaScript loader called LABjs, and in some browsers it uses a hacky "cache
preloading" method: it requests a script in a way that is guaranteed to
download it but *not* execute it (either by using a fake MIME type, or by
using an <object> or Image container). Then, when appropriate, a second,
proper script element request is made for the same URL, on the assumption
that the previous request successfully cached it. Because this second request
comes from a proper container/type, it then executes.
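To make that concrete, here's a simplified sketch of the kind of trick I mean
(this is not LABjs's actual code; the URL and the fake type value are just
illustrative):

    // Request 1: fetch the script so it downloads but does not execute, by
    // giving the script element a fake type the browser won't run. (In the
    // browsers where this trick applies, the file still gets downloaded and
    // cached. An <object> element or new Image() pointed at the URL is
    // another variant of the same idea.)
    function preloadScript(url) {
      var s = document.createElement("script");
      s.type = "text/cache";   // fake type: not executed
      s.src = url;
      document.getElementsByTagName("head")[0].appendChild(s);
    }

    // Request 2 (later): a proper script element for the same URL. Assuming
    // the first request left the file in the browser cache, this one is
    // served from cache and, being a real script, it executes.
    function executeScript(url) {
      var s = document.createElement("script");
      s.type = "text/javascript";
      s.src = url;
      document.getElementsByTagName("head")[0].appendChild(s);
    }

    preloadScript("scripts/module.js");
    // ...later, when it's appropriate to actually run it...
    executeScript("scripts/module.js");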
But the point is, in that scenario, in almost all browsers (IE9, Firefox,
Chrome, etc.), I see both requests logged. That's because the browser still
has to fulfill that second request from the browser cache.
So the finer distinction you must be making is whether the second "duplicate"
request happens to overlap with the first one while it is still loading. If
it overlaps, I take it you're saying that browsers don't make a
second/duplicate request. But if the two don't overlap, as in the "cache
preload" technique I describe, then the browser and its tools clearly do
report a second load, even though it's "on the same page".
So, from a consistency standpoint, since there's effectively a race condition
in when the second request gets initiated, I think it would be a bad idea not
to include *all* resource requests: sometimes the list would have those
"duplicates" and sometimes it wouldn't, which will look like
non-deterministic, race-condition-like behavior to most people who don't know
the finer details.
Also, even if the browser does have short-circuit logic to avoid making the
second request, AFAIK it still must make some baseline assumptions about the
contents of the resource, which can affect its timing behavior for other
actions.
For instance, if I have 3 script elements spread across my DOM, all asking
for the same script resource, the browser has to assume that the resource
may have a `document.write()` in it, in which case it has to "block"
everything else in the DOM/page from rendering, for *each* script element,
until it runs that script. So, even though only one actual "request" may
have gone beyond the HTML parser layer, the presence of 3 requesting
containers is valuable information that in fact affects timings.
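For example (the filename here is just illustrative), markup like this forces
the parser to pause at each of the three elements, even though at most one
network fetch ever happens:

    <p>first chunk of content</p>
    <script src="shared.js"></script>  <!-- parser must stop: shared.js might document.write() -->
    <p>more content</p>
    <script src="shared.js"></script>  <!-- stops again, even with no new network request -->
    <p>still more content</p>
    <script src="shared.js"></script>  <!-- and again -->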
> I could see including them for other reasons -- completeness of listing
> all of the resources, whether or not they were retrieved from the network.
> But to me, it seems like listing every resource on the page would include
> a lot of redundant data, many of them without network latencies.
Perhaps the reason my thinking differs slightly here is that I'm assuming
"Resource Timing" means more than just "network request layer timing": the
full end-to-end timing of requesting a resource, through to when that
"request" is fulfilled. In that broader definition, even duplicate requests
"cost" some time, and should therefore be logged/accounted for in some
respect.
> I would agree with you that the HTTP status code of the resource should
> not exclude it from the RT array. 404/500/etc should all be included. If
> the browser "initiates" a request, whether or not it was completed, we
> should include it in the array.
Yes, and furthermore, I think (mostly for data filtering purposes) having the
status code actually in the data structure would be important. For instance,
a tool analyzing this data could then filter out all the 404s. The same goes,
as mentioned above, for an indicator of where the resource came from
("network", "cache", "cache-revalidated", "cache-duplicate", etc.).
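Just to illustrate the filtering use case (the accessor and field names below
are made up by me for this example, not anything from the draft):

    // Hypothetical sketch: assumes each resource timing entry exposes an HTTP
    // status code and a "where it came from" indicator; all names are
    // placeholders for whatever the spec ends up defining.
    var entries = window.performance.getResourceTimings();  // hypothetical accessor

    var filtered = [];
    for (var i = 0; i < entries.length; i++) {
      if (entries[i].httpStatus === 404) continue;            // drop broken resources
      if (entries[i].source === "cache-duplicate") continue;  // or drop same-page duplicates
      filtered.push(entries[i]);
    }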
--Kyle