- From: Kyle Simpson <getify@gmail.com>
- Date: Wed, 23 Mar 2011 12:59:04 -0500
- To: "Nic Jansma" <Nic.Jansma@microsoft.com>, <public-web-perf@w3.org>
> 1) Resources that are already in the browser's disk cache, for example,
> from loading the page yesterday: *would* be included in the RT arrays.
> Examples id="4" and id="5" below show this.
Agreed, these definitely need to be included. Will there be some flag that
indicates where the resource came from ("cache", "network", etc.)? I think
there definitely should be.
> 2) Resources that are referred to multiple times in the same page ("1a",
> "1b", "1c"): Our current thoughts are that these resources *should not*
> be included, as my understanding is that all current browsers optimize
> this case and do not initiate network requests for duplicate resource URIs
> (eg, for "1b" and "1c", the browser wouldn't get 1.jpg again).
> Additionally, these types of resources are not shown in browsers' Net
> panels, or from a network sniffer.
Actually, I think this assumption is not entirely correct. I have a
JavaScript loader called LABjs, and in some browsers it uses a hacky "cache
preloading" method: it requests a script in a way that is guaranteed to
download it but *not* execute it (either by using a fake MIME type, or by
using an <object> or Image container). Then, when appropriate, a second,
proper script element request is made for the same URL, on the assumption
that the previous request successfully cached it. Because this second request
comes from a proper container/type, it then executes.
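To make that concrete, here's a simplified sketch of the kind of trick I mean
(this is not LABjs's actual code; the URL and the fake type value are just
illustrative):

    // Request 1: fetch the script so it downloads but does not execute, by
    // giving the script element a fake type the browser won't run. (In the
    // browsers where this trick applies, the file still gets downloaded and
    // cached. An <object> element or new Image() pointed at the URL is
    // another variant of the same idea.)
    function preloadScript(url) {
      var s = document.createElement("script");
      s.type = "text/cache";   // fake type: not executed
      s.src = url;
      document.getElementsByTagName("head")[0].appendChild(s);
    }

    // Request 2 (later): a proper script element for the same URL. Assuming
    // the first request left the file in the browser cache, this one is
    // served from cache and, being a real script, it executes.
    function executeScript(url) {
      var s = document.createElement("script");
      s.type = "text/javascript";
      s.src = url;
      document.getElementsByTagName("head")[0].appendChild(s);
    }

    preloadScript("scripts/module.js");
    // ...later, when it's appropriate to actually run it...
    executeScript("scripts/module.js");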
But the point is, in that scenario, in almost all browsers (IE9, Firefox,
Chrome, etc.), I see both requests logged. That's because the browser still
has to fulfill that second request from the browser cache.
So the finer distinction you must be making is whether the second "duplicate"
request happens to overlap with the first one while it is still loading. If
it overlaps, I take it you're saying that browsers don't make a
second/duplicate request. But if the two don't overlap, as in the "cache
preload" technique I describe, then the browser and its tools clearly do
report a second load, even though it's "on the same page".
So, from a consistency standpoint, since there's effectively a race condition
in when the second request gets initiated, I think it would be a bad idea not
to include *all* resource requests: sometimes the list would have those
"duplicates" and sometimes it wouldn't, which will look like
non-deterministic, race-condition-like behavior to most people who don't know
the finer details.
Also, even if the browser does have short-circuit logic to avoid making the
second request, AFAIK it still must make some baseline assumptions about the
contents of the resource, which can affect its timing behavior for other
actions.
For instance, if I have 3 script elements spread across my DOM, all asking
for the same script resource, the browser has to assume that the resource
may have a `document.write()` in it, in which case it has to "block"
everything else in the DOM/page from rendering, for *each* script element,
until it runs that script. So, even though only one actual "request" may
have gone beyond the HTML parser layer, the presence of 3 requesting
containers is valuable information that in fact affects timings.
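For example (the filename here is just illustrative), markup like this forces
the parser to pause at each of the three elements, even though at most one
network fetch ever happens:

    <p>first chunk of content</p>
    <script src="shared.js"></script>  <!-- parser must stop: shared.js might document.write() -->
    <p>more content</p>
    <script src="shared.js"></script>  <!-- stops again, even with no new network request -->
    <p>still more content</p>
    <script src="shared.js"></script>  <!-- and again -->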
> I could see including them for other reasons -- completeness of listing
> all of the resources, whether or not they were retrieved from the network.
> But to me, it seems like listing every resource on the page would include
> a lot of redundant data, many of them without network latencies.
Perhaps the reason my thinking differs slightly here is that I'm assuming
"Resource Timing" means more than just "network request layer timing": the
full end-to-end timing of requesting a resource, through to when that
"request" is fulfilled. In that broader definition, even duplicate requests
"cost" some time, and should therefore be logged/accounted for in some
respect.
> I would agree with you that the HTTP status code of the resource should
> not exclude it from the RT array. 404/500/etc should all be included. If
> the browser "initiates" a request, whether or not it was completed, we
> should include it in the array.
Yes, and furthermore, I think (mostly for data filtering purposes) having the
status code actually in the data structure would be important. For instance,
a tool analyzing this data could then filter out all the 404s. The same goes,
as mentioned above, for an indicator of where the resource came from
("network", "cache", "cache-revalidated", "cache-duplicate", etc.).
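Just to illustrate the filtering use case (the accessor and field names below
are made up by me for this example, not anything from the draft):

    // Hypothetical sketch: assumes each resource timing entry exposes an HTTP
    // status code and a "where it came from" indicator; all names are
    // placeholders for whatever the spec ends up defining.
    var entries = window.performance.getResourceTimings();  // hypothetical accessor

    var filtered = [];
    for (var i = 0; i < entries.length; i++) {
      if (entries[i].httpStatus === 404) continue;            // drop broken resources
      if (entries[i].source === "cache-duplicate") continue;  // or drop same-page duplicates
      filtered.push(entries[i]);
    }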
--Kyle