RE: Resource Timing - What's included from Nic Jansma on 2011-05-31 (public-web-perf@w3.org from May 2011)

From: Nic Jansma <Nic.Jansma@microsoft.com>
Date: Tue, 31 May 2011 19:53:40 +0000
To: Kyle Simpson <getify@gmail.com>, Zhiheng Wang <zhihengw@google.com>
CC: "public-web-perf@w3.org" <public-web-perf@w3.org>
Message-ID: <F677C405AAD11B45963EEAE5202813BD19E03B80@TK5EX14MBXW651.wingroup.windeploy.ntde>
Hi Kyle,

>> Our Resource Timing interface is scoped to simply be a DOM interface. 
>> Even though we (as web developers) would be able to benefit from 
>> knowing the cache status of our visitors -- there are many useful 
>> scenarios that could enable -- we have to respect the user's privacy concerns.
>
> I still don't understand why, as I've asked a number of times through
> various threads on this and similar discussions, we can't expose the more
> sensitive data only to script resources which originated from the same domain
> as the page, as opposed to third-party resources?

Could you clarify which additional data you have in mind specifically?

Also, as a clarification, which of these are you proposing:
1) Expose additional sensitive data for all resources that originate from the same domain as the current document
2) Expose additional sensitive data for all resources, regardless of their origin, to scripts that originate from the same domain as the current document.

The spec currently takes the path of #1. For example, connectStart is exposed only for same-origin resources, and these rules can be relaxed via the Timing-Allow-Origin header. Is there additional sensitive data that you think would be useful?
#2 is what we're specifically avoiding, since it raises privacy concerns, though I couldn't tell whether this is what you were proposing.
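To make option #1 concrete, here is a minimal TypeScript sketch of the exposure rule as a pure function. The names `timingAllowed` and `visibleConnectStart` are hypothetical. The Timing-Allow-Origin value is assumed to be a comma-separated list of origins or `*`, and hidden fields are assumed to read as 0, as later versions of the spec describe. This is an illustration, not the spec's normative algorithm.

```typescript
// Hypothetical model of option #1: detailed timing (e.g. connectStart) is
// exposed for same-origin resources, or for cross-origin resources whose
// response carries a Timing-Allow-Origin header naming the document's
// origin (or "*"). Header parsing here is a simplifying assumption.

/** Returns true if detailed network timing may be exposed for a resource. */
function timingAllowed(
  documentOrigin: string,
  resourceOrigin: string,
  timingAllowOrigin: string | null,
): boolean {
  // Same-origin resources always expose detailed timing.
  if (resourceOrigin === documentOrigin) {
    return true;
  }
  // Cross-origin with no header: detailed timing stays hidden.
  if (timingAllowOrigin === null) {
    return false;
  }
  // Assumed format: comma-separated origins, or the wildcard "*".
  const allowed = timingAllowOrigin.split(",").map((s) => s.trim());
  return allowed.includes("*") || allowed.includes(documentOrigin);
}

/** Sketch of what a page would see for connectStart: 0 when not allowed. */
function visibleConnectStart(
  documentOrigin: string,
  resourceOrigin: string,
  timingAllowOrigin: string | null,
  actualConnectStart: number,
): number {
  return timingAllowed(documentOrigin, resourceOrigin, timingAllowOrigin)
    ? actualConnectStart
    : 0;
}
```

Under this model, a third-party script on the page gains nothing extra: the decision depends only on the document's origin and the resource's response header, not on which script is reading the data. That is the property option #2 would give up.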

> I'm saying that any DOM element which refers to an external resource should be included, even
> if that DOM element is referring to a resource that has already begun its request by being
> asked for earlier in the DOM. So, all <img>, <object>, <script>, <link> and any other such elements,
> if they have a src/href attribute specified, should show up in the resource timing array,
> even if that means there are strictly duplicates listed in terms of the actual URL for the resources.
>
> A subsequent <img> or <script> tag referring to a duplicate URL still affects the way the page
> is constructed/rendered, even if its actual networking-layer effects are minimal or zero. We should
> be able to capture ALL information about external resources that affect the page. Any list of
> resources that is incomplete is a failure (IMHO) of this spec.

If we were to include all <img> tags in the array, even when the src of an <img> was already being downloaded for a previous, identical <img> tag, I don't see the benefit of including the duplicate. Its network latency timing data would be 0, so there would be no additional useful information. These duplicate <img> tags are not included in the Net panels of developer tools, for example.

I understand your desire to add non-network-timing information to the ResourceTiming interface, but that is not *currently* within the scope of the spec. My concern about adding "how long did it take to load" information is that it significantly expands the scope of the spec and leads us into territory that may be hard to define.

Take, for example, IMGs.  My understanding of your proposal is that you might want to add "loadStart" and "loadEnd" timestamps.  Does this include time to decode the image?  Create textures on the video card?  Send the bits to the display?  Is loadEnd the time that the image is actually visible to the user?  All of these phases would have to be well-defined in order for a user agent to properly implement the timestamp.

What about CSS?  Would the loadStart/loadEnd times include the time it took to parse the CSS?  Apply the stylesheets to elements?

We'd have to go down this path for each element, and try to precisely define what the non-network times mean. And they can all vary depending on how the user agent implements the element. While I see the benefit of trying to expose these timings to developers, I'm not sure that we would ever be able to get a well-defined, user-agent-agnostic definition of what these phases would be.

The current ResourceTiming spec sticks only to network latencies, and even that is fairly complex. If you have thoughts on how we could tackle some of these other useful timings, maybe as part of another spec or interface, that would be great.

- Nic


-----Original Message-----
From: Kyle Simpson [mailto:getify@gmail.com] 
Sent: Monday, May 30, 2011 8:36 AM
To: Zhiheng Wang; Nic Jansma
Cc: public-web-perf@w3.org
Subject: Re: Resource Timing - What's included

> Our Resource Timing interface is scoped to simply be a DOM interface. 
> Even though we (as web developers) would be able to benefit from 
> knowing the cache status of our visitors -- there are many useful 
> scenarios that could enable -- we have to respect the user's privacy concerns.

I still don't understand why, as I've asked a number of times through various threads on this and similar discussions, we can't expose the more sensitive data only to script resources which originated from the same domain as the page, as opposed to third-party resources?



> I agree with your desire to have timing information for all resources, 
> but I don't think it's in the scope of what we're trying to achieve 
> for Resource Timing.  For Resource Timing, we are essentially trying 
> to mimic the information available from Network panels in developer tools.
> ...
> Including "timing" information for *all* resources (however that is 
> defined) is a much larger and more challenging issue.
> ...
> With your example of multiple IMG tags referring to the same resource, 
> you would be suggesting that all IMGs on the page would be included in 
> the RT array.  Additionally, SCRIPT elements (even inline ones) would 
> be included, because they block parsing progress.  Continuing down 
> that line of thought, TABLEs (large ones take a while to parse) and 
> SVGs (even inline) would be included, etc.  It becomes a much larger 
> and more complex spec at that point.

No, that's not what I'm suggesting at all. I'm not suggesting that any "large" DOM element be exposed in the timing array. I'm saying that any DOM element which refers to an external resource should be included, even if that DOM element is referring to a resource that has already begun its request by being asked for earlier in the DOM. So, all <img>, <object>, <script>, <link> and any other such elements, if they have a src/href attribute specified, should show up in the resource timing array, even if that means there are strictly duplicates listed in terms of the actual URL for the resources.

A subsequent <img> or <script> tag referring to a duplicate URL still affects the way the page is constructed/rendered, even if its actual networking-layer effects are minimal or zero. We should be able to capture ALL information about external resources that affect the page. Any list of resources that is incomplete is a failure (IMHO) of this spec.


> Additionally, we are proposing only exposing new, previously 
> unavailable information to the user -- the network latencies.  The 
> in-browser timing delays you mention are all somewhat measurable today via JavaScript.

"Somewhat measurable" is about as useful as "not measurable in any reliable way at all". JavaScript timers are horribly inaccurate for such sub-millisecond actions, which is why it would be extremely useful for accurate timing to be exposed via this interface.


--Kyle
Received on Tuesday, 31 May 2011 19:54:17 UTC