Re: Resource Timing - What's included from Kyle Simpson on 2011-05-31 (public-web-perf@w3.org from May 2011)

From: Kyle Simpson <getify@gmail.com>
Date: Tue, 31 May 2011 15:39:58 -0500
To: "Nic Jansma" <Nic.Jansma@microsoft.com>, "Zhiheng Wang" <zhihengw@google.com>
Cc: <public-web-perf@w3.org>
Message-ID: <D642E00C0637418AA126EF74872C8737@spartacus>
>>> Our Resource Timing interface is scoped to simply be a DOM interface.
>>> Even though we (as web developers) would be able to benefit from
>>> knowing the cache status of our visitors -- there are many useful
>>> scenarios that could enable -- we have to respect the user's privacy 
>>> concerns.
>>
>> I still don't understand why, as I've asked a number of times through
>> various threads on this and similar discussions, we can't expose the more
>> sensitive data only to script resources which originated from the same
>> domain as the page, as opposed to third-party resources?
>
> Would you be able to clarify which additional data you're thinking of
> specifically?

Well, I'd have to go and dig up some old message threads here to be complete 
in my answer. But I know we've talked about things like knowing the exact 
status code of a response as being something that's sensitive to privacy 
concerns. So what I mean is information like that. If you need me to clarify 
other examples of such data, I'll be glad to go digging through the archive.


> Also, as a clarification, which of these are you proposing:
> 1) Expose additional sensitive data for all resources that originate from 
> the same domain as the current document
> 2) Expose additional sensitive data for all resources, regardless of their 
> origin, to scripts that originate from the same domain as the current 
> document.

I'm specifically suggesting #2.
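
To make the difference between the two options concrete, here is a minimal
sketch. It is not proposed spec behavior. The function names are made up for
illustration, and treating "same domain" as a scheme/host/port comparison is
an assumption (the discussion above only says "domain").

```python
from urllib.parse import urlsplit


def origin(url):
    """Return (scheme, host, port) for a URL.

    Assumption: "same domain" is approximated by comparing these three
    parts. Default ports are not normalized in this sketch.
    """
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def may_expose_option_1(document_url, resource_url):
    # Option 1: gate on where the *resource* came from.
    return origin(resource_url) == origin(document_url)


def may_expose_option_2(document_url, script_url):
    # Option 2: gate on where the *asking script* came from;
    # the resource's own origin is not consulted at all.
    return origin(script_url) == origin(document_url)
```

Under option 2, a third-party script loaded into the page would be refused
the sensitive fields, while the page's own script would see them for every
resource, including third-party ones.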


> #2 is what we're specifically avoiding as it has privacy concerns, though 
> I couldn't tell if this is what you were proposing.

So, the privacy concern is that a user visits an evil.com site, and that site
tries to load, say, an image from facebook.com to see whether the user has
visited Facebook or not?

I was more thinking of protecting a valid site (using such data in valid and 
reputable ways) from leaking information to a third-party script that it 
also loaded.


>> I'm saying that any DOM element which refers to an external resource be
>> included, even if that DOM element is referring to a resource that has
>> already begun its request by being asked for earlier in the DOM. So, all
>> <img>, <object>, <script>, <link> and any other such elements, if they
>> have a src/href attribute specified, should show up in the resource
>> timing array, even if that means there are strictly duplicates listed in
>> terms of the actual URL for the resources.
>>
>> A subsequent <img> or <script> tag referring to a duplicate URL still
>> affects the way the page is constructed/rendered, even if its actual
>> networking-layer effects are minimal or zero. We should be able to
>> capture ALL information about external resources that affect the page.
>> Any list of resources that is incomplete is a failure (IMHO) of this
>> spec.
>
> If we were to include all <img> tags in the array, even if the src of an
> <img> was already being downloaded for a previous identical <img> tag, I
> don't see the benefit of including it in the array?  Its network latency
> timing data would be 0, so there would be no additionally useful
> information.  These duplicate <img> tags are not included in Net panels of
> dev tools, for example.

It's precisely the fact that they aren't in the Network tab that makes the 
Network tab not particularly useful in determining how external resources 
affected the DOM parsing and rendering.


> I understand your desire to include additional non-network-timing 
> information to the ResourceTiming interface, but that is not *currently* 
> within the scope of the spec.  My concerns about attempting to add 
> additional "how long did it take to load" information is that it 
> significantly explodes the scope of the spec and leads us into territory 
> that may be hard to define.

Well, that type of information would certainly be useful, and if in the 
future we wanted to tackle adding it, then certainly it would warrant having 
all those container entries in the array, right? I'd submit at least for 
consideration that changing in the future what elements are in the array (as 
opposed to just what data properties are on each element) is a more jarring 
change that could negatively impact tooling... so if such data is ever a 
future candidate for inclusion, perhaps having the placeholder elements in 
the array isn't a terrible future-proofing step.

BUT, that's not actually what I'm hoping for in this pass. What I was hoping
is that the list of external resources, as an ordered array, gives a sort of
play-by-play of how resource loads were encountered as the page proceeded to
parse and render. It's kinda like in baseball, how all plays, errors, etc.
are recorded, so that you can reconstruct a playback of exactly how the game
proceeded.

Since things like a <link rel=stylesheet> placed strategically right after a
<script> (regardless of whether that stylesheet resource is already loading
or not) affect the timing of how the page was rendered (the <script> will
block in that case), it makes sense to include it in the array at that
exact position, so that such positional traps can be detected. It doesn't
matter in this case that the network timing would be 0 for a duplicate
reference to a stylesheet; what matters is that a reference to an external
stylesheet happened in a particular position so as to affect the timing of
the processing of the page.

Bottom line, I'd like to be able to see an ordered list of every external
resource container "request" in the page, even if some of those requests
simply loaded an element from the cache, or some of those elements simply
piggybacked on a previous container's request.
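
As a rough illustration of the kind of ordered list described above, the
sketch below walks a page's markup and records every element that refers to
an external resource, in document order, keeping duplicates. This is not the
Resource Timing API or any proposed interface. The names `URL_ATTRS`,
`ResourceReferenceCollector`, and `collect_references`, and the entry fields,
are all hypothetical, and the element list is only a subset.

```python
from html.parser import HTMLParser

# Which attribute carries the external URL for each container element.
# Illustrative subset only; not a list taken from the spec.
URL_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceReferenceCollector(HTMLParser):
    """Records every external-resource reference in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []   # one entry per referencing element, duplicates kept
        self._seen = set()  # URLs already referenced earlier in the document

    def handle_starttag(self, tag, attrs):
        url = dict(attrs).get(URL_ATTRS.get(tag, ""))
        if not url:
            return  # not a container we track, or no src/href given
        self.entries.append({
            "position": len(self.entries),  # order of appearance in the markup
            "tag": tag,
            "url": url,
            "duplicate": url in self._seen,  # piggybacks on an earlier reference
        })
        self._seen.add(url)


def collect_references(html):
    collector = ResourceReferenceCollector()
    collector.feed(html)
    return collector.entries


page = (
    '<script src="a.js"></script>'
    '<link rel="stylesheet" href="s.css">'
    '<img src="x.png"><img src="x.png">'
)
for entry in collect_references(page):
    print(entry)
```

The second <img> still gets its own entry, flagged as a duplicate, so its
position relative to the other containers is preserved even though it would
trigger no new network request.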


--Kyle
Received on Tuesday, 31 May 2011 20:40:35 UTC