Re: Resource Timing - What's included

   Digging up this thread again because I still think the current
description of this particular issue needs more clarification.

   "The PerformanceResourceTiming interface must include all resources
fetched from the networking layer
by the current browsing context. Resources that are retrieved from the user
agent's networking layer cache
must be included in the PerformanceResourceTiming interface."

   By "networking layer" here, we are referring to the HTTP layer as
described in
http://www.ietf.org/rfc/rfc2616.txt, correct? This should be made clear so
it differentiates itself with the network
layer and transport layer from the OSI
model<http://en.wikipedia.org/wiki/OSI_model>
.
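
For concreteness, here is a minimal sketch of how a page would
enumerate these entries (using the Performance Timeline method names
as later standardized; the draft under discussion may differ). Note
that, per the quoted requirement, cache hits must appear alongside
network fetches:

    // Sketch only: assumes performance.getEntriesByType() as later
    // standardized in the Performance Timeline spec.
    var resources = performance.getEntriesByType("resource");
    for (var i = 0; i < resources.length; i++) {
      var r = resources[i];
      // Entries served from the HTTP cache must be listed here too,
      // alongside entries fetched over the network.
      console.log(r.name, "responseEnd:", r.responseEnd);
    }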

  By removing "duplicated contents" and loadEventStart/End from RT, we are
more focusing on networking stuff.
I am not sure if this is a good direction since what we skip here could be
significant to the overall performance
as well, e.g., responseEnd of an iframe could be quite different from the
time it's loaded, while it's the later one that
blocks the overall page load.
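
To illustrate the iframe point, a rough sketch (assuming a hypothetical
iframe with id "f", and the Performance Timeline lookup methods as later
standardized):

    // Sketch only: compare when the iframe's bytes finished arriving
    // (responseEnd) with when its load event actually fires -- it is
    // the latter that gates the parent page's onload.
    var frame = document.getElementById("f");  // hypothetical iframe
    frame.addEventListener("load", function () {
      var entry = performance.getEntriesByName(frame.src)[0];
      if (entry) {
        console.log("responseEnd:", entry.responseEnd);
      }
      console.log("iframe load fired at:", performance.now());
    });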

cheers,
Zhiheng


On Wed, Mar 23, 2011 at 3:16 PM, Nic Jansma <Nic.Jansma@microsoft.com> wrote:

> >> The problem with explicitly exposing "cache" vs. "network" is that it
> >> precisely exposes privacy information about the page's visitor.
> >
> > Existing dev-tools in the browser have access to all this information and
> don't expose it to JS in a way that would
> > enable such an attack. So, I guess I should step back and ask, is there a
> way we could make this information available
> > to "secure channels" (like in-browser dev tools, add-ons, plugins, etc)
> but not to potentially malicious JavaScript?
> > If so, would that offer a possible relief from this attack vector?
>
> But developer tools are extensions on your machine that you've given
> explicit permission to run (by way of installing them), and that may
> expose additional information such as cache status (and a lot more).
>
> Developer tools, add-ons and plug-ins all fall into the same bucket -- the
> user has to give explicit permission for them to run, and understands (or
> doesn't) the risk of doing so.
>
> Our Resource Timing interface is scoped to simply be a DOM interface.
> Even though we (as web developers) would benefit from knowing the cache
> status of our visitors -- there are many useful scenarios it could
> enable -- we have to respect the user's privacy concerns.
>
> You could construct additional dev tools or add-ons to gather the
> additional caching data -- it is a matter, at that point, of convincing the
> user to install them.
>
> As for having the DOM request this permission, we could take a look at the
> GeoLocation spec [1], for example, which can expose the end-user's geo
> location.  However, the browser is *required* to get explicit permission
> from the user, via UI or similar, before that location data can be
> shared.  For Resource Timing, I don't see a compelling argument (from a
> user's POV) that the user would want to expose his browsing history to a
> site.
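>
> For comparison, the Geolocation call that triggers such a permission
> prompt is a one-liner (sketch only; the prompt UI varies by browser):
>
>     navigator.geolocation.getCurrentPosition(
>       function (pos) { console.log("lat:", pos.coords.latitude); },
>       function (err) { console.log("denied or failed:", err.message); }
>     );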
>
> > Even with static resources (like <img> or <script> tags in the page), the
> presence of multiple/duplicate containers that
> > "request" the same resource affects the timing of how the page is
> assembled/rendered. Like in the previous email,
> > where I said that multiple <script> elements for the same script will
> still cause some "blocking" because the browser
> > has to assume there may be a document.write() in each, which will affect
> how the rest of the DOM is assembled/interpreted...
> > the timing impact of such should be quite obvious.
>
> I agree with your desire to have timing information for all resources, but
> I don't think it's in the scope of what we're trying to achieve for Resource
> Timing.  For Resource Timing, we are essentially trying to mimic the
> information available from Network panels in developer tools.  Additionally,
> we are proposing only exposing new, previously unavailable information to
> the user -- the network latencies.  The in-browser timing delays you mention
> are all somewhat measurable today via JavaScript.
>
> Including "timing" information for *all* resources (whatever that is
> defined is) is a much larger, and more challenging issue.
>
> With your example of multiple IMG tags referring to the same resource, you
> would be suggesting that all IMGs on the page would be included in the RT
> array.  Additionally, SCRIPT elements (even inline ones) would be included,
> because they block parsing progress.  Continuing down that line of thought,
> TABLEs (large ones take a while to parse) and SVGs (even inline) would be
> included, etc.  It becomes a much larger and more complex spec at that
> point.
>
> Also, to be clear, the existing loadEventStart and loadEventEnd times in
> the spec are only capturing the actual time to run any load event handlers
> attached to that element.  They do not include the time to
> decode/render/display an IMG, nor do they include the time a SCRIPT
> tag would spend blocking parsing progress.
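>
> As a sketch of that distinction (doHeavyWork is a hypothetical
> placeholder, and an <img> on the page is assumed): only the handler's
> own run time would fall between loadEventStart and loadEventEnd, not
> the decode or render time:
>
>     var img = document.getElementsByTagName("img")[0];
>     img.addEventListener("load", function () {
>       doHeavyWork();  // only this handler's duration is captured
>     });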
>
> For this spec, we are focusing on just exposing the previously unavailable
> network latencies.
>
> > Would it be possible to simply expose "success" or "failure" of a loaded
> item, as opposed to the exact HTTP Response code? In other words,
> 1xx/2xx/3xx codes are "success", and 4xx/5xx codes "failure".
> >
> > Also, same question as above, is it possible to devise a system by which
> untrusted JavaScript gets the more filtered/watered-down data (to mitigate
> attacks), but tooling/add-ons have access to the more full-fledged data
> stream?
>
> We talked about this in today's conference call.  Knowing 200 vs. 304 could
> be very valuable (for same-origin resources).  We will attempt to add the
> HTTP status code (in some fashion) into the next version of the spec.
>
> Thanks for all of your input so far!
>
> [1] http://dev.w3.org/geo/api/spec-source.html
>
> - Nic
> -----Original Message-----
> From: Kyle Simpson [mailto:getify@gmail.com]
> Sent: Wednesday, March 23, 2011 12:44 PM
> To: Nic Jansma; public-web-perf@w3.org
> Subject: Re: Resource Timing - What's included
>
> > The problem with explicitly exposing "cache" vs. "network" is that it
> > precisely exposes privacy information about the page's visitor.
>
> Existing dev-tools in the browser have access to all this information and
> don't expose it to JS in a way that would enable such an attack. So, I guess
> I should step back and ask, is there a way we could make this information
> available to "secure channels" (like in-browser dev tools, add-ons, plugins,
> etc) but not to potentially malicious JavaScript? If so, would that offer a
> possible relief from this attack vector?
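>
> (For reference, the attack vector being discussed is roughly the
> following timing probe -- a sketch with a hypothetical URL and a crude
> threshold:)
>
>     var img = new Image();
>     var start = Date.now();
>     img.onload = function () {
>       // A very fast "load" suggests the resource was already cached,
>       // i.e., the user has likely visited that site before.
>       var cached = (Date.now() - start) < 30;  // crude ms threshold
>       console.log(cached ? "probably visited" : "probably not");
>     };
>     img.src = "http://example.com/logo.png";  // hypothetical URL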
>
>
> > I think we're on the same page -- we both want RT to expose the
> "observed"
> > behavior of browsers.
> >
> > My example below was a simplification of the issue, and meant to point
> > out one optimization that I believe all current modern browsers
> implement.
> > For *static* elements within the page (e.g. <IMG />), current browsers
> > re-use prior duplicate resource URLs instead of downloading them twice.
> > From my simple HTML example, only one resource request for 1.jpg would
> > occur.  Current browsers don't re-check the cacheability of a resource
> > within the *same page* for *static* resources.
>
> Even with static resources (like <img> or <script> tags in the page), the
> presence of multiple/duplicate containers that "request" the same resource
> affects the timing of how the page is assembled/rendered. Like in the
> previous email, where I said that multiple <script> elements for the same
> script will still cause some "blocking" because the browser has to
> assume there may be a document.write() in each, which will affect how the
> rest of the DOM is assembled/interpreted... the timing impact of such should
> be quite obvious.
>
> But if the RT array shows only one listing -- for the first,
> network-delayed request -- what's missing from that picture is how that
> resource being re-interpreted multiple times had timing impacts on the
> page and on other resources.
>
>
> >>Yes, and furthermore, I think (mostly for data filtering purposes)
> >>having the status code actually in the data structure would be
>>important. For instance, if a tool analyzing this data wants to
> >>filter out all 404's, etc.
> >
> > There may be some privacy/security aspects about exposing the HTTP
> > response code, especially for cross-origin domains.  For example, the
> > presence of a 301 response from a login page on another domain could
> > indicate that the user is already logged in.
>
> Would it be possible to simply expose "success" or "failure" of a loaded
> item, as opposed to the exact HTTP Response code? In other words,
> 1xx/2xx/3xx codes are "success", and 4xx/5xx codes "failure".
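>
> (A sketch of that coarsening -- collapsing exact codes into two
> buckets:)
>
>     function bucketStatus(status) {
>       // 1xx/2xx/3xx -> "success"; 4xx/5xx -> "failure"
>       return status < 400 ? "success" : "failure";
>     }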
>
> Also, same question as above, is it possible to devise a system by which
> untrusted JavaScript gets the more filtered/watered-down data (to mitigate
> attacks), but tooling/add-ons have access to the more full-fledged data
> stream?
>
>
> --Kyle
>

Received on Thursday, 26 May 2011 21:29:39 UTC