Re: add "networkDuration" to Resource Timing from Steve Souders on 2014-12-23 (public-web-perf@w3.org from December 2014)

From: Steve Souders <steve@souders.org>
Date: Tue, 23 Dec 2014 12:25:14 -0800
To: Ilya Grigorik <igrigorik@google.com>
CC: Patrick Meenan <pmeenan@webpagetest.org>, Peter Lepeska <bizzbyster@gmail.com>, Nic Jansma <nic@nicj.net>, Yoav Weiss <yoav@yoav.ws>, public-web-perf <public-web-perf@w3.org>
Message-ID: <5499CFAA.1070004@souders.org>
 > I think the current definition of "duration" is correct

I've never claimed that the definition of "duration" is incorrect. 
Instead, I'm suggesting that we add a new metric called something like 
"networkDuration".

 > If you want to exclusively measure the "network transfer time" and 
exclude cache and blocking overhead, then you should do that as a 
separate metric

Yes, that's it exactly.

-Steve

On 12/23/14 12:16 PM, Ilya Grigorik wrote:
> On Mon, Dec 22, 2014 at 3:22 PM, Steve Souders <steve@souders.org 
> <mailto:steve@souders.org>> wrote:
>
>     > Sure, but that's mostly an educational and easily fixable
>     problem on their end... Short of (2b) case.
>
>     It's more than an educational problem. Developers typically look
>     at code and object properties before documentation and tutorials.
>     "Duration" is short and encompassing. It'll be the first choice.
>     The people I've seen who have already made this mistake come from
>     smart, webperf cutting edge organizations, as evidenced by the
>     fact that they're using Resource Timing in production systems. If
>     the cutting edge gurus make the mistake it's likely that we need
>     more than education.
>
>
> I disagree with this. I think the current definition of "duration" is 
> correct and, in fact, is exactly what applications should be 
> measuring: time from the moment you requested the resource to when it 
> is available. This includes time to check the appropriate caches, 
> which is non-zero and can be in tens and hundreds of milliseconds, 
> connection setup time, blocking time due to head-of-line blocking 
> (http/1 artifact), and the actual transfer times.
>
> If you want to exclusively measure the "network transfer time" and 
> exclude cache and blocking overhead, then you should do that as a 
> separate metric... I think what you're pointing out here is that most 
> people assume that cache lookups are effectively free, and http/1 HoL 
> is not a problem... and that, to me, is an education problem, not a 
> metric problem.
>
>     > Could we instrument HTTP Archive to log blocking time for each
>     resource?
>     I accept pull requests. ;-) But given that the average website has
>     50+ resources on a single hostname
>     <http://httparchive.org/trends.php#numDomains&maxDomainReqs>
>     that's 44 requests that have blocking time (only ~6 connections
>     per hostname can be open at once).
>
>
> But not all of them are dispatched simultaneously either: some are 
> delayed because they're declared later in the document, some have to 
> wait for layout (e.g. CSS spec'ed resources), and others may be 
> scheduled via JS, etc. It'd be good to understand how this looks in 
> the real world.
>
> Good news is, looks like HAR already captures this in "timings: { 
> blocked: ...}": 
> http://www.softwareishard.com/blog/har-12-spec/#timings. I verified 
> that both WPT and Chrome HAR export report the metric. So, we already 
> have the data in the raw WPT results... "just" need to pull it out ;)
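
A minimal sketch of pulling that value out of a HAR export. It assumes a
parsed HAR 1.2 object (log.entries[].timings.blocked, where -1 means the
timing does not apply). The helper name blockedTimes and the sample data
are made up for illustration; they are not part of WPT or Chrome.

```javascript
// Sketch: list the "blocked" time (ms) per request from a parsed HAR 1.2 object.
// blockedTimes is a hypothetical helper name.
function blockedTimes(har) {
  return har.log.entries.map((entry) => {
    const blocked = entry.timings.blocked;
    return {
      url: entry.request.url,
      // HAR 1.2 uses -1 (or omits the field) when the timing does not apply.
      blocked: typeof blocked === "number" && blocked >= 0 ? blocked : null,
    };
  });
}

// Invented sample data, only to show the shape of the input.
const sampleHar = {
  log: {
    entries: [
      {
        request: { url: "http://example.com/a.js" },
        timings: { blocked: 120, send: 1, wait: 40, receive: 10 },
      },
      {
        request: { url: "http://example.com/b.css" },
        timings: { blocked: -1, send: 1, wait: 35, receive: 5 },
      },
    ],
  },
};

const perRequest = blockedTimes(sampleHar);
console.log(perRequest);
```

Entries with no usable value come back as null rather than 0, so they are
not mistaken for requests that never waited.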
>
>     > But isn't this the same problem in a different disguise?
>     Yes, but not as significant.
>
>
> We have research showing that even flash I/O can be very expensive 
> [1], and it mirrors some of the metrics we've gathered in the past in 
> Chrome.. Plus, in addition to slow I/O we also have thread hopping, 
> etc, all of which adds non-trivial overhead. I'm not convinced we can 
> just sweep this under the rug.
>
> ig
>
> [1] http://dl.acm.org/citation.cfm?id=2385607
>
>     On 12/22/14 9:25 AM, Ilya Grigorik wrote:
>>     On Wed, Dec 17, 2014 at 9:53 PM, Steve Souders <steve@souders.org
>>     <mailto:steve@souders.org>> wrote:
>>
>>         The use cases were CDNs, RUM providers, and website owners
>>         using Resource Timing's duration to measure (what they
>>         thought was) download time of resources. In fact, one of the
>>         RUM providers (Buddy from SOASTA) did a preso at WebPerfdays
>>         showing code to track "duration" and captured it in a
>>         property called "downloadtime" - so everyone in that audience
>>         now thinks "duration" means "download time". Bummer!
>>
>>
>>     Sure, but that's mostly an educational and easily fixable problem
>>     on their end... Short of (2b) case.
>>
>>         For the (2b) case (different origin & you don't control it so
>>         can't add TAO header), you're right that sometimes there's no
>>         action the website owner can take. For example, if the
>>         Twitter widget loads other scripts & images dynamically,
>>         there's not much the website owner can do. But there are
>>         *numerous* situations where the timing of (2b) content IS
>>         actionable. If the website owner was able to distinguish
>>         blocking time from download time they'd be able to make the
>>         right decision and take action. For example:
>>             - fonts - These are blocking the page from rendering. If
>>         it's because the fonts are slow to download, then I might
>>         want to switch font providers. If it's because of blocking,
>>         then I might want to preload or prefetch the fonts.
>>             - ads - I moved the ad in my page and clickthroughs
>>         dropped off significantly. Is that because the ad content is
>>         blocked or slow? Or something else?
>>             - JS libs - I might want to find out if
>>         https://code.jquery.com/jquery-2.1.2.min.js is loading slow
>>         on my site because it's blocked or just slow to download.
>>         Again, there are many actions the website owner can take -
>>         load it async, prefetch it, host it locally, get it from
>>         Google CDN.
>>
>>
>>     As an aside... I'm wondering if we can gather some data on how
>>     often this is actually a problem? Could we instrument HTTP
>>     Archive to log blocking time for each resource?
>>
>>         Choosing a name is hard because I assume we do NOT want to
>>         reveal whether the object was read from cache for
>>         cross-origin resources. Thus, "networkDuration" could
>>         actually not involve any network requests at all. I thought
>>         about calling it "loadtime" since that covers loading it over
>>         the network or from cache. Again, I'm not insistent on
>>         "networkDuration" and would love better name brainstorming.
>>
>>
>>     But isn't this the same problem in a different disguise? I
>>     thought I was measuring the latency of my CDN, but I'm actually
>>     measuring latency of my cache lookup plus the CDN fetch, where
>>     the former can easily take tens if not hundreds of milliseconds..
>>     and crazily enough, be higher than the actual network fetch.
>>
>>     ig
>>
>>         On 12/4/14 9:13 AM, Ilya Grigorik wrote:
>>>         On Mon, Nov 24, 2014 at 4:34 PM, Steve Souders
>>>         <steve@souders.org <mailto:steve@souders.org>> wrote:
>>>
>>>             LONG: A few weeks ago I discovered that "duration"
>>>             includes blocking time, so "duration" is greater than
>>>             the actual network time needed to download the resource.
>>>             Since then I've been at Velocity and WebPerfDays where
>>>             many people have shown their Resource Timing code.
>>>             Everyone I spoke to (~5 different teams) assumed that
>>>             "duration" was just the network time. When I explained
>>>             that it also includes blocking they were surprised,
>>>             admitted they hadn't known that, and agreed it is NOT
>>>             the metric they were trying to capture.
>>>
>>>
>>>         Steve, can you elaborate on the use case a bit more? Who's
>>>         measuring what here, and for what purpose? Are we
>>>         benchmarking CDN performance?
>>>
>>>         In terms of getting access to the data, we have the
>>>         following cases:
>>>         1) same origin resources: full access to timing data.
>>>         2) different origin:
>>>           a) if you control it, add TAO header for full access to
>>>         timing data.
>>>           b) if you don't control it, you only have "duration"
>>>
>>>         For (1) and (2a), I can see why you may want or need to get
>>>         low-level "network duration" data: you want to track your
>>>         provider's DNS performance, latency to your CDN, TTFB, total
>>>         response time, and so on. You care about this because this
>>>         is something *you can affect*. However, for (2b)... this
>>>         same data falls into the "interesting but not actionable" bucket?
>>>         Further, it seems like if you are actually interested in
>>>         benchmarking your CDN, then you really should be looking
>>>         deeper than just total time: you want to decompose DNS, TCP,
>>>         TLS, HTTP req>resp cycles. At which point.. you need the
>>>         full timing object anyway.
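
A rough sketch of telling these cases apart in code. It relies on Resource
Timing reporting the detailed timestamps (requestStart, responseStart and
so on) as zero when a cross-origin resource fails the timing allow check.
The entries are plain objects shaped like PerformanceResourceTiming, and
hasDetailedTiming is a hypothetical helper name; treat it as a heuristic.

```javascript
// Sketch: does this entry expose the detailed timestamps (cases 1 and 2a),
// or only the coarse ones such as "duration" (case 2b)?
function hasDetailedTiming(entry) {
  // requestStart and responseStart are zero when the timing allow check fails.
  return entry.requestStart > 0 && entry.responseStart > 0;
}

// Invented entries for illustration (times in ms).
const fullEntry = {
  startTime: 100, requestStart: 180, responseStart: 230,
  responseEnd: 260, duration: 160,
};
const restrictedEntry = {
  startTime: 100, requestStart: 0, responseStart: 0,
  responseEnd: 260, duration: 160,
};

console.log(hasDetailedTiming(fullEntry), hasDetailedTiming(restrictedEntry));
```

Both sample entries report the same "duration", which is the point of the
(2b) case: the total is visible but its breakdown is not.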
>>>
>>>             I propose we add a new property to Resource Timing that
>>>             reflects the time to actually load the resource
>>>             excluding blocking time. I'm flexible about the name but
>>>             for purposes of this discussion let's call it
>>>             "networkDuration". The important piece of this proposal
>>>             is that "networkDuration" should be available for all
>>>             resources, similar to "duration". In other words, it
>>>             should be available for same origin as well as cross
>>>             origin resources as part of the PerformanceEntry
>>>             <http://www.w3.org/TR/performance-timeline/#performanceentry> interface.
>>>
>>>
>>>         Note that "blocking time" is a thing of the past for SPDY
>>>         and HTTP/2, as this demo demonstrates really well:
>>>         http://www.httpvshttps.com/
>>>
>>>         I'm skeptical of above definition: if you want "network
>>>         duration", you should also exclude cache time; it's a
>>>         computed metric that you can access today with TAO and a
>>>         redundant one with http/2; if you really care about "network
>>>         duration" you should probably decompose it further, but at
>>>         that point it becomes a conversation about removing the TAO
>>>         restriction.
>>>
>>>         ig
>>>
>>>         P.S. "networkDuration = dns + tcp + waiting + content" ...
>>>         don't forget the https handshake!
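
On the handshake: in Resource Timing the TLS negotiation falls inside the
connectStart..connectEnd interval, and secureConnectionStart (when
non-zero) marks where it begins. A sketch that splits the two, assuming an
entry-like object with full timing data; connectBreakdown is a
hypothetical helper name and the numbers are invented.

```javascript
// Sketch: split the connect phase into TCP setup and TLS handshake.
function connectBreakdown(r) {
  const secure = r.secureConnectionStart > 0;
  // Without TLS, the whole connect interval is TCP setup.
  const tlsStart = secure ? r.secureConnectionStart : r.connectEnd;
  return {
    tcp: tlsStart - r.connectStart,
    tls: r.connectEnd - tlsStart,
  };
}

// Invented timestamps (ms).
const httpsEntry = { connectStart: 50, secureConnectionStart: 80, connectEnd: 130 };
const httpEntry = { connectStart: 50, secureConnectionStart: 0, connectEnd: 80 };

const httpsParts = connectBreakdown(httpsEntry);
const httpParts = connectBreakdown(httpEntry);
console.log(httpsParts, httpParts);
```

Because tcp + tls always equals connectEnd - connectStart, a sum that uses
the full connect interval already counts the handshake once.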
>>>
>>>         On Wed, Nov 26, 2014 at 9:01 AM, Patrick Meenan
>>>         <pmeenan@webpagetest.org <mailto:pmeenan@webpagetest.org>>
>>>         wrote:
>>>
>>>             Would be great to see it either as a high-level duration
>>>             or as an unblocking of the redirectStart time for
>>>             cross-origin (though it may still not be clear to people
>>>             that that is the time they really care about).
>>>
>>>             I expect the current logic was the easiest and didn't
>>>             require any privacy reviews because it's quite literally
>>>             the exact same detail that you get if you do it manually
>>>             in javascript by creating an element and listening to
>>>             the onload.  Even if the more-granular detail doesn't
>>>             really expose anything you couldn't figure out before it
>>>             does provide additional detail that wouldn't otherwise
>>>             be measurable and is probably going to require reviews
>>>             by privacy and security teams.
>>>
>>>             On Wed, Nov 26, 2014 at 9:36 AM, Peter Lepeska
>>>             <bizzbyster@gmail.com <mailto:bizzbyster@gmail.com>> wrote:
>>>
>>>                 +1
>>>
>>>                 On Tue, Nov 25, 2014 at 12:31 PM, Nic Jansma
>>>                 <nic@nicj.net <mailto:nic@nicj.net>> wrote:
>>>
>>>                     Good point!  Hadn't considered that, so yes I
>>>                     would agree it's a very valuable addition to
>>>                     consider.
>>>
>>>                     As far as what interface to put it on, I'm not
>>>                     sure networkDuration would make sense for
>>>                     UserTiming, for example. While it could sit on
>>>                     PerformanceEntry and just be "0" for interfaces
>>>                     that aren't applicable, we could also create a
>>>                     PerformanceNetworkEntry interface (with
>>>                     networkDuration) that PerformanceResourceTiming
>>>                     inherits from, while PerformanceUserTiming only
>>>                     inherits from PerformanceEntry.
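
One way to picture that layering, using plain JavaScript classes as
stand-ins for the interfaces. Everything here is illustrative:
PerformanceNetworkEntry and networkDuration are only proposed in this
thread, and the "Sketch" class names are invented so they do not shadow
the real browser globals.

```javascript
// Stand-in for PerformanceEntry: fields shared by every entry type.
class EntrySketch {
  constructor(name, entryType, startTime, duration) {
    this.name = name;
    this.entryType = entryType;
    this.startTime = startTime;
    this.duration = duration;
  }
}

// Stand-in for the proposed PerformanceNetworkEntry: adds networkDuration.
class NetworkEntrySketch extends EntrySketch {
  constructor(name, entryType, startTime, duration, networkDuration) {
    super(name, entryType, startTime, duration);
    this.networkDuration = networkDuration;
  }
}

// Resource timing entries would inherit the network-level attribute...
class ResourceTimingSketch extends NetworkEntrySketch {
  constructor(name, startTime, duration, networkDuration) {
    super(name, "resource", startTime, duration, networkDuration);
  }
}

// ...while user timing entries would inherit only from the base entry.
class UserTimingSketch extends EntrySketch {
  constructor(name, startTime, duration) {
    super(name, "measure", startTime, duration);
  }
}

const resourceEntry = new ResourceTimingSketch("http://example.com/a.js", 100, 160, 80);
const userEntry = new UserTimingSketch("my-measure", 50, 20);
console.log("networkDuration" in resourceEntry, "networkDuration" in userEntry);
```

With this shape, user timing entries simply lack the attribute instead of
carrying a placeholder "0".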
>>>
>>>                     That's all minor details though. Really depends
>>>                     on the browser privacy teams OK'ing the addition.
>>>
>>>                     - Nic
>>>                     http://nicj.net/
>>>                     @NicJ
>>>
>>>                     On 11/25/2014 12:16 PM, Steve Souders wrote:
>>>>                     Nic -
>>>>
>>>>                     You can *not* calculate networkDuration from
>>>>                     other attributes for *cross-origin* resources.
>>>>                     That's why I'm suggesting adding this to
>>>>                     PerformanceEntry (rather than
>>>>                     PerformanceResourceTiming).
>>>>
>>>>                     And as mentioned, about 50% of resources are
>>>>                     cross-origin so it's important to provide a
>>>>                     means for *accurate* download time measurements.
>>>>
>>>>                     -Steve
>>>>
>>>>
>>>>                     On 11/25/14, 8:02 AM, Nic Jansma wrote:
>>>>>                     Steve,
>>>>>
>>>>>                     The only downside I see is that we're adding a
>>>>>                     new attribute that can be entirely calculated
>>>>>                     via other attributes.
>>>>>
>>>>>                     One alternate (or additional thing) would be
>>>>>                     to highlight this point in the description for
>>>>>                     "duration" in the spec.
>>>>>                     - Nic
>>>>>                     http://nicj.net/
>>>>>                     @NicJ
>>>>>                     On 11/25/2014 3:04 AM, Yoav Weiss wrote:
>>>>>>
>>>>>>                     On Tue, Nov 25, 2014 at 1:34 AM, Steve
>>>>>>                     Souders <steve@souders.org
>>>>>>                     <mailto:steve@souders.org>> wrote:
>>>>>>
>>>>>>                         SHORT: I propose we add the
>>>>>>                         "networkDuration" property to
>>>>>>                         PerformanceEntry
>>>>>>                         <http://www.w3.org/TR/performance-timeline/#performanceentry>
>>>>>>                         objects.
>>>>>>
>>>>>>                         LONG: A few weeks ago I discovered that
>>>>>>                         "duration" includes blocking time, so
>>>>>>                         "duration" is greater than the actual
>>>>>>                         network time needed to download the
>>>>>>                         resource. Since then I've been at
>>>>>>                         Velocity and WebPerfDays where many
>>>>>>                         people have shown their Resource Timing
>>>>>>                         code. Everyone I spoke to (~5 different
>>>>>>                         teams) assumed that "duration" was just
>>>>>>                         the network time. When I explained that it
>>>>>>                         also includes blocking they were
>>>>>>                         surprised, admitted they hadn't known
>>>>>>                         that, and agreed it is NOT the metric
>>>>>>                         they were trying to capture.
>>>>>>
>>>>>>                         I propose we add a new property to
>>>>>>                         Resource Timing that reflects the time to
>>>>>>                         actually load the resource excluding
>>>>>>                         blocking time. I'm flexible about the
>>>>>>                         name but for purposes of this discussion
>>>>>>                         let's call it "networkDuration". The
>>>>>>                         important piece of this proposal is that
>>>>>>                         "networkDuration" should be available for
>>>>>>                         all resources, similar to "duration". In
>>>>>>                         other words, it should be available for
>>>>>>                         same origin as well as cross origin
>>>>>>                         resources as part of the PerformanceEntry
>>>>>>                         <http://www.w3.org/TR/performance-timeline/#performanceentry>
>>>>>>                         interface.
>>>>>>
>>>>>>                         Same origin resources can calculate
>>>>>>                         "networkDuration" as follows (assume "r"
>>>>>>                         is a PerformanceResourceTiming
>>>>>>                         object):
>>>>>>
>>>>>>                             dns = r.domainLookupEnd - r.domainLookupStart;
>>>>>>                             // tcp includes ssl negotiation
>>>>>>                             tcp = r.connectEnd - r.connectStart;
>>>>>>                             // waiting is aka "TTFB"
>>>>>>                             waiting = r.responseStart - r.requestStart;
>>>>>>                             content = r.responseEnd - r.responseStart;
>>>>>>                             networkDuration = dns + tcp + waiting + content;
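
A runnable version of the calculation above, as a sketch. It assumes "r"
is an object with the PerformanceResourceTiming timestamps populated (same
origin, or cross origin with the TAO header). The sample timeline is
invented to show "duration" exceeding networkDuration when a request sits
blocked before it starts.

```javascript
// Sum of the network phases, following the formula in the proposal.
function networkDuration(r) {
  const dns = r.domainLookupEnd - r.domainLookupStart;
  const tcp = r.connectEnd - r.connectStart; // includes ssl negotiation
  const waiting = r.responseStart - r.requestStart; // aka "TTFB"
  const content = r.responseEnd - r.responseStart;
  return dns + tcp + waiting + content;
}

// Invented timeline (ms): the request is queued for 200 ms before DNS starts.
const sample = {
  startTime: 1000, fetchStart: 1000,
  domainLookupStart: 1200, domainLookupEnd: 1220,
  connectStart: 1220, connectEnd: 1280,
  requestStart: 1280, responseStart: 1350, responseEnd: 1400,
};
sample.duration = sample.responseEnd - sample.startTime;

const net = networkDuration(sample);
const blockedMs = sample.duration - net;
console.log({ duration: sample.duration, networkDuration: net, blockedMs });
```

In a browser the entries would come from
performance.getEntriesByType("resource") rather than a hand-built object.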
>>>>>>
>>>>>>                         I've discussed this with a few people and
>>>>>>                         the only concern I've heard is with
>>>>>>                         regard to privacy along the lines of "if
>>>>>>                         we exclude blocking we've added the
>>>>>>                         ability to distinguish cache reads from
>>>>>>                         network fetches". This isn't an issue for
>>>>>>                         two reasons:
>>>>>>
>>>>>>                          1. Even with the exclusion of blocking
>>>>>>                             time, it's still possible for
>>>>>>                             "networkDuration" to have a non-zero
>>>>>>                             value for resources read from cache
>>>>>>                             due to disk access time, etc.
>>>>>>                             Therefore, excluding blocking time
>>>>>>                             does not necessarily provide a clear
>>>>>>                             means of determining resources read
>>>>>>                             from cache.
>>>>>>                          2. This concern assumes that adding
>>>>>>                             "networkDuration" lessens privacy
>>>>>>                             because removing blocking time
>>>>>>                             provides additional information that
>>>>>>                             is not available today. However, it's
>>>>>>                             possible to exclude blocking time
>>>>>>                             today by loading a cross-origin
>>>>>>                             resource after window.onload, when
>>>>>>                             there is no blocking contention.
>>>>>>
>>>>>>                         Therefore, individuals who have
>>>>>>                         JavaScript access to a page and can
>>>>>>                         measure duration also have enough access
>>>>>>                         to load resources after window.onload and
>>>>>>                         can thus determine the duration excluding
>>>>>>                         blocking time. Adding "networkDuration"
>>>>>>                         does not give these individuals
>>>>>>                         additional information beyond what is
>>>>>>                         measurable today.
>>>>>>
>>>>>>                         What "networkDuration" provides is
>>>>>>                         additional information for the normal
>>>>>>                         case of resources that are loaded as part
>>>>>>                         of the main page when blocking contention
>>>>>>                         may occur. This will give current web
>>>>>>                         developers the metric they want for
>>>>>>                         cross-origin resources, and will provide
>>>>>>                         it more simply for same origin resources.
>>>>>>
>>>>>>
>>>>>>                     Assuming that the privacy concerns are in
>>>>>>                     fact non-existent, a big +1.
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
Received on Tuesday, 23 December 2014 20:25:51 UTC