Re: add "networkDuration" to Resource Timing from Patrick Meenan on 2014-12-23 (public-web-perf@w3.org from December 2014)

From: Patrick Meenan <pmeenan@webpagetest.org>
Date: Tue, 23 Dec 2014 15:50:55 -0500
To: Ilya Grigorik <igrigorik@google.com>
Cc: Steve Souders <steve@souders.org>, Peter Lepeska <bizzbyster@gmail.com>, Nic Jansma <nic@nicj.net>, Yoav Weiss <yoav@yoav.ws>, public-web-perf <public-web-perf@w3.org>
Message-ID: <CAKHu2Gmsyi_PuoAN4HxCy65+oBWx0fPKYWo-Lgkz6Zi-WONFrg@mail.gmail.com>
Hmm, WebPageTest may report "something" in the blocked time but I think
it's always forced to -1 since I don't currently track it internally.  That
said, it's not that hard to add and I'll see if I can get it in before the
next HTTP Archive crawl.
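
For anyone who wants to look at the numbers once they exist, here is a minimal sketch of pulling "blocked" out of a HAR file. It assumes the HAR 1.2 layout (log.entries[].timings.blocked, with -1 meaning the timing is not available); the function name and the sample data are made up for illustration and are not part of WebPageTest or HTTP Archive.

```javascript
// Summarize HAR "timings.blocked" values across all entries.
// Assumption: HAR 1.2 layout, where -1 means "not available".
function summarizeBlocked(har) {
  const entries = har.log.entries;
  // Keep only entries that actually report a blocked time.
  const known = entries
    .map((e) => e.timings.blocked)
    .filter((b) => typeof b === "number" && b >= 0);
  const total = known.reduce((sum, b) => sum + b, 0);
  return {
    requests: entries.length,
    withBlockedData: known.length,
    totalBlockedMs: total,
    meanBlockedMs: known.length ? total / known.length : null,
  };
}

// Hypothetical sample data, for illustration only.
const sampleHar = {
  log: {
    entries: [
      { request: { url: "http://example.com/a.js" }, timings: { blocked: 120 } },
      { request: { url: "http://example.com/b.css" }, timings: { blocked: -1 } },
      { request: { url: "http://example.com/c.png" }, timings: { blocked: 30 } },
    ],
  },
};

console.log(summarizeBlocked(sampleHar));
```

Entries that report -1 are skipped rather than counted as zero, so a tool that never tracks blocked time shows up as "no data" instead of "no blocking".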

On Tue, Dec 23, 2014 at 3:16 PM, Ilya Grigorik <igrigorik@google.com> wrote:
>
> On Mon, Dec 22, 2014 at 3:22 PM, Steve Souders <steve@souders.org> wrote:
>
>>  > Sure, but that's mostly an educational and easily fixable problem on
>> their end... Short of (2b) case.
>>
>> It's more than an educational problem. Developers typically look at code
>> and object properties before documentation and tutorials. "Duration" is
>> short and encompassing. It'll be the first choice. The people I've seen who
>> have already made this mistake come from smart, webperf cutting edge
>> organizations, as evidenced by the fact that they're using Resource Timing
>> in production systems. If the cutting edge gurus make the mistake it's
>> likely that we need more than education.
>>
>
> I disagree with this. I think the current definition of "duration" is
> correct and, in fact, is exactly what applications should be measuring:
> time from the moment you requested the resource to when it is available.
> This includes time to check the appropriate caches, which is non-zero and
> can be in tens and hundreds of milliseconds, connection setup time,
> blocking time due to head-of-line blocking (http/1 artifact), and the
> actual transfer times.
>
> If you want to exclusively measure the "network transfer time" and exclude
> cache and blocking overhead, then you should do that as a separate
> metric... I think what you're pointing out here is that most people assume
> that cache lookups are effectively free, and http/1 HoL is not a problem...
> and that, to me, is an education problem, not a metric problem.
>
>> > Could we instrument HTTP Archive to log blocking time for each resource?
>> I accept pull requests. ;-) But given that the average website has 50+
>> resources on a single hostname
>> <http://httparchive.org/trends.php#numDomains&maxDomainReqs> that's 44
>> requests that have blocking time.
>>
>
> But not all of them are dispatched simultaneously either: some are delayed
> because they're declared later in the document, some have to wait for
> layout (e.g. CSS spec'ed resources), and others may be scheduled via JS,
> etc. It'd be good to understand how this looks in the real world.
>
> Good news is, looks like HAR already captures this in "timings: { blocked:
> ...}": http://www.softwareishard.com/blog/har-12-spec/#timings. I
> verified that both WPT and Chrome HAR export report the metric. So, we
> already have the data in the raw WPT results... "just" need to pull it out
> ;)
>
>
>> > But isn't this the same problem in a different disguise?
>> Yes, but not as significant.
>>
>
> We have research showing that even flash I/O can be very expensive [1],
> and it mirrors some of the metrics we've gathered in the past in Chrome..
> Plus, in addition to slow I/O we also have thread hopping, etc, all of
> which adds non-trivial overhead. I'm not convinced we can just sweep this
> under the rug.
>
> ig
>
> [1] http://dl.acm.org/citation.cfm?id=2385607
>
>
>> On 12/22/14 9:25 AM, Ilya Grigorik wrote:
>>
>>  On Wed, Dec 17, 2014 at 9:53 PM, Steve Souders <steve@souders.org>
>> wrote:
>>
>>> The use cases were CDNs, RUM providers, and website owners using
>>> Resource Timing's duration to measure (what they thought was) download time
>>> of resources. In fact, one of the RUM providers (Buddy from SOASTA) did a
>>> preso at WebPerfdays showing code to track "duration" and captured it in a
>>> property called "downloadtime" - so everyone in that audience now thinks
>>> "duration" means "download time". Bummer!
>>>
>>
>>  Sure, but that's mostly an educational and easily fixable problem on
>> their end... Short of (2b) case.
>>
>>
>>> For the (2b) case (different origin & you don't control it so can't add
>>> TAO header), you're right that sometimes there's no action the website
>>> owner can take. For example, if the Twitter widget loads other scripts &
>>> images dynamically, there's not much the website owner can do. But there
>>> are *numerous* situations where the timing of (2b) content IS actionable.
>>> If the website owner was able to distinguish blocking time from download
>>> time they'd be able to make the right decision and take action. For example:
>>>     - fonts - These are blocking the page from rendering. If it's
>>> because the fonts are slow to download, then I might want to switch font
>>> providers. If it's because of blocking, then I might want to preload or
>>> prefetch the fonts.
>>>     - ads - I moved the ad in my page and clickthroughs dropped off
>>> significantly. Is that because the ad content is blocked or slow? Or
>>> something else?
>>>     - JS libs - I might want to find out if
>>> https://code.jquery.com/jquery-2.1.2.min.js is loading slow on my site
>>> because it's blocked or just slow to download. Again, there are many
>>> actions the website owner can take - load it async, prefetch it, host it
>>> locally, get it from Google CDN.
>>>
>>
>>  As an aside... I'm wondering if we can gather some data on how often
>> this is actually a problem? Could we instrument HTTP Archive to log
>> blocking time for each resource?
>>
>>
>>> Choosing a name is hard because I assume we do NOT want to reveal
>>> whether the object was read from cache for cross-origin resources. Thus,
>>> "networkDuration" could actually not involve any network requests at all. I
>>> thought about calling it "loadtime" since that covers loading it over the
>>> network or from cache. Again, I'm not insistent on "networkDuration" and
>>> would love better name brainstorming.
>>>
>>
>>  But isn't this the same problem in a different disguise? I thought I
>> was measuring the latency of my CDN, but I'm actually measuring latency of
>> my cache lookup plus the CDN fetch, where the former can easily take tens
>> if not hundreds of milliseconds.. and crazily enough, be higher than the
>> actual network fetch.
>>
>>  ig
>>
>>
>>>   On 12/4/14 9:13 AM, Ilya Grigorik wrote:
>>>
>>>  On Mon, Nov 24, 2014 at 4:34 PM, Steve Souders <steve@souders.org>
>>>  wrote:
>>>
>>>> LONG: A few weeks ago I discovered that "duration" includes blocking
>>>> time, so "duration" is greater than the actual network time needed to
>>>> download the resource. Since then I've been at Velocity and WebPerfDays
>>>> where many people have shown their Resource Timing code. Everyone I spoke
>>>> to (~5 different teams) assumed that "duration" was just the network time.
>>>> When I explain that it also includes blocking they were surprised, admitted
>>>> they hadn't known that, and agreed it is NOT the metric they were trying to
>>>> capture.
>>>
>>>
>>>  Steve, can you elaborate on the use case a bit more? Who's measuring
>>> what here, and for what purpose? Are we benchmarking CDN performance?
>>>
>>>  In terms of getting access to the data, we have the following cases:
>>> 1) same origin resources: full access to timing data.
>>> 2) different origin:
>>>   a) if you control it, add TAO header for full access to timing data.
>>>   b) if you don't control it, you only have "duration"
>>>
>>>  For (1) and (2a), I can see why you may want or need to get low-level
>>> "network duration" data: you want to track your provider's DNS performance,
>>> latency to your CDN, TTFB, total response time, and so on. You care about
>>> this because this is something *you can affect*. However, for (2b)... this
>>> same data falls into interesting but not actionable bucket? Further, it
>>> seems like if you are actually interested in benchmarking your CDN, then
>>> you really should be looking deeper than just total time: you want to
>>> decompose DNS, TCP, TLS, HTTP req>resp cycles. At which point.. you need
>>> the full timing object anyway.
>>>
>>>> I propose we add a new property to Resource Timing that reflects the
>>>> time to actually load the resource excluding blocking time. I'm flexible
>>>> about the name but for purposes of this discussion let's call it
>>>> "networkDuration". The important piece of this proposal is that
>>>> "networkDuration" should be available for all resources, similar to
>>>> "duration". In other words, it should be available for same origin as well
>>>> as cross origin resources as part of the PerformanceEntry
>>>> <http://www.w3.org/TR/performance-timeline/#performanceentry>
>>>>  interface.
>>>>
>>>
>>>  Note that "blocking time" is a thing of the past for SPDY and HTTP/2,
>>> as this demo demonstrates really well: http://www.httpvshttps.com/
>>>
>>>  I'm skeptical of above definition: if you want "network duration", you
>>> should also exclude cache time; it's a computed metric that you can access
>>> today with TAO and a redundant one with http/2; if you really care about
>>> "network duration" you should probably decompose it further, but at that
>>> point it becomes a conversation about removing the TAO restriction.
>>>
>>>  ig
>>>
>>>  P.S. "networkDuration = dns + tcp + waiting + content" ... don't
>>> forget the https handshake!
>>>
>>> On Wed, Nov 26, 2014 at 9:01 AM, Patrick Meenan <pmeenan@webpagetest.org>
>>> wrote:
>>>
>>>> Would be great to see it either as a high-level duration or as an
>>>> unblocking of the redirectStart time for cross-origin (though it may still
>>>> not be clear to people that that is the time they really care about).
>>>>
>>>>  I expect the current logic was the easiest and didn't require any
>>>> privacy reviews because it's quite literally the exact same detail that you
>>>> get if you do it manually in javascript by creating an element and
>>>> listening to the onload.  Even if the more-granular detail doesn't really
>>>> expose anything you couldn't figure out before it does provide additional
>>>> detail that wouldn't otherwise be measurable and is probably going to
>>>> require reviews by privacy and security teams.
>>>>
>>>> On Wed, Nov 26, 2014 at 9:36 AM, Peter Lepeska <bizzbyster@gmail.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Tue, Nov 25, 2014 at 12:31 PM, Nic Jansma <nic@nicj.net> wrote:
>>>>>
>>>>>>  Good point!  Hadn't considered that, so yes I would agree it's a
>>>>>> very valuable addition to consider.
>>>>>>
>>>>>> As far as what interface to put it on, I'm not sure networkDuration
>>>>>> would make sense for UserTiming, for example.  While it could sit on
>>>>>> PerformanceEntry and just be "0" for interfaces that aren't applicable, we
>>>>>> could also create a PerformanceNetworkEntry interface (with
>>>>>> networkDuration) that PerformanceResourceTiming inherits from, while
>>>>>> PerformanceUserTiming only inherits from PerformanceEntry.
>>>>>>
>>>>>> That's all minor details though.  Really depends on the browser
>>>>>> privacy teams OK'ing the addition.
>>>>>>
>>>>>> - Nic
>>>>>> http://nicj.net/
>>>>>> @NicJ
>>>>>>
>>>>>>   On 11/25/2014 12:16 PM, Steve Souders wrote:
>>>>>>
>>>>>> Nic -
>>>>>>
>>>>>> You can *not* calculate networkDuration from other attributes for
>>>>>> *cross-origin* resources. That's why I'm suggesting adding this to
>>>>>> PerformanceEntry (rather than PerformanceResourceTiming).
>>>>>>
>>>>>> And as mentioned, about 50% of resources are cross-origin so it's
>>>>>> important to provide a means for *accurate* download time measurements.
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>>
>>>>>> On 11/25/14, 8:02 AM, Nic Jansma wrote:
>>>>>>
>>>>>> Steve,
>>>>>>
>>>>>> The only downside I see is that we're adding a new attribute that can
>>>>>> be entirely calculated via other attributes.
>>>>>>
>>>>>> One alternate (or additional thing) would be to highlight this point
>>>>>> in the description for "duration" in the spec.
>>>>>>
>>>>>> - Nic
>>>>>> http://nicj.net/
>>>>>> @NicJ
>>>>>>
>>>>>> On 11/25/2014 3:04 AM, Yoav Weiss wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 25, 2014 at 1:34 AM, Steve Souders <steve@souders.org>
>>>>>> wrote:
>>>>>>
>>>>>>>  SHORT: I propose we add the "networkDuration" property to
>>>>>>> PerformanceEntry
>>>>>>> <http://www.w3.org/TR/performance-timeline/#performanceentry>
>>>>>>> objects.
>>>>>>>
>>>>>>> LONG: A few weeks ago I discovered that "duration" includes blocking
>>>>>>> time, so "duration" is greater than the actual network time needed to
>>>>>>> download the resource. Since then I've been at Velocity and WebPerfDays
>>>>>>> where many people have shown their Resource Timing code. Everyone I spoke
>>>>>>> to (~5 different teams) assumed that "duration" was just the network time.
>>>>>>> When I explain that it also includes blocking they were surprised, admitted
>>>>>>> they hadn't known that, and agreed it is NOT the metric they were trying to
>>>>>>> capture.
>>>>>>>
>>>>>>> I propose we add a new property to Resource Timing that reflects the
>>>>>>> time to actually load the resource excluding blocking time. I'm flexible
>>>>>>> about the name but for purposes of this discussion let's call it
>>>>>>> "networkDuration". The important piece of this proposal is that
>>>>>>> "networkDuration" should be available for all resources, similar to
>>>>>>> "duration". In other words, it should be available for same origin as well
>>>>>>> as cross origin resources as part of the PerformanceEntry
>>>>>>> <http://www.w3.org/TR/performance-timeline/#performanceentry>
>>>>>>> interface.
>>>>>>>
>>>>>>> Same origin resources can calculate "networkDuration" as follows
>>>>>>> (assume "r" is a PerformanceResourceTiming object):
>>>>>>>
>>>>>>>     dns = r.domainLookupEnd - r.domainLookupStart;
>>>>>>>     tcp = r.connectEnd - r.connectStart;        // includes ssl negotiation
>>>>>>>     waiting = r.responseStart - r.requestStart; // aka "TTFB"
>>>>>>>     content = r.responseEnd - r.responseStart;
>>>>>>>     networkDuration = dns + tcp + waiting + content;
>>>>>>>
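The calculation quoted above can be wrapped up roughly as follows. This is a sketch, not a proposed definition: the function names are made up, and the "requestStart is 0" check is only a heuristic that assumes cross-origin entries without a Timing-Allow-Origin header report their detailed timestamps as zero.

```javascript
// Sum the network phases of an object shaped like a
// PerformanceResourceTiming entry (all values in milliseconds).
function networkDuration(r) {
  // Heuristic (assumption): a zero requestStart means the detailed
  // timestamps are withheld, so the sum cannot be computed.
  if (!r.requestStart) return null;
  const dns = r.domainLookupEnd - r.domainLookupStart;
  const tcp = r.connectEnd - r.connectStart;        // includes TLS negotiation
  const waiting = r.responseStart - r.requestStart; // aka "TTFB"
  const content = r.responseEnd - r.responseStart;
  return dns + tcp + waiting + content;
}

// Whatever part of "duration" the sum above does not account for
// (blocking, cache lookup, redirects, gaps between phases).
function unaccountedTime(r) {
  const net = networkDuration(r);
  return net === null ? null : r.duration - net;
}

// Hypothetical same-origin entry, for illustration only.
const sample = {
  startTime: 100, duration: 400,
  domainLookupStart: 250, domainLookupEnd: 270,
  connectStart: 270, connectEnd: 330,
  requestStart: 330, responseStart: 430, responseEnd: 500,
};

console.log(networkDuration(sample), unaccountedTime(sample));
```

In a browser the same functions could be applied to the entries returned by performance.getEntriesByType("resource"). For cross-origin entries without TAO the result is null, which is exactly the gap this proposal is about.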
>>>>>>> I've discussed this with a few people and the only concern I've
>>>>>>> heard is with regard to privacy along the lines of "if we exclude blocking
>>>>>>> we've added the ability to distinguish cache reads from network fetches".
>>>>>>> This isn't an issue for two reasons:
>>>>>>>
>>>>>>>    1. Even with the exclusion of blocking time, it's still possible
>>>>>>>    for "networkDuration" to have a non-zero value for resources read from
>>>>>>>    cache due to disk access time, etc. Therefore, excluding blocking time does
>>>>>>>    not necessarily provide a clear means of determining resources read from
>>>>>>>    cache.
>>>>>>>    2. This concern assumes that adding "networkDuration" lessens
>>>>>>>    privacy because removing blocking time provides additional information that
>>>>>>>    is not available today. However, it's possible to exclude blocking time
>>>>>>>    today by loading a cross-origin resource after window.onload, when there is
>>>>>>>    no blocking contention.
>>>>>>>
>>>>>>> Therefore, individuals who have JavaScript access to a page and can
>>>>>>> measure duration also have enough access to load resources after
>>>>>>> window.onload and can thus determine the duration excluding blocking time.
>>>>>>> Adding "networkDuration" does not give these individuals additional
>>>>>>> information beyond what is measurable today.
>>>>>>>
>>>>>>> What "networkDuration" provides is additional information for the
>>>>>>> normal case of resources that are loaded as part of the main page when
>>>>>>> blocking contention may occur. This will give current web developers the
>>>>>>> metric they want for cross-origin resources, and will provide it more
>>>>>>> simply for same origin resources.
>>>>>>>
>>>>>>
>>>>>>  Assuming that the privacy concerns are in fact non-existent, a big
>>>>>> +1.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
Received on Tuesday, 23 December 2014 20:51:25 UTC