handling multi-request fetches in Nav & Resource Timing from Ilya Grigorik on 2015-02-18 (public-web-perf@w3.org from February 2015)

From: Ilya Grigorik <igrigorik@google.com>
Date: Wed, 18 Feb 2015 11:45:20 -0800
To: public-web-perf <public-web-perf@w3.org>
Message-ID: <CADXXVKo-MhDOx3Fn-X0hOOiDCqxv8Z7FAZ=0WgZaBXpqYc9Z5g@mail.gmail.com>
Problem: some fetches incur multiple requests before they are fulfilled -
e.g. HTTP redirects, permission preflights, resumed uploads/downloads, etc.
Today, both NT and RT only show the last request in the sequence and often
hide a lot of critical timing data even for that last request - see [1],
[2]. This is a big spec gap and one that we need to address: having access
to accurate request timing data for *all requests in the sequence* is
critical to optimizing performance.

There are many different ways we can go about addressing this, and I'd like
to propose some overarching principles to help narrow down the space:

1) Performance Timeline should record *every* request as a separate entity,
regardless of whether it was initiated by the application, or done
automatically by the browser (e.g. redirects, preflights, etc).
2) Timing data for each request entity should be subject to regular TAO
restrictions - i.e. you can always observe "duration", and access to more
detailed timing data is subject to that origin emitting the appropriate TAO
opt-in policy.
3) In the case where multiple requests were incurred to satisfy the fetch,
the application should have a way to "detect and follow" the chain of
requests.

--- Examples ---

a) A navigation to example.com triggers a redirect chain:

performance.getEntriesByType("navigation") -> [
   PerformanceTiming{name: "http://example.com", <timing data>},
   PerformanceTiming{name: "https://example.com", <timing data>},
   PerformanceTiming{name: "https://example.com/en-us", <timing data>}
]

Each PerformanceEntry object contains full timing data, but access to
detailed timing data is subject to TAO. For example, in the above chain the
final destination (https://example.com/en-us) can observe that there have
been three requests (recorded in the order in which they occurred), and it
can access full timing data for the last two because they are on the same
origin. To get access to the detailed timing data (i.e. anything beyond
duration) for the first one, the {http://example.com} origin would need to
emit a TAO opt-in header allowing {https://example.com} to gather it.
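
A rough sketch of how the final page might split such a chain into entries
with detailed timing and entries that only expose duration. It assumes, as
in today's Resource Timing, that attributes withheld by TAO read as 0. The
mock entries, their values, and the hasDetailedTiming helper are
hypothetical stand-ins for illustration, not part of any spec:

```javascript
// Hypothetical helper: treat an entry as "detailed" if any TAO-gated
// attribute is non-zero (assumption: withheld attributes read as 0).
function hasDetailedTiming(entry) {
  var gated = ["domainLookupStart", "connectStart", "requestStart", "responseStart"];
  return gated.some(function (attr) { return entry[attr] > 0; });
}

// Mock entries standing in for the proposed navigation chain.
var chain = [
  { name: "http://example.com", duration: 120,
    domainLookupStart: 0, connectStart: 0, requestStart: 0, responseStart: 0 },
  { name: "https://example.com", duration: 80,
    domainLookupStart: 125, connectStart: 130, requestStart: 150, responseStart: 190 },
  { name: "https://example.com/en-us", duration: 200,
    domainLookupStart: 205, connectStart: 205, requestStart: 210, responseStart: 380 }
];

// Entries whose origin allowed detailed timing vs. duration-only entries.
var detailed = chain.filter(hasDetailedTiming)
  .map(function (e) { return e.name; });
var durationOnly = chain.filter(function (e) { return !hasDetailedTiming(e); })
  .map(function (e) { return e.name; });
```

Here only the first hop (http://example.com) ends up in durationOnly, which
matches the case where that origin has not emitted a TAO header.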

b) A subresource fetch triggers a redirect:

performance.getEntriesByType("resource") -> [
   ... <other timing entries > ...,
   PerformanceResourceTiming{name: "http://example.com/thing.js", <timing
data>},
   PerformanceResourceTiming{name: "https://example.com/thing.js", <timing
data>},
   ... <other timing entries > ...
]

Similar to above, each request is recorded separately and is subject to the
same TAO restrictions. However, unlike the navigation case,
getEntriesByType("resource") returns an array of *all* subresource
requests. Hence, we need a way to tell the application that a request for
"http://example.com/thing.js" triggered another request due to a redirect.
A simple way to do this would be to add a pointer to a subsequent request -
e.g. "nextName" attribute, or some such. To follow the chain after making
the request the application would:

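var reqs = [];
// getEntriesByName returns an array; take the first matching entry
var r = performance.getEntriesByName("http://example.com/thing.js")[0];
while (r) {
  reqs.push(r);
  // "nextName" is the proposed attribute; an empty value ends the chain
  r = r.nextName ? performance.getEntriesByName(r.nextName)[0] : null;
}
processTiming(reqs);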
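
The same walk can be written as a self-contained function that runs outside
a browser. This is only a sketch: "nextName" is the attribute proposed
above and does not exist today, and the timeline object, lookup helper,
followChain name and duration values are made up for illustration. The
visited-set guard is an extra assumption to avoid spinning on a redirect
loop:

```javascript
// Hypothetical timeline keyed by URL. "nextName" is the proposed
// attribute, not an existing PerformanceResourceTiming field.
var timeline = {
  "http://example.com/thing.js":
    { name: "http://example.com/thing.js", duration: 40,
      nextName: "https://example.com/thing.js" },
  "https://example.com/thing.js":
    { name: "https://example.com/thing.js", duration: 90, nextName: "" }
};

// Stand-in for performance.getEntriesByName(name)[0].
function lookup(name) {
  return timeline[name];
}

// Walk the chain from the first URL. Stops at an empty nextName, at a
// missing entry, or at a URL that was already visited (loop guard).
function followChain(firstName, lookupFn) {
  var reqs = [];
  var seen = new Set();
  var name = firstName;
  while (name && !seen.has(name)) {
    seen.add(name);
    var entry = lookupFn(name);
    if (!entry) break;
    reqs.push(entry);
    name = entry.nextName;
  }
  return reqs;
}

var chainReqs = followChain("http://example.com/thing.js", lookup);
```

Passing the lookup function in keeps the walk independent of how the
timeline is queried, so a real page could substitute a wrapper around
performance.getEntriesByName.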

c) A preflight request is triggered for a cross-origin request:

performance.getEntriesByType("resource") -> [
   PerformanceResourceTiming{name: "https://other.com/thing.js", <timing
attrs>},
   PerformanceResourceTiming{name: "https://other.com/thing.js", <timing
attrs>},
   ... <other entries > ...
]

Similar to the subresource case above, but the request URL remains the
same across the multiple requests required to fulfill this fetch. The
application would:

// returns an array with two PerformanceResourceTiming entries
reqs = performance.getEntriesByName("https://other.com/thing.js")
processTiming(reqs)

(note: there is an implicit assumption here that requests are recorded in
the timeline in the same order as they occur)
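
If that ordering assumption turned out not to hold, an application could
order the entries itself by startTime, which every PerformanceEntry already
exposes. A minimal sketch, with a hypothetical inRequestOrder helper and
mock entries whose values are invented:

```javascript
// Return a copy of the entries ordered by startTime, so callers need not
// rely on the order in which the timeline recorded them.
function inRequestOrder(entries) {
  return entries.slice().sort(function (a, b) {
    return a.startTime - b.startTime;
  });
}

// Mock entries: a preflight and the actual request for the same URL,
// deliberately listed out of order.
var preflightEntries = [
  { name: "https://other.com/thing.js", startTime: 310.5, duration: 95 },
  { name: "https://other.com/thing.js", startTime: 250.0, duration: 60 }
];

var ordered = inRequestOrder(preflightEntries);
```

Under this sketch ordered[0] would be the preflight and ordered[1] the
actual request, and the input array is left untouched.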

d) Auto-resumed download/upload: effectively the same as the cross-origin
preflight case. If the browser triggers multiple requests to transfer the
data, it
would simply record the timing data for each request as a standalone entry.
This allows the application to identify how many requests were made and how
long each took.
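
As an illustration of that last point, an application could reduce the
per-request entries to a count and a list of durations. The
summarizeRequests helper, the URL and the numbers below are hypothetical.
Note that the summed duration ignores any idle gap between the requests, so
it is not the same as wall-clock transfer time:

```javascript
// Summarize a list of timing entries: how many requests were made and
// how long each one took, plus the sum of the individual durations.
function summarizeRequests(entries) {
  var durations = entries.map(function (e) { return e.duration; });
  var total = durations.reduce(function (sum, d) { return sum + d; }, 0);
  return { count: durations.length, durations: durations, total: total };
}

// Mock entries for a download that the browser resumed twice.
var parts = [
  { name: "https://example.com/big.bin", duration: 1200 },
  { name: "https://example.com/big.bin", duration: 800 },
  { name: "https://example.com/big.bin", duration: 450 }
];

var summary = summarizeRequests(parts);
```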

--- ~Delta from current specs ---

* Record every request into timeline
* Add "nextName" attribute (or some such) to follow cases where URL changes
between requests
* Drop redirect{Start,End} from NavTiming and ResourceTiming?
* Drop the current "hide data on cross-origin redirect" logic in NT/RT

There are other nitpicks we'd have to address, but I think the above covers
the main changes we'd need to make. Removing redirect{Start,End} is not
ideal from a compatibility perspective, but they would be unnecessary under
this new model. Alternatively, we can keep them around and make them return
0.

Thoughts, other ideas? Anything I'm not accounting for in terms of use
cases or API surface?

[1] https://lists.w3.org/Archives/Public/public-web-perf/2015Jan/0006.html
[2] https://lists.w3.org/Archives/Public/public-web-perf/2015Feb/0054.html
Received on Wednesday, 18 February 2015 19:46:29 UTC