Re: [JSPreflight] - First Draft of JavaScript Preflight Injection online from Ilya Grigorik on 2013-09-06 (public-web-perf@w3.org from September 2013)

From: Ilya Grigorik <igrigorik@google.com>
Date: Fri, 6 Sep 2013 15:35:46 -0700
To: "Reitbauer, Alois" <Alois.Reitbauer@compuware.com>
Cc: Chase Douglas <chase@newrelic.com>, "public-web-perf@w3.org" <public-web-perf@w3.org>
Message-ID: <CADXXVKo1HtF=YLFMdBG0qDAe=Hshp1vhY+D2Wzki-qMeLH+=Ew@mail.gmail.com>
>  There are two separate problems here. One is with META tags, the other
> one with BASE tags.
> META Tags:
> There are certain meta tags where JavaScript execution before the tag
> breaks the page. An example is <meta http-equiv="X-UA-Compatible"
> content="IE=7" />. If you execute JavaScript before, the page most likely
> breaks. The same is true for meta tags which define character sets.
>
>  BASE Tags:
> A JavaScript block before a BASE tag in many cases leads to a situation
> where the base tag gets applied incorrectly. The effect is simply that the
> wrong URLs are used to load resources.
>

So if I understand you correctly, you want to inject a script to
monkey-patch all other scripts? That would enable you to track RUM metrics?
This seems like a disaster waiting to happen.


>      This approach is also not applicable to the case where the page
> cannot be loaded.
>
>>
>>  Correct. I do agree that we need an error reporting mechanism, but (in
>> my opinion), this proposal couples error reporting with a much heavier and
>> unnecessary injection mechanism - we're adding another cookie mechanism, a
>> full blocking JS resource, and a bunch of other complications. Which is why
>> I'm suggesting a simpler, CSP-like implementation: on error, beacon some
>> predefined error report to provided URL.
>>
>>  [Alois] The CSP case is very different from performance monitoring. How
>> would this be able to cover single-page web apps?
>>
>
>  Once again, I think this proposal mixes two completely distinct use
> cases:
>
>  *#1 Error Logging*
> We want the client to beacon the NavTiming data if the navigation fails
> (e.g. DNS lookup succeeded, TCP handshake timed out). In this instance,
> simply beaconing the NavTiming object to some endpoint would be enough -
> caveat: if TCP failed, how do I even know where to beacon it? This is where
> we either get into some persistent settings (e.g. first you would have to
> have a successful navigation to get this instruction, and then persist it
> (ugh)), or we're into store-and-send-later type of semantics. In any case,
> this gets complicated quickly.. and I'm skeptical that new 'cookie-like'
> mechanisms are going to get us far -- this opens privacy holes, etc. There
> is just not much appetite for that nowadays. But.. this would be great to
> solve.
>
>  [Alois]
> I do not think we need persistence in case of a TCP failure. This might
> just be your bad Starbucks WiFi. Additionally, broader connectivity
> failures for your datacenter will be covered by synthetic availability
> monitoring. RUM will never replace this completely.
>

Why not? It certainly could, perhaps even should. If some local provider
has a misconfigured or compromised DNS that's breaking my site, it'd be
nice to know about it. I can't rely on a synthetic testing node in every
network out there. (Just playing devil's advocate)


> Why do you think the cookie mechanism is not helpful in client-side
> processing of RUM data?
> Which security holes do you mean specifically?
>

It's yet another vector to fingerprint the user; it's a huge performance
liability and a blatant SPOF, etc.

If such a mechanism existed today, I would immediately put up a "best
practice" to avoid it like the plague. As I said earlier, in the world
where we're trying to reduce RTTs and latency is the bottleneck, these
blocking scripts are an anti-pattern.

And I'm still of the mind that you should be able to achieve what you're
after without making them be a blocking resource. Just have the site
include your script, and do your stuff.


>  Also, if possible, I would also think about "incomplete" loads and not
> just errors - e.g. I click on a link, the new page is rendering, the user
> navigates back before onload is fired. In many cases, if you're following
> best practices and doing async load / deferring your analytics, you'll miss
> this pageview (bounce) and RUM sample. To "fight this" some large sites
> still put their analytics in the head (ugh) - this is a big problem. I
> don't have hard numbers, but my intuition tells me that this may actually
> be a larger problem than failed connections.
>
>  [Alois]
> Actually, web site owners want to understand incomplete loads of pages. If
> some ads for example did not load or the user cancelled after a certain
> time this is important information. If an analytics solution does not cover
> these cases it is incomplete.
>

I think we're saying the same thing here.


> Pushing the content in the head therefore is not a problem; it is necessary
> to get complete analytics data.
>

It is a problem. Analytics shouldn't block rendering. The fact that we
can't achieve this today is a bug - that's what we need to solve, and
without introducing more blocking behaviors.


> *#2 Instrumenting client-code (single page apps, etc). *
> This is completely orthogonal to error logging, and I think this should be
> split into a completely different discussion.
>
>  First, there is no reason why your script needs to block all others --
> in this day and age of fighting against blocking resources, this is also a
> deal breaker. For example, consider GA, which has to deal with this very
> problem: the client code may be loaded first and it may issue GA commands
> before GA is loaded. The solution is simple.. create a "_gaq" command queue
> object and push commands in there. Then, once GA loads, it inspects the
> object and does what it needs to do -- convention works.
>
>  [Alois]
>
>  This means that GA misses every event before the GA code is loaded. As
> you cannot guarantee when this is - because of network connectivity for
> example - you cannot make a clear statement which user interactions you can
> track and which not. We are constantly confronted with the claim of 100%
> coverage, this cannot be guaranteed by an approach as described above.
>

No, it does not. It's a simple convention:

var _gaq = _gaq || [];
_gaq.push(['_trackPageview']);

You don't need the script to be loaded to register the pageview event. The
GA script, once loaded, looks for _gaq and does its work. This works today
just fine:
https://developers.google.com/analytics/devguides/collection/gajs/

The only caveat is the case where ga.js is not loaded in-time, and that's
the case we're discussing above.
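
To make the convention concrete, here is a minimal sketch of how a command
queue like this can work end to end. It is not the actual ga.js code: the
installQueue function and the handlers map are hypothetical names used only
for illustration. The page pushes commands into a plain array, and whenever
the script eventually loads it drains the array in order and then replaces
push() so later commands run immediately.

```javascript
// Page code: runs before the analytics script has arrived.
var _gaq = _gaq || [];
_gaq.push(['_trackPageview']);
_gaq.push(['_trackEvent', 'nav', 'click']);

// Hypothetical loader logic: what an async script might do once it loads.
// `installQueue` and `handlers` are illustrative names, not the ga.js API.
function installQueue(queue, handlers) {
  var processed = [];

  function run(command) {
    var name = command[0];
    var args = command.slice(1);
    if (typeof handlers[name] === 'function') {
      handlers[name].apply(null, args);
      processed.push(name);
    }
  }

  // Drain everything that was queued before the script loaded, in order.
  while (queue.length > 0) {
    run(queue.shift());
  }

  // From now on, push() executes commands immediately instead of queuing.
  queue.push = function () {
    for (var i = 0; i < arguments.length; i++) {
      run(arguments[i]);
    }
    return processed.length;
  };

  return processed;
}

// Simulate the script loading late, then one more command arriving.
var log = [];
var processed = installQueue(_gaq, {
  _trackPageview: function () { log.push('pageview'); },
  _trackEvent: function (category, action) { log.push(category + ':' + action); }
});
_gaq.push(['_trackEvent', 'nav', 'late']);
```

Nothing pushed before the script loads is lost as long as the page stays
alive long enough for the script to arrive and drain the queue.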

>  Even the command queue approach requires some JavaScript to be executed
> early on as well as page modification. The inability to modify the page is
> one of the key use cases for the spec. You are not addressing this with
> your approach.
>

No.. All you need to do is check if _gaq already exists, and if not, just
initialize it to an empty array.


>  I agree that this is indeed helpful. We had conversations with some
> framework developers and added hooks. In general they are, however,
> hesitant as this increases the complexity of their frameworks. Events are
> not necessarily enough here, you have to wrap callback functions etc. to
> reconstruct the user behaviour from your monitoring data. An example is
> click-path analysis for single page apps as well as analysing functional
> errors in browsers.
>

Ok, maybe I'm missing the point, but why is this impossible to achieve
without a blocking resource?

ig
Received on Friday, 6 September 2013 22:36:54 UTC