Re: Proposing changes to Navigation Error Logging from Arvind Jain on 2014-08-17 (public-web-perf@w3.org from August 2014)

From: Arvind Jain <arvind@google.com>
Date: Sun, 17 Aug 2014 14:13:35 -0700
To: Ilya Grigorik <igrigorik@google.com>
Cc: ttuttle <ttuttle@chromium.org>, "Aaron Heady (BING AVAILABILITY)" <aheady@microsoft.com>, "public-web-perf@w3.org" <public-web-perf@w3.org>
Message-ID: <CAOYaDdPCpXtgyRH0Ht_3bpYTU=zf1f_+U1kpstS78UChabTDSA@mail.gmail.com>

This seems reasonable (re #3). Seems like folks have not objected to
it. Let me update the draft with it. I'll reply on this thread with
the new revision.

On Mon, Jul 28, 2014 at 11:19 AM, Ilya Grigorik <igrigorik@google.com> wrote:
>
> On Fri, Jul 25, 2014 at 11:21 AM, ttuttle <ttuttle@chromium.org> wrote:
>>>
>>> 3. I'd like to allow the user-agent to retry the uploads if they fail. If
>>> the issue is a transient network issue (e.g. a route is flapping), it's a
>>> waste to throw out the error report just because the network was still
>>> glitched the first time the upload was attempted.
>>>
>>>
>>>
>>> Aaron: This reads like a denial of service attack. We did discuss it
>>> originally, but how do you control the retries when an origin has a short
>>> lived but widespread spike in errors, especially when the origin for the
>>> error is also the origin/logging endpoint for these navigation error calls.
>>> A few seconds after it recovers it gets hit with a global surge in telemetry
>>> requests, knocking it offline, more errors... Also goes back to #1, any
>>> error that is stable enough to repro is going to be reported by a large
>>> number of users. I expect this system to be lossy telemetry wise. Optimized
>>> to protect the origin, not the error telemetry. And if you wait for the next
>>> successful page load, then you can get the errors from the queue.
>>
>>
>> Hmm, I see your point. I'll see if we can do without retries, or postpone
>> them until the next time we would've made a new upload anyway.
>
>
> Mirroring the language in Beacon, can we defer this decision to the UA?
> "The User Agent SHOULD make a best effort attempt to eventually transmit the
> data."
>
> It seems reasonable to allow the UA to both delay and attempt to retransmit.
>
> ig
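
For illustration only, the "lossy queue, no immediate retry, piggyback on the
next upload" behaviour discussed above could be sketched roughly as follows.
This is not text from the draft or from any user agent: the names ReportQueue,
record, on_upload_opportunity, max_reports and max_attempts, and the limits
chosen, are all hypothetical. The sketch assumes the UA calls
on_upload_opportunity() only when it would have made an upload anyway (for
example after the next successful page load), so a failed upload never
triggers an extra request against a recovering origin.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class ReportQueue:
    """Bounded, lossy queue of error reports (hypothetical sketch)."""

    def __init__(self, send: Callable[[List[Dict]], bool],
                 max_reports: int = 50, max_attempts: int = 3) -> None:
        # send() is an assumed transport hook: it returns True on success.
        self._send = send
        self._max_attempts = max_attempts
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which keeps the telemetry lossy and the memory use bounded.
        self._pending: Deque[Dict] = deque(maxlen=max_reports)

    def record(self, report: Dict) -> None:
        """Queue a report; nothing is sent at this point."""
        self._pending.append({"report": report, "attempts": 0})

    def on_upload_opportunity(self) -> bool:
        """Try one upload of everything pending; never retry immediately."""
        if not self._pending:
            return True
        batch = list(self._pending)
        if self._send([entry["report"] for entry in batch]):
            self._pending.clear()
            return True
        # Upload failed: count the attempt and drop reports that have
        # used up their attempts, favouring the origin over the telemetry.
        for entry in batch:
            entry["attempts"] += 1
        survivors = [e for e in batch if e["attempts"] < self._max_attempts]
        self._pending.clear()
        self._pending.extend(survivors)
        return False

    def __len__(self) -> int:
        return len(self._pending)
```

The design choice here is that retries are bounded both in number and in
timing: a report is attempted at most max_attempts times, and only at moments
the UA picks, which is one way to read "best effort" without creating a
synchronized surge after an outage.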

Received on Sunday, 17 August 2014 21:14:04 UTC