RE: Web Request Status Codes from Aaron Heady (BING) on 2013-06-05 (public-web-perf@w3.org from June 2013)

From: Aaron Heady (BING) <aheady@microsoft.com>
Date: Wed, 5 Jun 2013 14:49:55 +0000
To: James Simonsen <simonjam@google.com>, "public-web-perf@w3.org" <public-web-perf@w3.org>
Message-ID: <9f9696ddae5e4b9faeb82ab124ec50cf@BLUPR03MB278.namprd03.prod.outlook.com>
Some thoughts about the cross-origin resource error logging and the privacy issues that we've been discussing. The general argument has been that we can't use the existing cross-origin authorization processes because, in the case of something like a DNS failure, you wouldn't be able to get the cross-origin authorization. But in the hacker/privacy case, where an attacker tries to harvest which bank you have access to, the attacker doesn't really learn much from a DNS failure. It's the successful case, such as a 200 or 401 response, that they would be interested in, to see whether you are currently logged into Bank of America.

Can we assert then that there are two different error classes that require two different levels of authorization to allow error logging to occur:

1. Infrastructure-related errors - Cases of complete failure, like DNS resolution failures, TCP connection failures or corrupted HTTP, basically where no valid HTTP status code is returned (maybe 500's in here). Such a failure is likely an infrastructure error and doesn't really reveal anything about the habits of the end user. In these cases, it's okay to skip the cross-origin check and allow the base domain to see the details of the error on the resource. If this does include 500's, then it would also honor any cross-origin headers that deny access. Thus if a site chooses to return a 500 intentionally, versus a real error that the server can't handle, it could deny access to error status information.

2. Other errors - When it's anything else that was actually a valid HTTP response from the origin, that origin has to include normal cross-origin authorization headers to allow detailed errors to be seen by the base domain. If it's a valid response and it doesn't have cross-origin allow headers, we just return "Foo Domain had a Generic Error" or don't allow it at all.

This would leverage the existing cross origin logic during valid responses, thus protecting privacy, and still allow the not-explicitly-authorized-but-low-privacy-risk infrastructure errors to be collected without inventing a new scheme.
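
The two classes above can be sketched as a small decision function. This is only an illustration of the proposal, not an agreed design: the names `Outcome`, `Disclosure` and `disclosure_for` are hypothetical, and whether 500's belong in the infrastructure class is left as a flag because the proposal itself leaves that open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disclosure(Enum):
    DETAILED = "detailed"  # base domain may see the actual error
    GENERIC = "generic"    # base domain sees only "had a generic error"


@dataclass
class Outcome:
    # None means no valid HTTP response arrived at all
    # (DNS failure, TCP failure, corrupted HTTP).
    http_status: Optional[int]
    # True/False when the response carried an explicit cross-origin
    # allow/deny signal, None when it carried neither (assumed model).
    cross_origin_allowed: Optional[bool]


def disclosure_for(outcome: Outcome,
                   treat_5xx_as_infrastructure: bool = True) -> Disclosure:
    # Class 1: complete failure, so there is nothing to authorize against.
    if outcome.http_status is None:
        return Disclosure.DETAILED
    # Optional part of class 1: 5xx responses, which still honor a deny.
    if treat_5xx_as_infrastructure and 500 <= outcome.http_status <= 599:
        if outcome.cross_origin_allowed is False:
            return Disclosure.GENERIC
        return Disclosure.DETAILED
    # Class 2: any other valid response needs an explicit allow.
    if outcome.cross_origin_allowed is True:
        return Disclosure.DETAILED
    return Disclosure.GENERIC
```

Under this sketch a DNS failure is reported in detail, while a 401 from a bank that sent no allow header is reported only as a generic error, which is the case the privacy concern is about.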

Aaron



From: Jatinder Mann [mailto:jmann@microsoft.com]
Sent: Tuesday, April 16, 2013 4:46 PM
To: James Simonsen; public-web-perf@w3.org
Subject: RE: Web Request Status Codes

Considering the level of detailed information we're considering exposing, I think it's quite reasonable that we should run this proposed interface by our security and privacy teams first. I will schedule a similar review with the IE Security team and get back to the working group.

Thanks,
Jatinder

From: James Simonsen [mailto:simonjam@google.com]
Sent: Thursday, April 11, 2013 2:01 PM
To: public-web-perf@w3.org
Subject: Re: Web Request Status Codes

Third party errors are absolutely off limits unless we receive explicit permission to report them. Without succeeding with the HTTP request, we don't have that permission. Otherwise, sites can figure out which bank a user uses by requesting third party resources from all of the banks and seeing which report errors.

Additionally, we are concerned that our users will be fingerprinted by malicious sites. Exposing additional information makes those attacks much easier.

I've requested review from the Chrome privacy and security teams. I don't think we should bother discussing Error Logging any further until everyone else does the same.

James

On Thu, Apr 11, 2013 at 11:44 AM, Austin, Daniel <daaustin@paypal-inc.com> wrote:
Hi James,

Thanks for the feedback. I appreciate your taking the time to look at this. However, I'm not yet convinced that there is any privacy/security concern here. My reasoning goes like this:


a)      There are a large number of companies doing this already, including Google (Analytics), Yahoo! (Roundtrip and Y! Analytics), Omniture (SiteCatalyst), Mediaplex (Analytics), Compuware/Gomez (RUM), and many others. These services regularly provide collection and transport for this same data and send it upstream, often to a 3rd party (which is worse IMHO). We're not exposing anything that others are not already doing; we're just institutionalizing it and giving the user some control. I can certainly see 304's, 200 (cache) responses, and proxies in that data. Presumably these companies' privacy policies already alert the user about all of this, and the user has provided consent by viewing the page. (This isn't an argument about right or wrong, but about current industry practice.)



b)      Users can see all of this data already, by pressing F12 or similar, so it's not concealed from the user and then exposed to others. The data isn't terribly useful to end users (unless they're performance geeks) but it's not secret.



On the cross-origin issue, I think there's something I'm not understanding. Why would cross-origin requests not be logged by the client? For this data to be useful we need to know what happened when the page loaded, regardless of the source. If I put an analytics tag in my page, for example, and it fails for some reason, I need to know about it, and omitting the error codes is the opposite of helping.



3rd party calls are very often the source of performance problems on the page, and the client, IMHO, should provide full information about everything that happens in all the HTTP request/response cycles that went into that page's composition. In today's world, nearly every page published by any commercial organization is likely to have some 3rd party content.



The more I think about this the more I think the right path is to provide detailed information for everything and be transparent about it all.



Regards,



D-



From: James Simonsen [mailto:simonjam@google.com]
Sent: Wednesday, April 10, 2013 2:58 PM
To: public-web-perf@w3.org
Subject: Re: Web Request Status Codes

Exposing HTTP status codes exposes a lot of information that hasn't been exposed before. For instance, there are codes that explicitly reveal the existence of a proxy and whether or not a resource is cached. We haven't exposed this sort of information before.
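
A few standard codes illustrate the kind of side information meant here. The grouping below is an illustration chosen for this note, not a list taken from the thread; the function name `side_information` is hypothetical, and the status constants come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus
from typing import List

# Illustrative grouping only; the email does not enumerate these codes.
REVEALS_CACHE = {HTTPStatus.NOT_MODIFIED}  # 304
REVEALS_INTERMEDIARY = {
    HTTPStatus.USE_PROXY,                      # 305
    HTTPStatus.PROXY_AUTHENTICATION_REQUIRED,  # 407
    HTTPStatus.BAD_GATEWAY,                    # 502
    HTTPStatus.GATEWAY_TIMEOUT,                # 504
}


def side_information(status: int) -> List[str]:
    """Return what a page could infer from seeing this status code."""
    hints = []
    if status in REVEALS_CACHE:
        hints.append("a cached copy of the resource exists")
    if status in REVEALS_INTERMEDIARY:
        hints.append("a proxy or gateway is on the path")
    return hints
```

For example, `side_information(304)` reports that a cached copy exists, while `side_information(200)` returns an empty list.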

Before getting too far ahead of ourselves, I think we need to have a thorough security and privacy review about whether it's safe to expose this level of information. Otherwise, we're just wasting time discussing this.

Separately, note that the DNS and TCP (and possibly many HTTP) errors are useless for cross-origin requests, because there's no way to determine if logging is allowed.

James

On Mon, Apr 8, 2013 at 3:48 PM, Austin, Daniel <daaustin@paypal-inc.com> wrote:
Hi Team,
I've attached to this email an HTML file with the current list of Web Request Status Codes. This list includes all of the status codes that I've been able to track down, with some exceptions. There are a great many of them. Here's a breakdown of the process and the decisions I made to produce the current list:

*         Some status codes were omitted for being ridiculous (418, 420)

*         Some status codes returned by existing servers but not part of any RFC are still listed in red - I don't think they belong here (possibly with the exception of 509) but I've left them in for discussion purposes.

*         Non-HTTP status codes have been added. There are a lot of them (around 40). Since RFC 2616 clearly specifies that HTTP status codes have 3 digits, I've begun the numbering for non-HTTP status codes at 1000. These status codes are broken down by their level in the OSI stack and namespaced accordingly, e.g. 1207 SSL: Cipher Error as opposed to 1109 TCP: No route to host. There are five groups of these, namespaced as DNS:, TCP:, SSL:, HTTP:, and Client:. The HTTP: status codes are not currently included in RFC 2616 or any of the other specs, but are common errors seen by clients, e.g. 1302 HTTP: Header malformed. Perhaps 'HTTP server:' is better?

*         I've included a key to the different RFCs that contain HTTP status codes. There are 13 (!) of them, and 2 status codes are in draft proposals, linked in the document.

*         For any status code not included in RFC 2616, I've tried to provide a rationale for its existence.

*         Color codes: black = RFC 2616, blue = new for this spec or repurposed from some proprietary list, red = proprietary and doesn't belong here IMHO

*         Sources: RFC 2616, other RFCs and drafts as listed, Wikipedia, Stack Overflow, MSFT sites, Compuware/Gomez, KeyNote, Catchpoint, Nginx, Apache

*         For completeness, I've included all status codes received by the client, not just the error codes. There are several that are not in RFC 2616.

*         I took the liberty of repurposing some existing-but-nonstandard codes and renumbering them for our purposes. I've tried to indicate the source e.g. (Nginx)
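
The numbering and namespacing of the non-HTTP codes described in the list above could be sketched as follows. The attached table is not reproduced in this message, so the block boundaries are assumptions: the three examples in the email (1109 TCP, 1207 SSL, 1302 HTTP) suggest blocks of 100 per namespace, while the DNS and Client blocks are guesses. The names `NAMESPACE_BLOCKS`, `namespace_of` and `label` are hypothetical.

```python
from typing import Optional

# (first, last, namespace). Only the TCP, SSL and HTTP blocks are implied by
# the examples in the email; the DNS and Client blocks are assumptions.
NAMESPACE_BLOCKS = [
    (1000, 1099, "DNS"),     # assumed
    (1100, 1199, "TCP"),     # consistent with 1109 TCP: No route to host
    (1200, 1299, "SSL"),     # consistent with 1207 SSL: Cipher Error
    (1300, 1399, "HTTP"),    # consistent with 1302 HTTP: Header malformed
    (1400, 1499, "Client"),  # assumed
]


def namespace_of(code: int) -> Optional[str]:
    """Return the namespace for a non-HTTP code, or None for 3-digit codes."""
    for first, last, name in NAMESPACE_BLOCKS:
        if first <= code <= last:
            return name
    return None


def label(code: int, reason: str) -> str:
    """Format a code the way the email writes it, e.g. '1207 SSL: Cipher Error'."""
    name = namespace_of(code)
    return f"{code} {name}: {reason}" if name else f"{code} {reason}"
```

Three-digit codes fall outside every block, so ordinary HTTP status codes keep their plain form, e.g. `label(404, "Not Found")` gives `404 Not Found`.
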

Here are the next steps as I see them:

*         Agree on a more-or-less final list of status codes, correct any omissions or duplicates

*         Move this table into Jatinder's spec (or maybe a separate Note?)

This task took considerably more time and effort than I had expected. Who knew there were so many status codes?
Regards,
D-
Received on Wednesday, 5 June 2013 14:52:09 UTC