
RE: Web Request Status Codes

From: Aaron Heady (BING) <aheady@microsoft.com>
Date: Fri, 12 Apr 2013 16:23:15 +0000
To: Mark Nottingham <mnot@mnot.net>, James Simonsen <simonjam@google.com>
CC: "public-web-perf@w3.org" <public-web-perf@w3.org>
Message-ID: <a957174029e846b08dd876c65bd9a5af@BLUPR03MB067.namprd03.prod.outlook.com>
>> Third party errors are absolutely off limits unless we receive explicit permission to report them. Without succeeding with the HTTP request, we don't have that permission. Otherwise, sites can figure out which bank a user uses by requesting third party resources from all of the banks and seeing which report errors.

> Agreed 100%.

Given that we cannot get an allow header on a completely failed request, and that seeing cross-origin resource failures is of immense value to legitimate service operators, could the concept of the CORS allow headers be extended to include something like a cache TTL? We expect this to work in practice because active users who see an error will typically visit us again within a short period, which lets us retrieve the errors. If the cross-origin allow header were honored beyond just the individual response, then the user agent would already know that it's okay to expose the error codes for contoso.com resources rendered on example.com. Then, when an actual error occurs rendering contoso.com/ad1.js, the cached allow header can be checked, allowing example.com to see the details of the contoso.com resource error.

Basically, keep all of the semantics of cross-origin checks and just add an optional expires value to the allow header. Or, for complete backwards compatibility, a separate max-age header like:

Access-Control-Allow-Origin: http://hello-world.example
Access-Control-Max-Age: 3628800

From: http://www.w3.org/TR/cors/#introduction
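A rough sketch of the idea, with entirely hypothetical names (no user agent implements anything like this today): the UA remembers an allow decision from a successful response and its max-age, then consults that cached grant when a later request to the same origin fails before any headers arrive.

```python
import time

# Hypothetical UA-side cache of CORS allow decisions, keyed by
# (resource_origin, page_origin). Populated from successful responses that
# carried Access-Control-Allow-Origin plus a max-age style lifetime.
# Nothing like this exists in the CORS spec; it is only an illustration.
_allow_cache = {}

def remember_allow(resource_origin, page_origin, max_age_seconds):
    """Record that resource_origin permitted page_origin to see error details."""
    _allow_cache[(resource_origin, page_origin)] = time.time() + max_age_seconds

def may_report_error(resource_origin, page_origin):
    """On a failed request (no headers received), consult the cached decision."""
    expires = _allow_cache.get((resource_origin, page_origin))
    return expires is not None and time.time() < expires

# Example: contoso.com previously answered example.com with
#   Access-Control-Allow-Origin: http://example.com
#   Access-Control-Max-Age: 3628800
remember_allow("contoso.com", "example.com", 3628800)

# Later, contoso.com/ad1.js fails at the TCP level; no allow header is
# available, but the cached grant lets example.com's logging see the failure.
assert may_report_error("contoso.com", "example.com")

# An origin that never granted permission (e.g. a bank) stays opaque.
assert not may_report_error("bank.example", "example.com")
```

The key property is the last line: origins that never opted in still report nothing, preserving the protection James describes above.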

Aaron



-----Original Message-----
From: Mark Nottingham [mailto:mnot@mnot.net] 
Sent: Thursday, April 11, 2013 4:47 PM
To: James Simonsen
Cc: public-web-perf@w3.org
Subject: Re: Web Request Status Codes


On 12/04/2013, at 7:00 AM, James Simonsen <simonjam@google.com> wrote:

> Third party errors are absolutely off limits unless we receive explicit permission to report them. Without succeeding with the HTTP request, we don't have that permission. Otherwise, sites can figure out which bank a user uses by requesting third party resources from all of the banks and seeing which report errors.

Agreed 100%.



> Additionally, we are concerned that our users will be fingerprinted by malicious sites. Exposing additional information makes those attacks much easier.

That's very reasonable as a position statement. However, it doesn't help decide whether to add new features, because taken at face value, we wouldn't add *any* features to the Web platform ("information" is unavoidably broad, after all).

In this instance, can you explain how adding response status codes adds potential bits to a fingerprint (assuming same-origin constraints)? I agree there could be some convoluted scheme whereby a cached, non-standard status code could add a bit or two, but that doesn't seem relevant when you consider that a cached ETag is a much simpler, already-available, and more capable way to do so.
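For comparison, a cached ETag already gives a server a high-capacity per-user identifier. A hypothetical server-side sketch (illustrative names only) of that well-known technique:

```python
import uuid

# Sketch of ETag-based tracking: hand each first-time visitor a unique ETag,
# then read it back from If-None-Match on revalidation. This is the
# "already available and more capable" channel referred to above.
def handle_request(if_none_match=None):
    if if_none_match:
        # Returning visitor: the ETag itself identifies them.
        return 304, if_none_match
    # First visit: mint a unique identifier and serve it as the ETag.
    return 200, '"%s"' % uuid.uuid4().hex

status, etag = handle_request()                       # first visit: 200
status2, etag2 = handle_request(if_none_match=etag)   # revalidation: 304
assert status2 == 304 and etag2 == etag  # the user carries the ID back
```

A whole opaque token per user dwarfs the bit or two a cached nonstandard status code might leak.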


> I've requested review from the Chrome privacy and security teams. I don't think we should bother discussing Error Logging any further until everyone else does the same.

Great. When will their review be complete? Will the results be shared with the WG?  (just trying to understand the process you're proposing)

WRT the specifics of the proposal - why are we enumerating the status codes? Why not just refer to the registry <http://www.iana.org/assignments/http-status-codes/http-status-codes.xml> and report the three-digit code?
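Deferring to the registry could be as simple as reporting the raw three-digit code and letting consumers derive its class, as in this sketch (the class names follow RFC 2616's groupings):

```python
# Sketch: derive the status class from any registered three-digit code,
# so the spec need not enumerate individual codes at all.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code):
    """Map a three-digit HTTP status code to its RFC 2616 class."""
    if not 100 <= code <= 599:
        raise ValueError("HTTP status codes are three digits")
    return CLASSES[code // 100]

assert status_class(304) == "Redirection"
assert status_class(509) == "Server Error"  # even unregistered codes classify
```

New or proprietary codes then need no spec change: any 5xx still reports as a server error.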

Cheers,


> 
> James
> 
> 
> On Thu, Apr 11, 2013 at 11:44 AM, Austin,Daniel <daaustin@paypal-inc.com> wrote:
> Hi James,
> 
>  
> 
> Thanks for the feedback. I appreciate your taking the time to look at this. However, I'm not yet convinced that there is any privacy/security concern here. My reasoning goes like this:
> 
>  
> 
> a)      There are a large number of companies doing this already, including Google (Analytics), Yahoo! (Roundtrip and Y! Analytics), Omniture (SiteCatalyst), Mediaplex (Analytics), Compuware/Gomez (RUM), and many others. These services regularly collect and transport this same data and send it upstream, often to a 3rd party (which is worse IMHO). We're not exposing anything that others are not already doing; we're just institutionalizing it and giving the user some control. I can certainly see 304s, 200 (cache) responses, and proxies in that data. Presumably these companies' privacy policies already alert the user to all of this, and the user has provided consent by viewing the page. (This isn't an argument about right or wrong, but about current industry practice.)
> 
>  
> 
> b)      Users can see all of this data already, by pressing F12 or similar, so it's not concealed from the user and then exposed to others. The data isn't terribly useful to end users (unless they're performance geeks), but it's not secret.
> 
>  
> 
> On the cross-origin issue, I think there's something I'm not understanding. Why would cross-origin requests not be logged by the client? For this data to be useful, we need to know what happened when the page loaded, regardless of the source. If I put an analytics tag in my page, for example, and it fails for some reason, I need to know about it; omitting the error codes is the opposite of helping.
> 
>  
> 
> 3rd party calls are very often the source of performance problems on the page, and the client, IMHO, should provide full information about everything that happens in all the HTTP request/response cycles that went into that page's composition. In today's world, nearly every page published by any commercial organization is likely to have some 3rd party content.
> 
>  
> 
> The more I think about this the more I think the right path is to provide detailed information for everything and be transparent about it all.
> 
>  
> 
> Regards,
> 
>  
> 
> D-
> 
>  
> 
>  
> 
>  
> 
> From: James Simonsen [mailto:simonjam@google.com] 
> Sent: Wednesday, April 10, 2013 2:58 PM
> To: public-web-perf@w3.org
> Subject: Re: Web Request Status Codes
> 
>  
> 
> Exposing HTTP status codes exposes a lot of information that hasn't been exposed before. For instance, there are codes that explicitly reveal the existence of a proxy and whether or not a resource is cached. We haven't exposed this sort of information before.
> 
>  
> 
> Before getting too far ahead of ourselves, I think we need to have a thorough security and privacy review about whether it's safe to expose this level of information. Otherwise, we're just wasting time discussing this.
> 
>  
> 
> Separately, note that the DNS and TCP (and possibly many HTTP) errors are useless for cross-origin requests, because there's no way to determine if logging is allowed.
> 
>  
> 
> James
> 
>  
> 
> On Mon, Apr 8, 2013 at 3:48 PM, Austin,Daniel <daaustin@paypal-inc.com> wrote:
> 
> Hi Team,
> 
> I've attached to this email an HTML file with the current list of Web Request Status Codes. This list includes all of the status codes that I've been able to track down, with some exceptions. There are a great many of them. Here's a breakdown of the process and the decisions I made to produce the current list:
> 
> *         Some status codes were omitted for being ridiculous (418, 420)
> 
> *         Some status codes returned by existing servers but not part of any RFC are still listed in red - I don't think they belong here (possibly with the exception of 509) but I've left them in for discussion purposes.
> 
> *         Non-HTTP status codes have been added. There are a lot of them (around 40). Since RFC 2616 clearly specifies that HTTP status codes have 3 digits, I've begun the numbering for non-HTTP status codes at 1000. These status codes are broken down by their level in the OSI stack and namespaced accordingly, e.g. 1207 SSL: Cipher Error as opposed to 1109 TCP: No route to host. There are five groups of these, namespaced as DNS:, TCP:, SSL:, HTTP:, and Client:. The HTTP: status codes are not currently included in RFC 2616 or any of the other specs, but are common errors seen by clients, e.g. 1302 HTTP: Header malformed. Perhaps 'HTTP server:' is better?
> 
> *         I've included a key to the different RFCs that contain HTTP status codes. There are 13 (!) of them, and 2 status codes are in draft proposals, linked in the document.
> 
> *         For any status code not included in RFC 2616, I've tried to provide a rationale for its existence.
> 
> *         Color codes: black = RFC 2616, blue = new for this spec or repurposed from some proprietary list, red = proprietary and doesn't belong here IMHO
> 
> *         Sources: RFC 2616, other RFCs and drafts as listed, Wikipedia, Stack Overflow, MSFT sites, Compuware/Gomez, KeyNote, Catchpoint, Nginx, Apache
> 
> *         For completeness, I've included all status codes received by the client, not just the error codes. There are several that are not in RFC 2616.
> 
> *         I took the liberty of repurposing some existing-but-nonstandard codes and renumbering them for our purposes. I've tried to indicate the source e.g. (Nginx)
> 
> Here are the next steps as I see them:
> 
> *         Agree on a more-or-less final list of status codes, correct any omissions or duplicates
> 
> *         Move this table into Jatinder's spec (or maybe a separate Note?)
> 
> This task took considerably more time and effort than I had expected. Who knew there were so many status codes?
> 
> Regards,
> 
> D-
> 
>  
> 
>  
> 
> 

--
Mark Nottingham   http://www.mnot.net/
Received on Friday, 12 April 2013 16:26:11 UTC
