RE: draft-montenegro-httpbis-uri-encoding from Gabriel Montenegro on 2014-03-22 (ietf-http-wg@w3.org from January to March 2014)

From: Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>
Date: Sat, 22 Mar 2014 05:40:50 +0000
To: Mark Nottingham <mnot@mnot.net>, Zhong Yu <zhong.j.yu@gmail.com>, "Dave Thaler" <dthaler@microsoft.com>, Osama Mazahir <OSAMAM@microsoft.com>, "Matthew Cox" <macox@microsoft.com>
CC: "Julian F. Reschke" <julian.reschke@gmx.de>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <6d7fc6a77edd438ca1f79bd2448fae0e@BLUPR03MB066.namprd03.prod.outlook.com>

> From: Mark Nottingham [mailto:mnot@mnot.net]
> On 22 Mar 2014, at 3:00 am, Zhong Yu <zhong.j.yu@gmail.com> wrote:
> >
> > Yes. The issue addressed by Gabriel's draft seems to be in the scope 
> > of HTML, not HTTP.
> >
> > And even if some kind of mechanism of declaring query encoding 
> > becomes official, any intermediary relying on it will have a bad time.

Any intermediary that relies on it is broken. This is a header that is not at all guaranteed to be present. 

> Reading this thread, I was starting to think the same thing.
> 
> In particular, this seems like something that needs to be coupled to 
> *where* the link originates; e.g., a browser's behaviour for a link 
> from an address bar is likely to be different than that from an 'a' 
> tag, and even again different from a JavaScript-generated link.

I'm a bit confused by this. Determining whether to set the headers, and what to set them to (e.g., UTF-8) is certainly browser behavior. But once that decision has been taken, communicating that hint to the server is what the headers are for.

> Gabriel, have you brought it up over at the W3C or in the WHATWG?

No, I haven't.

BTW, here are some responses to the very long thread.

To respond to Nicolas Mailhot <nicolas.mailhot@gmail.com> about fixing it for HTTP/2 (http://lists.w3.org/Archives/Public/ietf-http-wg/2014JanMar/1110.html): this is no longer an HTTP/2 issue. That issue was closed in the issue tracker; the decision at the Zurich interim was instead to address this as a generic HTTP issue (for both HTTP/1.x and HTTP/2), hence the generic header proposal.

There is some concern from Amos and others about non-UTF-8 values of the header and how to handle them. The current language in the draft says to treat that as the legacy case:

	"...invalid value or unrecognized charset, this is equivalent
	   to the legacy situation of non-determinism..."

So any intermediary or server that wishes to accept only UTF-8 can treat anything else as "unrecognized", which amounts to ignoring the header altogether. We could tighten the spec to allow only UTF-8, but given the above clause that appears to be an unnecessary constraint.
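
As a rough sketch of that rule on the receiving side (illustrative only: the function name is hypothetical, and rejecting a request whose bytes contradict a UTF-8 hint is my assumption about what "tightened parsing" would mean, not text from the draft):

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decode_query_with_hint(raw_query: str, hint: Optional[str]) -> Optional[str]:
    """Decode a percent-encoded query string using an encoding hint.

    Returns the decoded text when the hint is UTF-8 and the bytes are valid.
    Returns None when the caller should fall back to legacy handling.
    Raises ValueError when the hint says UTF-8 but the bytes are not valid
    UTF-8 (assumption: a strict receiver rejects such a request).
    """
    # Header absent: it is never guaranteed to be present, so this is
    # simply the legacy, non-deterministic case.
    if hint is None:
        return None

    # A UTF-8-only receiver treats every other value as "unrecognized",
    # which is the same as ignoring the header.
    if hint.strip().lower() != "utf-8":
        return None

    # Undo percent-encoding to get the raw octets, then decode strictly.
    octets = unquote_to_bytes(raw_query)
    return octets.decode("utf-8", errors="strict")  # UnicodeDecodeError is a ValueError
```

A receiver that gets None back simply does whatever it does today; only the UTF-8 branch changes behavior.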

From Julian:

	Practically, how is a UA supposed to *know* the encoding that was used for the URI *unless* it constructed it itself? (Which is not what browsers do; they only construct the query part).

If you don't know for sure, then don't use the header. But if you do know for sure, it's useful to indicate that fact with the headers so that the receiving side can tighten its parsing. Notice that a malicious agent would have an incentive *not* to use the header, so as to continue exploiting the legacy situation. Using the header imposes constraints that make the current non-determinism harder to exploit.

In our case we know the encoding often enough for these headers to be useful, which is why we are proposing them: we would use them if they were available. Others have chimed in along those lines as well, but each implementer decides for itself. Even if a hint is only useful for the query in some situations, one can send the query header and not the path header. They are separate headers for exactly that reason.
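
A minimal sketch of the sending side, to make the "only if you know for sure" rule concrete. The header names below are placeholders invented for this example, not the names used in the draft, and the helper is hypothetical:

```python
from typing import Dict, Optional

# Placeholder header names for illustration only (not taken from the draft).
PATH_HINT_HEADER = "X-Example-Path-Encoding"
QUERY_HINT_HEADER = "X-Example-Query-Encoding"


def encoding_hint_headers(
    path_encoding: Optional[str],
    query_encoding: Optional[str],
) -> Dict[str, str]:
    """Build the hint headers for one request.

    Pass None for any component whose encoding the sender does not know
    for certain; no header is emitted for that component.
    """
    headers: Dict[str, str] = {}
    # The two hints are independent: either one may be sent without the other.
    if path_encoding is not None:
        headers[PATH_HINT_HEADER] = path_encoding
    if query_encoding is not None:
        headers[QUERY_HINT_HEADER] = query_encoding
    return headers
```

For example, a sender that built only the query itself would call encoding_hint_headers(None, "UTF-8") and send just the query hint.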

Gabriel

Received on Saturday, 22 March 2014 05:41:22 UTC