Re: Request Routing Information [was: Do we kill the "Host:" header in HTTP/2 ?] from Amos Jeffries on 2013-02-07 (ietf-http-wg@w3.org from January to March 2013)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Thu, 07 Feb 2013 14:53:44 +1300
To: ietf-http-wg@w3.org
Message-ID: <51130928.7030802@treenet.co.nz>
On 7/02/2013 11:44 a.m., Adrien W. de Croy wrote:
>
>
> ------ Original Message ------
> From: "Mark Nottingham" <mnot@mnot.net>
> To: "James M Snell" <jasnell@gmail.com>
> Cc: "HTTP Working Group" <ietf-http-wg@w3.org>
> Sent: 5/02/2013 6:16:47 p.m.
> Subject: Request Routing Information [was: Do we kill the "Host:" 
> header in HTTP/2 ?]
>> Thanks for making concrete proposals, James -- that's helpful.
>>
>> We had a brief conversation at the F2F about requiring "special" 
>> headers (e.g., :scheme :method :host :path) to be at the beginning of 
>> the set of headers.
>>
>> That's effectively a different serialisation of the information here 
>> (ignoring the separation of the port). Each approach has advantages 
>> and disadvantages, but what might help us move forward here is first 
>> figuring out *what* information needs to be separated out, before we 
>> talk about the specific format of the bits on the wire.
>>
>> A few points to consider (trying to move the conversation forward, 
>> more than stating a position):
>>
>> * HTTP/1.1 has two ways of serialising what we call the Effective 
>> Request URI in HTTPbis, and I don't think it's too controversial to 
>> say that this is bad, and in /2 we should just have one way to do it.
>
> fine with that as long as it's clear whether a client is talking 
> proxy-ese or not.  Esp if you consider intercepted connections may be 
> in the mix.  In fact explicit support for clients / servers / proxies 
> to know the connection is intercepted would be good.
>
>>
>> * One of the HTTP/1.1 forms omits the scheme in use. Discussion so 
>> far seems to imply that people want the scheme to be explicit in /2. 
>> Anyone have any argument as to why not?
> Apart from proxy requests, the scheme has always been http.  You can 
> only make a ftp:// request to a proxy.  https:// was never used.  So 
> there was only http when talking directly to a server. Personally 
> though I would propose putting the scheme in always, to enable things 
> like semantic equivalent of GET https://some-secure-site.com/whatever
>

Exactly. If we are to multiplex HTTP, HTTPS, WebSockets and possibly
other future protocols over the HTTP frame base format, which seems like
a reasonable ability, then we are going to need to specify those schemes
in the URL, to indicate for example what protocol frame format the
client expects to be used on the response. https:// to indicate encrypted
payload is just the current testable variant on http://.

In fact HTTP/2 frames with encrypted payload open us to that wonderful
world of the shttp:// scheme, with all the security available in https://
but the routing flexibility and proxy handling available in http://.


>>
>> * If we do make the scheme explicit, I'd note that HTTPbis allows use 
>> of schemes other than HTTP / HTTPS, so we'd need to accommodate that. 
>> I.e., a single bit is out.
>>
>> * Most people seem to see the value in separating the authority 
>> portion of the URI into a separate header, because that's routed upon 
>> (and it could also benefit from delta-based compression). Anyone 
>> disagree?
> nope, I'm in favour of that.  I would also split out port.
>
>>
>> * Separating the query string from the path would save the origin 
>> server a bit of parsing. I see arguments on both sides; who wants to 
>> make them?
> I would be in favour.  Lots of sites (e.g. sites running on a CMS 
> without mod_rewrite) are all calls to index.php with the only thing 
> changing being the query string.  So splitting them out would enable 
> us to save re-transmitting the path if it didn't change.

That would be a problem for the origin server caused by itself. Each of
these URLs is a different object, meaning you would have to re-combine
the two fields to perform things like cache lookups and validation.
Splitting them only helps when compressing without prefix-matching
the field against earlier versions of itself - which, if I am
understanding CRIME right, is not a vulnerability (the problem being
prefix matching field A with field B when one has secure data and the
other is attacker modifiable).
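
To illustrate the re-combination point, here is a minimal sketch in Python.
The cache_key() helper and its field names are hypothetical, not anything
proposed in this thread. It only shows that two requests sharing a path but
differing in query are still distinct objects, so a cache has to join the
fields back together before a lookup.

```python
def cache_key(scheme, authority, path, query=None):
    """Rebuild the full request URI to use as a cache key (hypothetical helper)."""
    # A separately transmitted query must be re-attached to the path,
    # otherwise distinct objects would collide on the same key.
    target = path if query is None else path + "?" + query
    return f"{scheme}://{authority}{target}"


page1 = cache_key("http", "example.com", "/index.php", "page=1")
page2 = cache_key("http", "example.com", "/index.php", "page=2")
# Same path, different query: different cache entries.
assert page1 != page2
```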

Clients, middleware, and routing infrastructure do not need to care about
the path+query portion for their operations other than as an opaque
blob. Splitting them just makes two opaque blobs - wasting processing on
an extra length-skip operation. It also adds the '?' embedding
vulnerabilities mentioned earlier. Not to mention some schemes like urn:
where '?' does *not* signal a query string.

NP: I do know of situations where proxies "need" to split the path and
query string. But all of those cases are ones where the proxy is
performing origin-server operations on the request, which just supports
the idea of leaving the splitting of these two up to the origin server.


>
>>
>> * Request routing is generally done on the host/port tuple; i.e., the 
>> port doesn't have informational value *in the HTTP message* when it's 
>> separate from the host. So, I'm not sure about the value proposition 
>> of separating it out here; can you illustrate?
> proxies always need to parse this.  Is the proposal that we'd still 
> need to string parse server:port, or would there be some binary 
> encoding of server and port parts?
>
> I am assuming we plan to adopt default values for various fields, 
> which therefore never need to be transmitted unless the value differs 
> from the default.
>
> E.g.
>
> default values for
>
> scheme = http
> method = GET
> port = 80
>
> there could also be a case to allow a client to push fields to a peer 
> without there being a request.  for instance the browser on loading 
> could push things like user-agent, accept-encoding etc to a proxy 
> before issuing the first request.  Basically allow discovery and 
> advertisement of support for protocol functions at startup.

Assuming that we can agree on suitable enumerations for these details,
they can be fixed-length fields and there is no extra cost to them being
always present, even when 'default'. Defaults become a property inferred
by the clients, not by any middleware or servers - which themselves may
not be coded to handle the hard-coded default port assignment of some
future scheme foo://.
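
As a rough sketch of what always-present fixed-length fields could look
like (in Python), assuming a purely hypothetical layout of one byte for
scheme, one byte for method and two bytes for port. The enumeration values
and the layout are invented for illustration. Nothing here has been agreed
by the WG.

```python
import struct
from enum import IntEnum


class Scheme(IntEnum):  # hypothetical enumeration values
    HTTP = 0
    HTTPS = 1
    FTP = 2


class Method(IntEnum):  # hypothetical enumeration values
    GET = 0
    HEAD = 1
    POST = 2


# Assumed layout: scheme (1 byte), method (1 byte), port (2 bytes, network order).
ROUTING = struct.Struct("!BBH")


def pack_routing(scheme, method, port):
    """Always emit all three fields, even when they hold 'default' values."""
    return ROUTING.pack(scheme, method, port)


def unpack_routing(data):
    """Read the fields back without knowing any per-scheme defaults."""
    scheme, method, port = ROUTING.unpack(data)
    return Scheme(scheme), Method(method), port
```

With this layout the "default" request (http, GET, port 80) costs the same
four bytes as any other, and the receiver never has to know which port a
scheme defaults to.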

>>
>> * We'll need to do all of this for the response status code as well. 
>> Maybe not the phrase; we touched on this briefly at the F2F, and I 
>> put forth the opinion that since it's human-readable, and our message 
>> format isn't really any more, it doesn't have much utility to 
>> actually include in the message. Anyone think it's useful enough to 
>> justify the bits?
> could be optional.  You can still see the text in packet captures, and 
> logs so it can still be useful.

Yes, +1 on an optional phrase. It can be a header as easily as anything
else, and sending just the status code saves a lot of bytes even before
applying a compact binary format to the status.
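
A small sketch of how a receiver could cope with the phrase being optional
(in Python). The reason_phrase() helper is hypothetical. It prefers a
phrase the sender chose to include and otherwise falls back to the
standard text for logs, using the status table in Python's own http
module.

```python
from http import HTTPStatus


def reason_phrase(code, sent_phrase=None):
    """Return text for logging a status code (hypothetical helper)."""
    if sent_phrase is not None:
        # The sender opted to transmit a phrase: use it as-is.
        return sent_phrase
    try:
        # No phrase on the wire: look up the standard one locally.
        return HTTPStatus(code).phrase
    except ValueError:
        # Unregistered status code: nothing sensible to show.
        return ""
```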

>
>> * We also talked about :version at the F2F, both in requests and 
>> responses. I don't think it's necessary, as it's effectively 
>> hop-by-hop information, and the connection negotiation + magic takes 
>> care of that. Discuss.
>
> May be more useful to have a field to indicate the version that the 
> request came from.  E.g. if a HTTP/1.1 request was made to a proxy 
> which up-graded it to a 2.0 server, the server may need the original 
> client version to decide what functionality to enable.
>
> Via doesn't quite cut this, since it omits the original version.
>
> Again, it could default to 2.0, and only be present if it's different.


What about an upgraded version of Via which:
  * omits the () comments portion
  * is only added/altered *if* the version has changed since the last hop.

+ backward compatible with HTTP/1.1 for a 1.1<->2.0 gateway, since
HTTP/1.1 permitted any hop to 'compact' multiple entries if they shared a
server name and treated it as optional on client->origin traffic.
+ caters to user security better by only giving indication of protocol
gateways, nothing deterministic about the end-user or end-server.
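
Roughly, the per-hop rule could be sketched like this (in Python). The
update_via() function and its argument names are made up for illustration,
and the entry format simply reuses the HTTP/1.1 "version pseudonym" form
minus the comment.

```python
def update_via(via, received_version, sent_version, pseudonym):
    """Return the Via entries a hop would forward under the rule above."""
    if received_version == sent_version:
        # No protocol change at this hop: leave Via untouched.
        return list(via)
    # Protocol gateway: record the version we received, with no "(comment)".
    return list(via) + [f"{received_version} {pseudonym}"]


# A 1.1 client going through a 1.1->2.0 gateway, then a 2.0-only proxy.
via = update_via([], "1.1", "2.0", "gw")
via = update_via(via, "2.0", "2.0", "proxy2")
assert via == ["1.1 gw"]
```

Under these assumptions the origin still sees that the request started out
as 1.1, which seems to cover Adrien's case, while plain same-version hops
leave no trace.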

Amos
Received on Thursday, 7 February 2013 01:54:14 UTC