Re: Request Routing Information [was: Do we kill the "Host:" header in HTTP/2 ?] from Adrien W. de Croy on 2013-02-06 (ietf-http-wg@w3.org from January to March 2013)

From: Adrien W. de Croy <adrien@qbik.com>
Date: Wed, 06 Feb 2013 22:44:09 +0000
To: "Mark Nottingham" <mnot@mnot.net>, "James M Snell" <jasnell@gmail.com>
Cc: "HTTP Working Group" <ietf-http-wg@w3.org>
Message-Id: <em311c9408-db6c-4d28-9db4-499e05200485@bombed>
------ Original Message ------
From: "Mark Nottingham" <mnot@mnot.net>
To: "James M Snell" <jasnell@gmail.com>
Cc: "HTTP Working Group" <ietf-http-wg@w3.org>
Sent: 5/02/2013 6:16:47 p.m.
Subject: Request Routing Information [was: Do we kill the "Host:" header 
in HTTP/2 ?]
>Thanks for making concrete proposals, James -- that's helpful.
>
>We had a brief conversation at the F2F about requiring "special" 
>headers (e.g., :scheme :method :host :path) to be at the beginning of 
>the set of headers.
>
>That's effectively a different serialisation of the information here 
>(ignoring the separation of the port). Each approach has advantages and 
>disadvantages, but what might help us move forward here is first 
>figuring out *what* information needs to be separated out, before we 
>talk about the specific format of the bits on the wire.
>
>A few points to consider (trying to move the conversation forward, more 
>than stating a position):
>
>* HTTP/1.1 has two ways of serialising what we call the Effective 
>Request URI in HTTPbis, and I don't think it's too controversial to say 
>that this is bad, and in /2 we should just have one way to do it.

Fine with that, as long as it's clear whether a client is talking 
proxy-ese or not, especially if you consider that intercepted connections 
may be in the mix.  In fact, explicit support for clients / servers / 
proxies to know the connection is intercepted would be good.

>
>* One of the HTTP/1.1 forms omits the scheme in use. Discussion so far 
>seems to imply that people want the scheme to be explicit in /2. Anyone 
>have any argument as to why not?
Apart from proxy requests, the scheme has always been http.  You can 
only make an ftp:// request to a proxy.  https:// was never used.  So 
there was only http when talking directly to a server.  Personally 
though I would propose putting the scheme in always, to enable things 
like the semantic equivalent of GET https://some-secure-site.com/whatever

>
>* If we do make the scheme explicit, I'd note that HTTPbis allows use 
>of schemes other than HTTP / HTTPS, so we'd need to accommodate that. 
>I.e., a single bit is out.
>
>* Most people seem to see the value in separating the authority portion 
>of the URI into a separate header, because that's routed upon (and it 
>could also benefit from delta-based compression). Anyone disagree?
Nope, I'm in favour of that.  I would also split out the port.

>
>* Separating the query string from the path would save the origin 
>server a bit of parsing. I see arguments on both sides; who wants to 
>make them?
I would be in favour.  Lots of sites (e.g. sites running on a CMS 
without mod_rewrite) are all calls to index.php with the only thing 
changing being the query string.  So splitting them out would enable us 
to save re-transmitting the path if it didn't change.
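
A rough Python sketch of that saving, assuming path and query travel as separate fields and that only fields which changed since the previous request are re-sent.  The names split_target and delta are made up for illustration and do not come from any proposal:

```python
from urllib.parse import urlsplit


def split_target(target):
    """Split a request target into separate path and query fields."""
    parts = urlsplit(target)
    return {"path": parts.path, "query": parts.query}


def delta(previous, current):
    """Return only the fields whose value differs from the previous request."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


first = split_target("/index.php?page=home")
second = split_target("/index.php?page=contact")

# The path is unchanged, so only the query would need to be sent again.
print(delta(first, second))  # {'query': 'page=contact'}
```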

>
>* Request routing is generally done on the host/port tuple; i.e., the 
>port doesn't have informational value *in the HTTP message* when it's 
>separate from the host. So, I'm not sure about the value proposition of 
>separating it out here; can you illustrate?
Proxies always need to parse this.  Is the proposal that we'd still need 
to string parse server:port, or would there be some binary encoding of 
server and port parts?
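
For reference, this is roughly the string parsing a proxy does today for a combined server:port value, sketched in Python with the standard urllib.parse module.  The helper name split_authority is an assumption for illustration only:

```python
from urllib.parse import urlsplit


def split_authority(authority):
    """Split 'host[:port]' into (host, port); port is None when absent."""
    # Prefixing '//' makes urlsplit treat the string as a network location.
    parts = urlsplit("//" + authority)
    return parts.hostname, parts.port


print(split_authority("example.com:8080"))
print(split_authority("example.com"))
```

With separately encoded host and port fields, this step (including the bracket handling needed for IPv6 literals) would not be needed on the receiving side.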

I am assuming we plan to adopt default values for various fields, which 
therefore never need to be transmitted unless the value differs from the 
default.

E.g.

default values for

scheme = http
method = GET
port = 80
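
A small Python sketch of how such defaults might behave, assuming exactly the three defaults listed above; strip_defaults and apply_defaults are hypothetical names:

```python
# Assumed defaults, taken from the list above.
DEFAULTS = {"scheme": "http", "method": "GET", "port": 80}


def strip_defaults(fields):
    """Sender side: drop any field whose value equals the default."""
    return {k: v for k, v in fields.items() if DEFAULTS.get(k) != v}


def apply_defaults(fields):
    """Receiver side: fill in defaults for fields that were not sent."""
    return {**DEFAULTS, **fields}


request = {"scheme": "http", "method": "GET", "port": 8080, "path": "/foo"}
on_wire = strip_defaults(request)
print(on_wire)  # {'port': 8080, 'path': '/foo'}
assert apply_defaults(on_wire) == request
```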

There could also be a case for allowing a client to push fields to a peer 
without there being a request.  For instance, the browser on loading 
could push things like user-agent, accept-encoding etc. to a proxy before 
issuing the first request.  Basically, allow discovery and advertisement 
of support for protocol functions at startup.
>
>* We'll need to do all of this for the response status code as well. 
>Maybe not the phrase; we touched on this briefly at the F2F, and I put 
>forth the opinion that since it's human-readable, and our message 
>format isn't really any more, it doesn't have much utility to actually 
>include in the message. Anyone think it's useful enough to justify the 
>bits?
Could be optional.  You can still see the text in packet captures and 
logs, so it can still be useful.

>* We also talked about :version at the F2F, both in requests and 
>responses. I don't think it's necessary, as it's effectively hop-by-hop 
>information, and the connection negotiation + magic takes care of that. 
>Discuss.

May be more useful to have a field to indicate the version that the 
request came from.  E.g. if an HTTP/1.1 request was made to a proxy which 
upgraded it to a 2.0 server, the server may need the original client 
version to decide what functionality to enable.

Via doesn't quite cut this, since it omits the original version.

Again, it could default to 2.0, and only be present if it's different.

Adrien


>
>Cheers,
>
>
>
>On 02/02/2013, at 4:22 AM, James M Snell <jasnell@gmail.com> wrote:
>
>>  Based on the feedback, we can change this to...
>>
>>  +------------------------------+
>>  |S|len(method)|method|len(host)|
>>  +-+----+----+-+-------+--------+
>>  | host |port|len(path)| path |
>>  +------------------------------+
>>
>>  The only change here really is the port field as a uvarint. The path 
>>would contain the full query-string and path detail...
>>
>>  Example: GET https://example.com:443/foo?a=b
>>
>>    [83,G,E,T,0B,e,x,a,m,p,l,e,.,c,o,m,BB,03,08,/,f,o,o,?,a,=,b]
>>
>>  If some other scheme needs to be specified, a separate :scheme header 
>>would be specified identifying the scheme.
>>
>>  Method names are encoded as text because numeric identifier or 
>>abbreviation schemes are just going to add complexity for no 
>>demonstrated benefit.
>>
>>  - James
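
A minimal Python sketch of the revised layout quoted above.  It assumes "uvarint" means a little-endian base-128 varint with a continuation bit in the high bit of each byte, which is the reading consistent with the BB,03 bytes shown for port 443.  The function names uvarint and encode_req are illustrative only:

```python
def uvarint(n):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_req(method, host, port, path, secure):
    """Pack S bit, method, host, port and path in the order of the diagram."""
    m = method.encode("ascii")
    h = host.encode("ascii")
    p = path.encode("ascii")
    if len(m) > 0x7F:
        raise ValueError("method name does not fit in a 7-bit length")
    first = (0x80 if secure else 0x00) | len(m)
    return (bytes([first]) + m + uvarint(len(h)) + h
            + uvarint(port) + uvarint(len(p)) + p)


encoded = encode_req("GET", "example.com", 443, "/foo?a=b", secure=True)
print(encoded.hex(" "))
```

Under that assumption the output matches the quoted example byte for byte: 83, "GET", 0B, "example.com", BB 03, 08, "/foo?a=b".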
>>
>>
>>
>>  On Thu, Jan 31, 2013 at 11:09 PM, James M Snell <jasnell@gmail.com> 
>>wrote:
>>  One proposal on this particular topic...
>>
>>  We can combine the :scheme, :method, :host and :path header fields 
>>into a single :req Header with a compact binary encoding and require 
>>that this single header always appear first in request header blocks.
>>
>>  +------------------------------+
>>  |S|len(method)|method|len(host)|
>>  +-+-------+----+---------+-----+
>>  | host | len(path) | path |
>>  +------------------------------+
>>
>>  S = Single bit, when set, scheme = https, when not set, scheme = http
>>  len(method) = 7 bit length of method name
>>  method = name of method
>>  len(host) = length of host, encoded as a uvarint
>>  host = host
>>  len(path) = length of path, encoded as a uvarint
>>  path = path
>>
>>  Examples:
>>
>>  GET request for http://example.com/foo would be:
>>
>>    [03,G,E,T,0B,e,x,a,m,p,l,e,.,c,o,m,04,/,f,o,o]
>>
>>  POST request for https://example.com/foo would be:
>>
>>    [84,P,O,S,T,0B,e,x,a,m,p,l,e,.,c,o,m,04,/,f,o,o]
>>
>>  Unregistered method 'FOO' request for https://example.com/foo would be:
>>
>>    [83,F,O,O,0B,e,x,a,m,p,l,e,.,c,o,m,04,/,f,o,o]
>>
>>  If the client chooses not to send a host (because it's not needed or 
>>whatnot) they simply set len(host) to 0...
>>
>>    [03,G,E,T,00,04,/,f,o,o]
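
For the receiving side, a hedged Python sketch of a decoder for the original (port-less) layout in this quoted proposal.  It assumes the same little-endian base-128 reading of "uvarint"; read_uvarint and decode_req are hypothetical names, and error handling is kept to the bare minimum:

```python
def read_uvarint(buf, pos):
    """Read a base-128 varint from buf at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated uvarint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def decode_req(buf):
    """Unpack (scheme, method, host, path) from a :req value without a port."""
    first = buf[0]
    scheme = "https" if first & 0x80 else "http"
    pos = 1
    method_len = first & 0x7F
    method = buf[pos:pos + method_len].decode("ascii")
    pos += method_len
    host_len, pos = read_uvarint(buf, pos)
    host = buf[pos:pos + host_len].decode("ascii")
    pos += host_len
    path_len, pos = read_uvarint(buf, pos)
    path = buf[pos:pos + path_len].decode("ascii")
    return scheme, method, host, path


print(decode_req(b"\x84POST\x0bexample.com\x04/foo"))
print(decode_req(b"\x03GET\x00\x04/foo"))
```

The second call is the "no host" case, where len(host) is 0 and the host comes back as an empty string.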
>>
>>  If we do end up going with delta encoding for compression, we can 
>>require that the :req Header always be passed using the eref operation 
>>(ephemeral reference, meaning that the header is never stored in the 
>>compression state). No huffman-coding would be applied to the header, 
>>making it very quick and cheap to process.
>>
>>  Thoughts?
>>
>>  - James
>>
>>
>>
>>  On Thu, Jan 31, 2013 at 8:04 AM, Ted Hardie <ted.ietf@gmail.com> 
>>wrote:
>>  On Thu, Jan 31, 2013 at 5:15 AM, Roy T. Fielding wrote:
>>
>>  >> We had no idea how early we were in the popularity curve of HTTP or
>>  >> how dominant it would become, but it was clear even then that the
>>  >> protocol would be very, very common on the network. In retrospect, it
>>  >> is clear that we shouldn't have looked at the current installed
>>  >> base--we should have looked at what we expected eventual use would be.
>>  >> That makes "the earlier the better" clear.
>>  >
>>  > I think your memory is a bit hazy there ... HTTP passed all
>>  > application protocols other than email in 1995, and by that
>>  > time (Mar 1996) was roughly double email traffic, IIRC (this was
>>  > before email-based spam became common). That's why the WG
>>  > meeting contained a lot of people who had nothing to do
>>  > with developing the Web protocols---there was panic in the air.
>>  >
>>
>>  I think we can both agree it only got more popular from there. It may
>>  well have surpassed email by that discussion, but the hockey stick had
>>  a lot of run to go.
>>
>>  > I find it amusing that you think we could have proceeded in any
>>  > other way without relegating the IETF work to the garbage bin.
>>  >
>>
>>  It would have taken agreement from a lot of people, but the web
>>  community of the time could have decided then to rev the major
>>  protocol version for that change. That suggestion was made. The IETF
>>  could not have mandated that. The IETF *never* gets to dictate
>>  protocol changes--it didn't have police then and it doesn't now, but
>>  saying that the buy-in for that change didn't happen doesn't equate to
>>  saying it could not.
>>
>>  regards,
>>
>>  Ted
>>  >> For HTTP 2.0, where we can make non-backward compatible changes, I
>>  >> personally think the right thing to do is to drop the Host: header
>>  >> (that version shift is what we were waiting for 17 years ago, after
>>  >> all). If there are things folks are getting from side-effects of the
>>  >> Host header (e.g. proxy targeting), we put them into the bin of
>>  >> potential requirements for HTTP 2.0 *and create mechanisms to meet
>>  >> those needs*.
>>  >>
>>  >> I think Adrien's proposal for extensions to the host header makes
>>  >> clear that the need isn't perfectly met by the host header in any
>>  >> case, so mapping out the real aim and meeting that seems like the best
>>  >> notion to me.
>>  >
>>  > Oddly enough, waka separates the scheme+host routing information from
>>  > the rest of the URI because that works better with multi-argument
>>  > methods and message-based encryption. *shrug*
>>  >
>>  > ....Roy
>>  >
>>
>>
>>
>
>--
>Mark Nottingham http://www.mnot.net/
>
>
>
>
Received on Wednesday, 6 February 2013 22:45:07 UTC