- From: Willy Tarreau <w@1wt.eu>
- Date: Sat, 9 Feb 2013 16:05:13 +0100
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Mark Nottingham <mnot@mnot.net>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On Sat, Feb 09, 2013 at 02:04:30PM +0000, Poul-Henning Kamp wrote:
> Content-Type: text/plain; charset=ISO-8859-1
> --------
> In message <20130209133341.GA8712@1wt.eu>, Willy Tarreau writes:
>
> >I'm not saying I'm totally against UTF-8 in HTTP/2 [...]
>
> What and where do you mean when you say "UTF-8" In HTTP/2 ?
>
> I think we need to be more precise, to avoid misunderstandings.
>
> In HTTP/1, there is a peculiar mix between protocol-mechanics, and
> metadata: If I add a custom bit of metadata, it must follow certain
> rules, since otherwise it will break the protocol mechanics.
>
> For instance, I cannot define a custom header called:
>
>     "FOO" CRNL CRNL ": " [8 zero bytes]
>
> If we define HTTP/2 as a "binary" protocol in some sensible way,
> this restriction could go away, and we'd just move something like:
>
>     <HDR nlen=7,blen=8> "FOO" CRNL CRNL \0\0\0\0\0\0\0\0
>
> down the wire, and not care about what it is, what it means or
> what character set, if any, it is encoded in.
>
> It is only the metadata that needs inspection along the way where
> we need to decide about UTF-8, and it really isn't that much.

Prefixing values with their lengths generally is the most efficient way to work (CPU-wise).

> Host:
>     Why would we care about the character set ? We're
>     just going to pass it to DNS anyway.
>
> URI:
>     At least the query strings, possibly all of it ?
>     But do we really care ? Provided we take the Host
>     part out, as proposed, we treat this as a unit.

I'd be cautious about mixing URI and query strings, I see too often people rewrite some requests to move the question mark away and replace it with a slash. Then they don't realize they're possibly mixing two distinct encodings, still they do!

> Cache-Control:
>     And what good would UTF-8 do here in the first place ?

No need, we need to use tokens here and tokens can be an enum.

> So where is it you want UTF-8, and what difference will it make ?
Hey Poul-Henning, please do not put words in my mouth, I'm not saying I want UTF-8, OK ? As I said, I don't like this encoding at all. What I'm saying is that if we have to transport such encoded data, I prefer that we pass it as-is in its original form than having to decode/encode it. For example, if it becomes a norm that URI, Location or Referer is UTF8-encoded, let's pass them untransformed.

But in general, I think that 20 years of web have shown that the protocol does not need this at all to succeed.

Regards,
Willy
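
For readers following the thread, the length-prefixed header idea discussed above (the `<HDR nlen=7,blen=8>` example) can be sketched roughly as follows. This is only an illustration under assumed details: the 2-byte big-endian length fields and the names `encode_header` and `decode_header` are hypothetical and do not come from the message or from any HTTP/2 draft. The point it shows is that the receiver never scans for CR/LF or ":" delimiters, so the name and value bytes can be passed through opaque and untransformed.

```python
import struct

# Hypothetical framing (an assumption for illustration only):
# 2-byte big-endian name length, 2-byte big-endian value length,
# followed by the raw name bytes and the raw value bytes.
_LENGTHS = struct.Struct("!HH")


def encode_header(name: bytes, value: bytes) -> bytes:
    """Serialize one header as <nlen, blen> + name + value."""
    return _LENGTHS.pack(len(name), len(value)) + name + value


def decode_header(buf: bytes, offset: int = 0):
    """Read one header at offset; return (name, value, next_offset)."""
    nlen, blen = _LENGTHS.unpack_from(buf, offset)
    start = offset + _LENGTHS.size
    end = start + nlen + blen
    if end > len(buf):
        raise ValueError("truncated header field")
    # Slicing by length only: the bytes are never inspected or re-encoded.
    return buf[start:start + nlen], buf[start + nlen:end], end


# The header from the quoted example: a name containing CR LF CR LF
# (7 bytes) and a value of 8 zero bytes, both impossible in HTTP/1 syntax.
name = b"FOO\r\n\r\n"
value = b"\x00" * 8
wire = encode_header(name, value)
decoded_name, decoded_value, next_offset = decode_header(wire)
```

Because the decoder works purely from the two length fields, an intermediary built this way would forward a UTF-8 (or any other) value as-is, which matches the preference stated in the message for passing such data untransformed.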
Received on Saturday, 9 February 2013 15:05:48 UTC