- From: Willy Tarreau <w@1wt.eu>
- Date: Wed, 7 Nov 2018 06:53:29 +0100
- To: Piotr Sikora <piotrsikora@google.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>, Amos Jeffries <squid3@treenet.co.nz>, jasnell@gmail.com, mbishop@evequefou.be, Jeffrey Yasskin <jyasskin@google.com>, jason.greene@redhat.com
On Tue, Nov 06, 2018 at 08:58:03PM -0800, Piotr Sikora wrote:
> Reviving this thread now that we have HTTP Core, with semantics
> separated from HTTP/1.1 and HTTP/2 messaging.
>
> As of right now, the header values at the semantics layer are limited
> to the visible characters:
>
>   field-value = *( field-content / obs-fold )
>   field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
>   field-vchar = VCHAR / obs-text
>
> where:
>
>   VCHAR = %x21-7E
>   obs-text = %x80-FF
>
> I'm fine with the existing restrictions being enforced at the HTTP/1.1
> messaging layer, where binary values could be converted according to
> the "Byte Sequence" rules from Structured Headers, however both HTTP/2
> and HTTP/3 are perfectly capable of transmitting all octets, so the
> semantics layer shouldn't be limited by the fact that HTTP/1.x is a
> text-based protocol.
>
> If this is published as-is, it's going to prevent use of binary values
> in header fields "for the rest of our careers" (according to the WG
> chairs), so I guess it's "now or never" kind of thing.
>
> Thoughts?

While I can see the value in doing this, having had to deal with value encoding myself, I also see some obvious problems with it, such as transcoding to older HTTP versions. Despite this, I think it deserves some thought.

My primary concern is the risk that such header fields get transcoded to H1 and cause huge damage, even where the people deploying a gateway do not expect it.

One idea against this could be that we introduce a new header field (please don't beat me) to indicate whether a message may be downgraded and, if so, down to which version. We could for example have "Requires: HTTP/2" in an H3 message to indicate that the message cannot be conveyed over HTTP versions older than 2. This could be useful over the long term to transport other semantics that were possibly ambiguous before certain versions.
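
To make the minimum-version idea concrete, here is a rough sketch of the check a gateway might perform before downgrading a message. It is only an illustration: "Requires" is a proposal from this message and not a standardized field, and the value syntax (accepting both "HTTP/2" and the short form "h2") and the function names `parse_version` and `may_forward` are assumptions.

```python
# Hypothetical sketch: "Requires" is only a proposal in this thread, not a
# standardized header field, and the value syntax below is an assumption.

def parse_version(token: str) -> tuple[int, int]:
    """Turn 'HTTP/2', 'HTTP/1.1', or the short form 'h2' into (major, minor)."""
    token = token.strip().lower()
    if token.startswith("http/"):
        number = token[len("http/"):]
    elif token.startswith("h"):
        number = token[1:]
    else:
        raise ValueError(f"unrecognized version token: {token!r}")
    major, _, minor = number.partition(".")
    return int(major), int(minor or 0)


def may_forward(headers: dict[str, str], outgoing_version: str) -> bool:
    """Return True if the gateway may send this message over outgoing_version."""
    # Field names are case-insensitive, so compare them in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    required = lowered.get("requires")
    if required is None:
        return True  # no minimum stated: downgrading is allowed
    return parse_version(outgoing_version) >= parse_version(required)
```

With this sketch, `may_forward({"Requires": "HTTP/2"}, "HTTP/1.1")` is false, so the gateway would have to reject the message or pick a newer outgoing version instead of silently transcoding it.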
In this case a message conveying binary data would be expected to pass this "requires: h2" field to make sure a gateway doesn't pass it over an older version. And in my opinion it's this signal that needs to be defined early and before we generalize H{3,2,1} <-> H{3,2,1} gateways. We could then decide that certain new header fields must be watched and obeyed by gateways supporting certain versions. We already have this issue with some protocol elements introduced in H2 like the "never index" fields in HPACK. It's the reason we've had to completely redesign the internal HTTP stack in haproxy, because for now H2 messages are translated to HTTP/1.1 but it's not possible to keep this type of information there if we need to re-encode to H2.

Another important element to keep in mind is the list delimiter. The HTTP spec says that a header field may appear multiple times in a message if and only if it's defined as a comma-delimited list (with an exception for set-cookie which we all love). With your proposal to allow all characters and to pass binary data, as soon as a header field appears multiple times, there will definitely be agents which will fold the values by appending a comma and a space and this will break your contents. So we'd probably need to define how such header fields should (not?) be folded and which ones it applies to.

Finally, agents are free to trim leading and trailing LWS in values. Here again it will destroy your contents, so we also need to take care of this.

For all these reasons I'm starting to suspect that we'll sooner or later have to introduce a notion of properties associated with header fields. One of them could be "binary", which implies no trimming, no folding. Another one could be "do-not-fold" for set-cookie and possibly others (i.e. all those supposed to contain a comma like Date or Expires).
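
The folding and trimming hazards above can be shown with a small sketch. The property names "binary" and "do-not-fold" are the ones suggested in this message; the table `FIELD_PROPERTIES`, the field name `x-example-bin`, and both functions are hypothetical and only meant to show the behaviour, not how any existing agent implements it.

```python
# Illustrative only: the property names ("binary", "do-not-fold") come from
# the proposal above; the table and function names are hypothetical.

FIELD_PROPERTIES = {
    "set-cookie": {"do-not-fold"},
    "x-example-bin": {"binary"},  # hypothetical binary field
}


def naive_combine(values: list[bytes]) -> bytes:
    """What a property-unaware agent may do: trim LWS, then join with ', '."""
    return b", ".join(v.strip(b" \t") for v in values)


def combine(name: str, values: list[bytes]) -> list[bytes]:
    """Combine repeated fields unless the field's properties forbid it."""
    props = FIELD_PROPERTIES.get(name.lower(), set())
    if "binary" in props:
        return list(values)  # binary implies no trimming and no folding
    trimmed = [v.strip(b" \t") for v in values]
    if "do-not-fold" in props:
        return trimmed  # keep one value per field line
    return [b", ".join(trimmed)]
```

For example, take the two binary values `b"\x00\x2c\x20\x01"` and `b"\x09\xff\x20"`. `naive_combine` strips the leading tab and trailing space from the second value and joins everything with a comma and a space, so a receiver splitting on that delimiter sees three values instead of two. `combine("x-example-bin", ...)` returns both values untouched.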
We could imagine having a "binary: <name-list>" field passing the list of binary header names, but it really is a pain to deal with parsers relying on names found in other header fields.

All such properties could be defined in the core with their default values (e.g. no-fold for "set-cookie"), and certain HTTP versions could be able to override the default properties, for instance to specify that a given field is of type binary and must not be mangled. And it's only with the minimum required version signal that you can make sure that all elements along the chain will respect the promise not to touch it.

Regards,
Willy
Received on Wednesday, 7 November 2018 05:54:23 UTC