Re: HTTP/2: allow binary data in header field values

On Tue, Nov 06, 2018 at 08:58:03PM -0800, Piotr Sikora wrote:
> Reviving this thread now that we have HTTP Core, with semantics
> separated from HTTP/1.1 and HTTP/2 messaging.
> 
> As of right now, the header values at the semantics layer are limited
> to the visible characters:
> 
>     field-value = *( field-content / obs-fold )
>     field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
>     field-vchar = VCHAR / obs-text
> 
> where:
> 
>     VCHAR = %x21-7E
>     obs-text = %x80-FF
> 
> I'm fine with the existing restrictions being enforced at the HTTP/1.1
> messaging layer, where binary values could be converted according to
> the "Byte Sequence" rules from Structured Headers, however both HTTP/2
> and HTTP/3 are perfectly capable of transmitting all octets, so the
> semantics layer shouldn't be limited by the fact that HTTP/1.x is a
> text-based protocol.
> 
> If this is published as-is, it's going to prevent use of binary values
> in header fields "for the rest of our careers" (according to the WG
> chairs), so I guess it's "now or never" kind of thing.
> 
> Thoughts?
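
For reference, the visible-character restriction quoted above can be checked in a few lines of Python. This is a minimal sketch and is not taken from any implementation. It ignores obs-fold, and it uses the corrected field-content form later adopted in RFC 9110 (runs of SP/HTAB allowed between field-vchars), because the rule exactly as quoted rejects values such as "a b c". It also shows the kind of conversion an HTTP/1.1 messaging layer could apply to arbitrary octets. That part assumes the Byte Sequence syntax as finally published in RFC 8941 (base64 between colons), which may differ from the draft syntax discussed at the time. A recipient would still need to know from the field's definition that the value is a Byte Sequence.

```python
import base64


def is_valid_field_value(value: bytes) -> bool:
    """Check value against the visible-character field-value rules.

    Accepts VCHAR (%x21-7E), obs-text (%x80-FF) and interior SP/HTAB.
    obs-fold is deliberately not accepted.
    """
    if not value:
        return True  # field-value = *field-content, so empty is allowed
    if value[0] in b" \t" or value[-1] in b" \t":
        return False  # must start and end with a field-vchar
    return all(0x21 <= b <= 0x7E or b >= 0x80 or b in (0x20, 0x09) for b in value)


def to_byte_sequence(value: bytes) -> str:
    """Encode arbitrary octets as an RFC 8941 Byte Sequence (":base64:")."""
    return ":" + base64.b64encode(value).decode("ascii") + ":"


def from_byte_sequence(text: str) -> bytes:
    """Decode an RFC 8941 Byte Sequence back to the original octets."""
    if len(text) < 2 or not (text.startswith(":") and text.endswith(":")):
        raise ValueError("not a Byte Sequence")
    return base64.b64decode(text[1:-1], validate=True)


raw = b"\x00\r\n\xff"  # fine on an H2/H3 wire under the proposal, not in H1
print(is_valid_field_value(raw))                         # False
print(to_byte_sequence(raw))                             # :AA0K/w==:
print(from_byte_sequence(to_byte_sequence(raw)) == raw)  # True
```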

While I can see the value in doing this, having had to deal with value
encoding myself, I also see some obvious problems with it, such as
transcoding to older HTTP versions. Despite this, I think it deserves
some thought.

My primary concern really is the risk that such header fields get
transcoded to H1 and cause a lot of damage, even in cases the people
deploying a gateway did not anticipate. One way to address this could
be to introduce a new header field (please don't beat me) indicating
whether a message may be downgraded and, if so, down to which version.
We could for example have "Requires: HTTP/2" in an H3 message to
indicate that the message cannot be conveyed over HTTP versions older
than 2. Over the long term, this could also be used to convey other
semantics that were ambiguous before certain versions. In this case, a
message conveying binary data would be expected to carry this
"Requires: HTTP/2" field to make sure a gateway doesn't forward it over
an older version. In my opinion, it's this signal that needs to be
defined early, before we generalize H{3,2,1} <-> H{3,2,1} gateways.
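
As a rough illustration of how a gateway might honor such a signal, here is a Python sketch. "Requires" is only the hypothetical field name proposed above, not a registered header field. The "HTTP/<major>[.<minor>]" value syntax and keeping the highest value when the field is repeated are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]
Fields = List[Tuple[str, str]]


def parse_version(text: str) -> Version:
    """Parse "HTTP/2" or "HTTP/1.1" into (major, minor)."""
    text = text.strip()
    if not text.startswith("HTTP/"):
        raise ValueError(f"unexpected version token: {text!r}")
    major, _, minor = text[len("HTTP/"):].partition(".")
    return int(major), int(minor or "0")


def required_version(fields: Fields) -> Optional[Version]:
    """Highest version demanded by any (hypothetical) Requires field, or None."""
    found = [parse_version(v) for n, v in fields if n.lower() == "requires"]
    return max(found) if found else None


def may_forward(fields: Fields, outbound: Version) -> bool:
    """False if forwarding on this outbound version would violate Requires."""
    needed = required_version(fields)
    return needed is None or outbound >= needed


msg = [("content-type", "application/octet-stream"), ("requires", "HTTP/2")]
print(may_forward(msg, (3, 0)))  # True: H3 -> H3
print(may_forward(msg, (2, 0)))  # True: H3 -> H2
print(may_forward(msg, (1, 1)))  # False: must not be downgraded to H1
```

What the gateway should do when the check fails (reject the message, or pick another upstream) is left open here.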

We could then decide that gateways supporting certain versions must
watch for and obey certain new header fields.

We already have this issue with some protocol elements introduced in
H2, like the "never indexed" fields in HPACK. It's the reason we've had
to completely redesign the internal HTTP stack in haproxy: for now, H2
messages are translated to HTTP/1.1 internally, and that representation
cannot keep this type of information when we need to re-encode to H2.
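
To make the HPACK problem concrete: a version-independent internal representation can carry the "never indexed" bit next to each field, while an HTTP/1.1 text block has nowhere to put it, so a round trip through HTTP/1.1 silently drops it. The Field type and helpers below are hypothetical and do not describe haproxy's actual internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    name: str
    value: bytes
    # HPACK "Literal Header Field Never Indexed" (RFC 7541, section 6.2.3)
    never_indexed: bool = False


def to_http1(fields: List[Field]) -> bytes:
    """Serialize as HTTP/1.1 header lines; never_indexed has no equivalent."""
    return b"".join(f.name.encode("ascii") + b": " + f.value + b"\r\n" for f in fields)


def from_http1(block: bytes) -> List[Field]:
    """Parse HTTP/1.1 header lines back; every field comes out indexable."""
    fields = []
    for line in block.split(b"\r\n"):
        if line:
            name, _, value = line.partition(b":")
            fields.append(Field(name.decode("ascii"), value.strip(b" \t")))
    return fields


original = [Field("authorization", b"Bearer abc", never_indexed=True)]
roundtrip = from_http1(to_http1(original))
print(roundtrip[0].value, roundtrip[0].never_indexed)  # b'Bearer abc' False
```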

Another important element to keep in mind is the list delimiter. The
HTTP spec says that a header field may appear multiple times in a
message if and only if it's defined as a comma-delimited list (with
an exception for set-cookie, which we all love). With your proposal
to allow all characters and to pass binary data, as soon as a header
field appears multiple times, some agents will certainly fold the
values by joining them with a comma and a space, and this will break
your contents. So we'd probably need to define whether and how such
header fields may be folded, and which fields this applies to.

Finally, agents are free to trim leading and trailing whitespace (LWS)
in values. Here again, this would destroy your contents, so we need to
take care of this too.
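
The trimming issue is just as easy to trigger. In the sketch below, a binary value that happens to end in 0x20 (here the big-endian encoding of 288, chosen purely as an example) loses its last octet when an intermediary strips trailing whitespace, and the decoded number changes. trim_ows is a hypothetical helper modeling that intermediary.

```python
import struct


def trim_ows(value: bytes) -> bytes:
    """What an agent may legitimately do to a text field value today."""
    return value.strip(b" \t")


payload = struct.pack("!H", 288)  # 288 == 0x0120, packed as b"\x01\x20" (b"\x01 ")
received = trim_ows(payload)

print(int.from_bytes(payload, "big"))   # 288
print(int.from_bytes(received, "big"))  # 1
```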

For all these reasons, I'm starting to suspect that sooner or later
we'll have to introduce a notion of properties associated with header
fields. One of them could be "binary", which implies no trimming and no
folding. Another one could be "do-not-fold", for set-cookie and possibly
others (e.g. those whose values normally contain a comma, like Date or
Expires). We could imagine having a "binary: <name-list>" field carrying
the list of binary header names, but parsers that depend on names found
in other header fields are really a pain to deal with.

All such properties could be defined in the core with their default
values (e.g. do-not-fold for "set-cookie"), and certain HTTP versions
could override the default properties, for instance to specify that a
given field is of type binary and must not be mangled. And only with the
minimum required version signal can you make sure that all elements
along the chain will keep the promise not to touch it.
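
One possible shape for such per-field properties, sketched in Python. Everything here is hypothetical: the property names follow the "binary" and "do-not-fold" ideas above, the core defaults table only contains set-cookie, and the "Binary: <name-list>" field is the announcement mechanism mentioned above. The code has to scan all fields for Binary before it can decide how to treat any other field, which is the two-pass parsing pain described. Per-version overrides and the minimum-version check are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Fields = List[Tuple[str, bytes]]


@dataclass(frozen=True)
class FieldProps:
    binary: bool = False       # never trimmed, never folded
    do_not_fold: bool = False  # repeated lines must stay separate


# Hypothetical defaults "defined in the core"; keys are lowercase field names.
CORE_DEFAULTS: Dict[str, FieldProps] = {
    "set-cookie": FieldProps(do_not_fold=True),
}


def announced_binary(fields: Fields) -> Set[str]:
    """First pass: collect names listed in any hypothetical Binary field."""
    names: Set[str] = set()
    for name, value in fields:
        if name.lower() == "binary":
            for item in value.split(b","):
                item = item.strip(b" \t")
                if item:
                    names.add(item.decode("ascii").lower())
    return names


def props_for(name: str, binary_names: Set[str]) -> FieldProps:
    """Announced binary names win; otherwise use the core default."""
    if name.lower() in binary_names:
        return FieldProps(binary=True)
    return CORE_DEFAULTS.get(name.lower(), FieldProps())


def normalize(fields: Fields) -> Fields:
    """Second pass: trim and fold only where the field's properties allow it."""
    binary_names = announced_binary(fields)
    out: Fields = []
    first_seen: Dict[str, int] = {}  # position in out of each foldable field
    for name, value in fields:
        props = props_for(name, binary_names)
        if not props.binary:
            value = value.strip(b" \t")
        foldable = not (props.binary or props.do_not_fold)
        key = name.lower()
        if foldable and key in first_seen:
            i = first_seen[key]
            out[i] = (out[i][0], out[i][1] + b", " + value)
        else:
            if foldable:
                first_seen[key] = len(out)
            out.append((name, value))
    return out


msg = [
    ("Binary", b"X-Sig"),
    ("Accept", b"text/html"),
    ("Accept", b" text/plain "),
    ("Set-Cookie", b"a=1"),
    ("Set-Cookie", b"b=2"),
    ("X-Sig", b" \x00,"),
    ("X-Sig", b"\x01"),
]
for name, value in normalize(msg):
    print(name, value)
```

With this input, Accept is trimmed and folded into one line, the two Set-Cookie lines stay separate, and both X-Sig values pass through byte-for-byte.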

Regards,
Willy

Received on Wednesday, 7 November 2018 05:54:23 UTC