Re: HPACK problems (was http/2 & hpack protocol review) from Cory Benfield on 2014-05-08 (ietf-http-wg@w3.org from April to June 2014)

From: Cory Benfield <cory@lukasa.co.uk>
Date: Thu, 8 May 2014 16:24:57 +0100
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, Daniel Stenberg <daniel@haxx.se>, James M Snell <jasnell@gmail.com>, "K.Morgan@iaea.org" <K.Morgan@iaea.org>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "C.Brunhuber@iaea.org" <C.Brunhuber@iaea.org>
Message-ID: <CAH_hAJEfD5k-=dZao0rPsor340aiVZyEsraMOxySftr9LNf6rg@mail.gmail.com>

On 7 May 2014 10:35, RUELLAN Herve <Herve.Ruellan@crf.canon.fr> wrote:
> The intent is to allow duplicates in the header set. In an ideal world, it would
> be an 'actual set', but unfortunately, in my experimentation for building a header
> compression mechanism, I found several occurrences of real-world message
> headers containing duplicates. To support these use cases, HPACK has to
> allow for duplicates in the header set. I'm going to update the definition to
> make things clear.
>
> On the other hand, the reference set is an 'actual set': it contains references
> to entries of the header table, and must not contain the same reference multiple
> times. However, it may contain two references resolving as the same header
> field, if this header field is contained in several entries of the header table.
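
To check my reading of that distinction, here is a rough Python sketch. The
variable names, the zero-based positions and the example header are my own
assumptions for illustration, not anything taken from the draft.

```python
# Simplified model: names and layout are assumptions, not spec text.
header_table = [
    ("set-cookie", "a=1"),  # entry 0
    ("set-cookie", "a=1"),  # entry 1: the same field stored a second time
]

# The reference set holds references to table entries (here, their
# positions), so a given entry can appear in it at most once.
reference_set = set()
reference_set.add(0)
reference_set.add(0)  # no effect: entry 0 is already referenced
reference_set.add(1)  # a distinct entry that resolves to the same field

# The header set is an ordered collection and may repeat a header field.
header_set = [header_table[i] for i in sorted(reference_set)]
```

So the reference set has two members, while the header set they resolve to
contains the same field twice.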

Fair enough: these didn't map to my expectations when reading the
spec, but they're obviously both totally reasonable ways to define
these terms. There are several moving pieces here and a fair bit of
subtlety, but I'm sure I can come up with something acceptable. I'm
looking forward to an updated spec so I can rewrite my entire
implementation again. =)

> The trick to encode a duplicate header field, is to encode it first as a literal,
> adding it to the header table and to the reference set, then to encode it twice
> as an index, the first index removing it from the reference set, and the second
> index adding it again to the reference set and to the encoded collection of
> headers.
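
For my own benefit, the three-step trick can be sketched as a toy decoder in
Python. This is only a model of the behaviour described above: the tuple-based
operations, the zero-based indices and appending new entries at the end of the
table are all assumptions of mine, and it ignores eviction, the static table
and whatever happens at the end of a header block.

```python
def decode(ops):
    """Toy decoder for the duplicate-header trick; not a real HPACK decoder."""
    header_table = []      # list of (name, value); position is the index
    reference_set = set()  # positions in header_table
    emitted = []           # decoded header set, duplicates allowed

    for op in ops:
        if op[0] == "literal":
            # Literal with indexing: emit it, add it to the header table
            # and add the new entry to the reference set.
            _, name, value = op
            header_table.append((name, value))
            reference_set.add(len(header_table) - 1)
            emitted.append((name, value))
        elif op[0] == "index":
            index = op[1]
            if index in reference_set:
                # Already referenced: the index only removes it.
                reference_set.remove(index)
            else:
                # Not referenced: add it back and emit the header.
                reference_set.add(index)
                emitted.append(header_table[index])
    return emitted, reference_set


# One literal followed by the same index twice.
ops = [("literal", "via", "1.1 proxy"), ("index", 0), ("index", 0)]
headers, refs = decode(ops)
```

With that input, `headers` holds the `via` field twice and `refs` still
contains entry 0, which is the outcome the trick is after, at the cost of
three representations for two headers.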

This seems like exactly the kind of behaviour that leads to someone
suggesting a performance optimisation, and I'd love to bikeshed this
for a moment if you'd allow me. Is there any reason that HPACK
couldn't mandate that duplicate headers be forbidden in the same
header set? We have the ability to join the duplicates together into a
single header (with their values joined by null bytes), so it's in
principle possible for HPACK encoders to make this transformation.
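
A minimal sketch of that transformation in Python, assuming the encoder is
handed the whole header list up front as (name, value) pairs of bytes with
names already lowercased. The function name `join_duplicates` is hypothetical.

```python
def join_duplicates(headers):
    """Merge repeated header names, joining their values with a NUL byte."""
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for name, value in headers:
        if name in merged:
            # Same name seen before: append the value after a null byte.
            merged[name] += b"\x00" + value
        else:
            merged[name] = value
    return list(merged.items())


example = join_duplicates([
    (b"cookie", b"a=1"),
    (b"accept", b"*/*"),
    (b"cookie", b"b=2"),
])
```

Here `example` contains a single `cookie` entry whose value is
`b"a=1\x00b=2"`. Values for the same name keep their relative order, but the
position of a merged header relative to other names can change.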

The only argument I can see is that 'streaming' HPACK encoders (those
that don't have the full set of headers available to them) aren't able
to spot this optimisation up-front, and so can't make it (and
therefore can't comply). I don't really feel like we need to
accommodate such encoders, for two reasons. Firstly, I'm pretty sure
that no service generates so many headers that an HPACK encoder
couldn't deal with them in one go. Secondly, the ability to emit
headers straight away is almost totally unhelpful given that, even for
the largest of header sets, it's unlikely that HPACK encoding would
take more than a few tens of milliseconds, well under a RTT.

I accept that I'm wearing blinders here, because I deal with users who
always know what headers they're going to apply, so please tell me
what I'm missing.

Received on Thursday, 8 May 2014 15:25:25 UTC