RE: HPACK problems (was http/2 & hpack protocol review)

Hi Cory,

Thanks for your questions and comments: they're helpful in pointing out the parts of the spec that are not clear enough, and they will help us improve it.

See my answers below.

> -----Original Message-----
> From: Cory Benfield [mailto:cory@lukasa.co.uk]
> Sent: Tuesday, 6 May 2014 13:04
> To: Tatsuhiro Tsujikawa
> Cc: Daniel Stenberg; James M Snell; K.Morgan@iaea.org; ietf-http-wg@w3.org;
> C.Brunhuber@iaea.org
> Subject: Re: HPACK problems (was http/2 & hpack protocol review)
> 
> On 6 May 2014 11:52, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> wrote:
> > And now the problem is what is the expected behavior.
> > I think duplicates should be allowed because the algorithm in HPACK does not
> > break if we allow them.
> > I don't think the complexity of HPACK is much lowered if we get rid of
> > duplicates.
> 
> Agreed: considering the HPACK spec in the abstract the simplest and
> broadest change is to explicitly allow duplicates in the header set.
> We'd need to decide how that affects the reference set (currently that
> would add the same reference twice into the reference set, which is
> again probably acceptable).
> 
> However, considering the real-world for a moment: disallowing
> duplicates allows for hugely efficient operations to build the header
> set by using proper set data structures (present in every useful
> language's standard libraries and functions except C). In fact,
> logically in pseudocode decoding becomes:
> 
> - Initialize empty header set
> - Decode each header and add it to the header set and the reference
> set (or remove it, as instructed)
> - Emit the union of the header set and the reference set
> 
> These operations are fairly cheap and conceptually very clear.
> 
> Don't mistake that for me being hugely attached to the 'header set as
> actual set' notion. I'm quite happy to go with nghttp2 on this,
> especially as rewriting hyper is probably easier than rewriting
> nghttp2. Just thought I'd present the case for the alternative.

The intent is to allow duplicates in the header set. In an ideal world, it would be an 'actual set', but unfortunately, in my experiments while building a header compression mechanism, I found several occurrences of real-world message headers containing duplicates. To support these use cases, HPACK has to allow duplicates in the header set. I'm going to update the definition to make this clear.

On the other hand, the reference set is an 'actual set': it contains references to entries of the header table, and must not contain the same reference multiple times. However, it may contain two references resolving to the same header field, if that header field is contained in several entries of the header table.
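As a quick illustration (a made-up sketch, not taken from the spec, where references are simplified to fixed entry positions even though real indices shift as entries are added and evicted):

  # The reference set holds references to header table entries, never
  # header-field values, so it may reference two distinct entries that
  # happen to contain the same field.
  header_table = [("cookie", "a=1"),   # entry 1
                  ("cookie", "a=1")]   # entry 2: same field, distinct entry
  reference_set = {1, 2}               # two references, both resolving to cookie: a=1
  # Adding an already-present reference has no effect, since it is a set:
  reference_set.add(1)
  assert reference_set == {1, 2}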

The trick to encoding a duplicate header field is to encode it first as a literal, adding it to the header table and to the reference set, and then to encode it twice as an index: the first index removes it from the reference set, and the second index adds it back to the reference set and to the encoded collection of headers.
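To make the mechanics concrete, here is a minimal Python sketch of the decoder-side behaviour just described. It is not a real HPACK codec: the Decoder class, its method names and the 'cookie' header are all invented for illustration, the static table, eviction and the end-of-block emission of remaining referenced entries are left out, and indexing is simplified to "newest entry is index 1".

  class Entry:
      # One header table entry; the reference set tracks entry identity,
      # not the (name, value) pair, so two entries may hold the same field.
      def __init__(self, name, value):
          self.name = name
          self.value = value

  class Decoder:
      def __init__(self):
          self.header_table = []      # newest entry first (index 1)
          self.reference_set = set()  # Entry objects, identity-based
          self.emitted = []           # header fields emitted for this block

      def literal_with_indexing(self, name, value):
          # Literal with incremental indexing: new entry added to the
          # header table and to the reference set, and emitted once.
          entry = Entry(name, value)
          self.header_table.insert(0, entry)
          self.reference_set.add(entry)
          self.emitted.append((entry.name, entry.value))

      def indexed(self, index):
          # Indexed representation toggles the referenced entry.
          entry = self.header_table[index - 1]
          if entry in self.reference_set:
              self.reference_set.remove(entry)  # toggled off, nothing emitted
          else:
              self.reference_set.add(entry)     # toggled on, emitted again
              self.emitted.append((entry.name, entry.value))

  # Encoding "cookie: a=1" twice with the trick above:
  d = Decoder()
  d.literal_with_indexing("cookie", "a=1")  # literal: emitted once
  d.indexed(1)                              # first index: reference removed
  d.indexed(1)                              # second index: emitted a second time
  assert d.emitted == [("cookie", "a=1"), ("cookie", "a=1")]

The point of the sketch is only to show that the toggle semantics of the indexed representation, combined with a reference set keyed on entries rather than on header-field values, is enough to reproduce duplicate header fields at the decoder.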

Hervé.

Received on Wednesday, 7 May 2014 09:36:37 UTC