Re: Header compression question: duplicate header entry and current index on computing working set from Tatsuhiro Tsujikawa on 2013-07-18 (ietf-http-wg@w3.org from July to September 2013)

From: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date: Fri, 19 Jul 2013 00:48:34 +0900
To: Roberto Peon <grmocg@gmail.com>
Cc: Jeff Pinner <jpinner@twitter.com>, Mike Bishop <Michael.Bishop@microsoft.com>, Martin Thomson <martin.thomson@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAPyZ6=LoOgACf39ziUXAT6yMpL-1hcLUsSnoG3ra58j8e7qu4Q@mail.gmail.com>

From the responses, I'm under the impression that the reference set is a set
of references to entries in the header table, not a set of name/value
pairs.
I am now convinced that duplicate entries are no problem with this
method, as long as both the encoder and the decoder use it.
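
As a rough illustration, here is a minimal Python sketch of a reference set
that holds header table indices. It is a toy model, not code from the draft
or from any implementation: the names header_table, toggle, encoder_refs and
decoder_refs are hypothetical, and the add-or-remove behaviour of an indexed
representation is assumed from the discussion in this thread.

```python
# Toy model (hypothetical names): the reference set stores header table
# indices, not name/value pairs.
header_table = [("name1", "value1"), ("name1", "value1")]  # duplicate entries


def toggle(reference_set, index):
    """Indexed representation: drop the index if referenced, else add it."""
    if index in reference_set:
        reference_set.remove(index)
    else:
        reference_set.add(index)


# Both sides reference entry 0 only; entry 1 is an unreferenced duplicate.
encoder_refs = {0}
decoder_refs = {0}

# The encoder removes the header by sending index 0; the decoder applies
# the same index without looking at names or values.
toggle(encoder_refs, 0)
toggle(decoder_refs, 0)

in_sync = encoder_refs == decoder_refs
print(in_sync, encoder_refs)
```

Because both sides compare indices only, the duplicate at index 1 never
enters the decision.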

Since the spec seems to imply that the reference set is just a set of
name/value pairs (Appendix B shows only name/value pairs, not indices), one
may think that the reference set holds name/value pairs and sweep the header
table to get the index. node-http2 is implemented this way.
If the encoder uses the method described above and the decoder uses a sweep,
they may get out of sync if there is a duplicate in the header table.
Personally I much prefer the first method since it avoids sweeps and string
matching.
But this problem only occurs when the encoder adds a duplicate entry to the
header table, and since that is considered a bug, there is no problem as long
as the encoder is good enough not to do so.
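
The mismatch can be sketched in Python as follows. This is a toy model under
stated assumptions, not the behaviour of node-http2 or of any real decoder:
sweep_index and the other names are hypothetical, and the sweep is assumed to
return the last matching entry (a sweep that returned the first match would
happen to agree with the encoder in this particular case).

```python
# Toy model (hypothetical names): encoder tracks indices, decoder stores
# name/value pairs and sweeps the header table to recover an index.
header_table = [("name1", "value1"), ("name1", "value1")]  # duplicate entries


def sweep_index(table, header):
    """Assumed sweep: return the index of the LAST entry equal to header."""
    matches = [i for i, entry in enumerate(table) if entry == header]
    return matches[-1]


def toggle(reference_set, index):
    """Indexed representation: drop the index if referenced, else add it."""
    if index in reference_set:
        reference_set.remove(index)
    else:
        reference_set.add(index)


# The encoder knows it referenced entry 0.
encoder_refs = {0}

# The decoder only remembers the pair and resolves it by sweeping.
decoder_pairs = {("name1", "value1")}
decoder_refs = {sweep_index(header_table, pair) for pair in decoder_pairs}

# The encoder sends index 0, intending to remove name1/value1.
toggle(encoder_refs, 0)
toggle(decoder_refs, 0)  # index 0 is not referenced here, so it is added

encoder_headers = [header_table[i] for i in sorted(encoder_refs)]
decoder_headers = [header_table[i] for i in sorted(decoder_refs)]
print(encoder_headers, decoder_headers)
```

In this model the encoder ends up with no referenced headers while the
decoder ends up with two copies of name1/value1.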

Best regards,

Tatsuhiro Tsujikawa



On Thu, Jul 18, 2013 at 4:39 AM, Roberto Peon <grmocg@gmail.com> wrote:

> Not necessarily-- If the encoder/compressor says that element 5 is k,v,
> and then it also appends k,v as element 7, for example.
>
> When this happens things are less efficient, yes, but that is it.
>
> There has been little optimization around duplicate k,v entries because
> 1) It is very uncommon
> 2) It doesn't break anything
> 3) There doesn't seem to be a good reason to encourage the behavior by
> optimizing for it.
>  -=R
>
>
> On Wed, Jul 17, 2013 at 10:08 AM, Jeff Pinner <jpinner@twitter.com> wrote:
>
>> Doesn't the decompressor have to sweep the table to create the new
>> reference set and compare the (index, name, value) entries?
>>
>>
>> On Wed, Jul 17, 2013 at 9:59 AM, Mike Bishop <
>> Michael.Bishop@microsoft.com> wrote:
>>
>>>  How did the object get into the reference set?  Because the compressor
>>> referenced an object by index, or included it as a literal and added it to
>>> the table.
>>>
>>> So the object in the reference set points to the entry in the table it
>>> was added with.  If there happens to be another identical entry in the
>>> table, nothing says that the decompressor will even notice that.  I don’t
>>> recall anything that requires the decompressor to sweep the header table
>>> looking for matches – that’s the compressor’s job.
>>>
>>> *From:* Tatsuhiro Tsujikawa [mailto:tatsuhiro.t@gmail.com]
>>> *Sent:* Wednesday, July 17, 2013 9:52 AM
>>> *To:* Martin Thomson
>>> *Cc:* ietf-http-wg@w3.org
>>> *Subject:* Re: Header compression question: duplicate header entry and
>>> current index on computing working set
>>>
>>> On Thu, Jul 18, 2013 at 1:36 AM, Martin Thomson <
>>> martin.thomson@gmail.com> wrote:
>>>
>>>  On 17 July 2013 08:56, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
>>> wrote:
>>> > In 3.4, to compute working set from reference set of headers, the
>>> > index of entry in header table is required.
>>> > The question is, when the duplicate entries are in the header table,
>>> > which index is used as the index of working set?
>>>
>>> If, for some strange reason, a compressor created multiple identical
>>> entries in the table, the decompressor is required to respect that
>>> choice, even if it is likely to be a bug.  This prevents the
>>> decompressor and compressor from getting out of sync.
>>>
>>> The compressor can then reference any of the entries when using an index.
>>>
>>> If the choice is arbitrary, then the compressor and decompressor may
>>> choose different index and can get out of sync.
>>>
>>> For example,
>>>
>>> Current header table:
>>>
>>> |0|name1|value1|
>>> |1|name1|value1|
>>>
>>> If name1/value1 is in reference set, compressor chooses index 0, and
>>> decompressor chooses index 1.
>>> compressor wants to remove name1/value1, so reference index 0.
>>> In decompressor side, however, seeing index header representation with
>>> index 0 and it is not in its reference set
>>> (because name1/value1 is index 1), retrieve index 0 from header table
>>> and add it to working set.
>>>
>>> Maybe I misunderstand the draft.
>>>
>>> If multiple identical entries are considered as a bug, then it would be
>>> better to prohibit it in the spec and we are happy to not to consider
>>> these things.
>>>
>>> Best regards,
>>>
>>> Tatsuhiro Tsujikawa
>>>
>>
>>
>

Received on Thursday, 18 July 2013 15:49:22 UTC