Re: Understanding how HPAC draft-02 works from Tatsuhiro Tsujikawa on 2013-08-24 (ietf-http-wg@w3.org from July to September 2013)

From: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date: Sat, 24 Aug 2013 22:56:21 +0900
To: Roberto Peon <grmocg@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAPyZ6=K+QOzu4FtwLXhC4PcLcAXyesrtqgibFSTxRgdC-ny+tQ@mail.gmail.com>
On Sat, Aug 24, 2013 at 7:06 PM, Roberto Peon <grmocg@gmail.com> wrote:

> Correct, which is why emitting references first is an easy optimization :)
>
> You should take a look at the pseudo-code here:
> http://tools.ietf.org/html/draft-rpeon-httpbis-header-compression-03#section-10
> This is for the delta2 encoding scheme, which is a little different, but
> the approach to dealing with this issue is in there, and is not so bad.
>
>
Thank you for the pointer. It is very insightful. So the encoder should do
a bit more processing to make the
decoder simple.

For the record, here is the revised algorithm, which does not require
header emissions on eviction in decoding, so it is conforming to the draft.
The key point is that, when encoding, if "common-header" is removed from
the header table, keep track of it and later encode it just like the other
name/value pairs.

0. keep_set is initialized as empty set.

1. For each entry in the reference set, check that it is present
   in the current header set. If it is not, encode it as indexed
   representation and remove it from the reference set.

2. For each entry in the reference set, check that it is present
   in the current header set. If it is present, mark the entry
   as "common-header" and remove the matching name/value pair
   from current header set (if multiple name/value pairs are
   matched, only one of them is removed from the current header
   set).

3. Encode the rest of name/value pair in current header set. For each
   name/value pair:

3.1. If name/value pair is present in the header table, and the
     corresponding entry in the header table is NOT in the
     reference set, add the entry to the reference set and encode
     it as indexed representation. Mark the entry "emitted".

3.2. If name/value pair is present in the header table, and the
     corresponding entry in the header table is in the reference
     set: If the entry is marked as "common-header", then this is
     the 2nd occurrence of the same indexed representation. To
     encode this name/value pair, we have to encode 4 indexed
     representation. 2 for the 1st one (which was removed in step
     2), and the another 2 for the current name/value pair.
     Unmark the entry "common-header" and mark it "emitted".

     If the entry is marked as "emitted", then this is also the
     occurrences of the same indexed representation. But this time,
     we just encode 2 indexed representation.

3.3. Otherwise, encoder encodes name/value pair as literal
     representation.  On eviction or substitution, if the removed
     entry is in the reference set, it is removed from the
     reference set. If the removed entry is marked
     as "common-header", add it to keep_set.

4. For each name/value of entry in keep_set, do the same
   processing described in 3.1 thourgh 3.3. keep_set may be
   updated in the iteration.

5. After all current header set is processed, unmark all entries in
   the header table.




> The question of on-the-wire-size is a fun one. I believe that what is
> currently in the draft will result in smaller on-the-wire sized stuff
> because most of the time the things you'd reference should not be expired
> from the state (given the analysis of distance-to-referenced-index I posted
> some time back).
>
> I suspect the trickiest bit will be dealing with any required-ordering for
> certain headers which might require it (and which will require doing fun
> things with references).
> -=R
>
>
Just quick check of the number of "keep_set", it is mostly 0 (34 out of
45). So yeah, no problem at all.

I think header field ordering is not considered at all in the current
header compression.
I don't know much about such headers, but given that there is no apparent
concerns about it now,
maybe we can ignore it for now?

Best regards,

Tatsuhiro Tsujikawa


>
> On Sat, Aug 24, 2013 at 12:07 AM, Tatsuhiro Tsujikawa <
> tatsuhiro.t@gmail.com> wrote:
>
>>
>>
>>
>> On Sat, Aug 24, 2013 at 6:06 AM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>> Any removal from the state set requires that anything that pointed to it
>>> be removed (else you'd segv or equivalent).
>>> Thus, substitution or expiry always requires the corresponding
>>> reference-set entry to be removed.
>>>
>>>
>> Thank you for clarifying that. I submitted the issue for this in the
>> github.
>>
>>
>>> Your sentence: "But to
>>> handle common header gracefully with eviction, when the entry in
>>> the header table is removed from the header table due to the
>>> eviction or substitution, if the entry is in the reference set
>>> and it is not emitted in the current header processing, emit the
>>> entry on the removal."
>>>
>>> is thus partially correct.
>>>
>>> The entry should be removed, but not emitted-- the draft currently
>>> specifies emitting things only when:
>>>
>>>    - The entry is indexed, and is not present in the reference set
>>>    - A new entry is added
>>>    - The entry is in the reference set after all operations have been
>>>    processed AND it hasn't been emitted.
>>>
>>>
>>>
>> The problem here is that the we have to track the common headers removal
>> in anyway (either decoder or encoder).
>>
>> For example, if we have header table like this:
>>
>> #0 alpha, bravo
>> #1 charlie, delta,
>> #2 ...
>> and so on
>>
>> And #0 is in the reference set.
>>
>> Now encoder starts encoding the following header set:
>>
>> alpha, bravo
>> echo, foxtrot
>>
>> If the name/value pairs in header set is processed this order,
>> alpha,bravo is in the reference set, so it is "common header" and nothing
>> encoded. Next, encoder somehow decided to encode echo,foxtrot as literal
>> and
>> added to the header table but it turned out that removes alpha,bravo from
>> the
>> header table.
>> As a result, the header block only includes echo,foxtrot as literal block.
>> If decoder does not emit the alpha,bravo on the removal, it will only emit
>> echo,foxtrot.
>> But if emission on the removal is not the intention of the draft, we can
>> do the
>> similar thing in the encoder side. Instead of emitting common header on
>> the removal on the decoder side, encode common header on removal on
>> the encoder side (which brings back to the entry to the header table and
>> reference set). The downside is the bytes on the wire will be potentially
>> increased because we have to do literal for the value anyway. Also
>> encoding
>> of the common header will cause eviction of the another common header.
>>
>> So for the next interop testing, the which strategy is a way to go?
>>
>>
>>> Much of the algorithm you define seems reasonable to me (there are a few
>>> optimizations, but who cares right now? :) ).
>>>
>>>
>> Yep, we all know the premature optimization cause what ;)
>>
>> Best regards,
>> Tatsuhiro Tsujikawa
>>
>>
>> Would you like to raise an issue so that we can track any confusion here?
>>>
>>> -=R
>>>
>>>
>>>
>>> On Fri, Aug 23, 2013 at 10:47 AM, Tatsuhiro Tsujikawa <
>>> tatsuhiro.t@gmail.com> wrote:
>>>
>>>> I'm trying to figure out how the HPAC works.  HPAC says that it
>>>> clarify the eviction and index shadowing, but I'm under the
>>>> impression that HPAC is still not clear how the entry in the
>>>> reference set is removed from the header table because of
>>>> eviction or substitution. This is important because, due to the
>>>> differential encoding, the encoder and decoder must agree with
>>>> the "common" headers, which may be removed from the header table
>>>> because of eviction or substitution.
>>>>
>>>> After several tries and error, I came up with the following
>>>> encoder/decoder procedures, which I hopefully think that
>>>> conforming to the HPAC draft (well, I may be completely wrong).
>>>>
>>>> Encoder
>>>> -------
>>>>
>>>> 1. For each entry in the reference set, check that it is present
>>>>    in the current header set. If it is not, encode it as indexed
>>>>    representation and remove it from the reference set.
>>>>
>>>> 2. For each entry in the reference set, check that it is present
>>>>    in the current header set. If it is present, mark the entry
>>>>    as "common-header" and remove the matching name/value pair
>>>>    from current header set (if multiple name/value pairs are
>>>>    matched, only one of them is removed from the current header
>>>>    set).
>>>>
>>>> 3. Encode the rest of name/value pair in current header set. For each
>>>>    name/value pair:
>>>>
>>>> 3.1. If name/value pair is present in the header table, and the
>>>>      corresponding entry in the header table is NOT in the
>>>>      reference set, add the entry to the reference set and encode
>>>>      it as indexed representation. Mark the entry "emitted".
>>>>
>>>> 3.2. If name/value pair is present in the header table, and the
>>>>      corresponding entry in the header table is in the reference
>>>>      set: If the entry is marked as "common-header", then this is
>>>>      the 2nd occurrence of the same indexed representation. To
>>>>      encode this name/value pair, we have to encode 4 indexed
>>>>      representation. 2 for the 1st one (which was removed in step
>>>>      2), and the another 2 for the current name/value pair.
>>>>      Unmark the entry "common-header" and mark it "emitted".
>>>>
>>>>      If the entry is marked as "emitted", then this is also the
>>>>      occurrences of the same indexed representation. But this time,
>>>>      we just encode 2 indexed representation.
>>>>
>>>> 3.3. Otherwise, encoder encodes name/value pair as literal
>>>>      representation.  On eviction or substitution, if the removed
>>>>      entry is in the reference set, it is removed from the
>>>>      reference set.
>>>>
>>>> 4. After all current header set is processed, unmark all entries in
>>>>    the header table.
>>>>
>>>> Decoder
>>>> -------
>>>>
>>>> Decoder generally just performs what the encoder emitted.  But to
>>>> handle common header gracefully with eviction, when the entry in
>>>> the header table is removed from the header table due to the
>>>> eviction or substitution, if the entry is in the reference set
>>>> and it is not emitted in the current header processing, emit the
>>>> entry on the removal.
>>>>
>>>> --
>>>>
>>>> I implemented the above encoder/decoder procedure and it seems to
>>>> work.  But I'm not sure it conforms to the current draft,
>>>> especially for the Encoder step 2 and Decoder's header emission
>>>> on the eviction because they are not described in the draft at
>>>> all. There is certainly better, correct way to go, but currently
>>>> I failed to see it. How do you read the draft?
>>>>
>>>> Best regards,
>>>>
>>>> Tatsuhiro Tsujikawa
>>>>
>>>>
>>>
>>
>
Received on Saturday, 24 August 2013 13:57:09 UTC