Re: Understanding how HPAC draft-02 works from Roberto Peon on 2013-08-24 (ietf-http-wg@w3.org from July to September 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Sat, 24 Aug 2013 03:06:04 -0700
To: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNeiwOi8c6_uRLQF_HDo4ROsY13qQ+DMjNeHw5aMTz3arA@mail.gmail.com>
Correct, which is why emitting references first is an easy optimization :)

You should take a look at the pseudo-code here:
http://tools.ietf.org/html/draft-rpeon-httpbis-header-compression-03#section-10
This is for the delta2 encoding scheme, which is a little different, but
the approach to dealing with this issue is in there, and is not so bad.

The question of on-the-wire size is a fun one. I believe that what is
currently in the draft will result in fewer bytes on the wire,
because most of the time the things you'd reference should not have expired
from the state (given the analysis of distance-to-referenced-index I posted
some time back).

I suspect the trickiest bit will be dealing with headers that must keep a
particular order (which will require doing fun things with references).
-=R


On Sat, Aug 24, 2013 at 12:07 AM, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> wrote:

>
>
>
> On Sat, Aug 24, 2013 at 6:06 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
>> Any removal from the state set requires that anything that pointed to it
>> be removed (else you'd segv or equivalent).
>> Thus, substitution or expiry always requires the corresponding
>> reference-set entry to be removed.
>>
>>
> Thank you for clarifying that. I filed an issue for this on GitHub.
>
>
>> Your sentence: "But to
>> handle common header gracefully with eviction, when the entry in
>> the header table is removed from the header table due to the
>> eviction or substitution, if the entry is in the reference set
>> and it is not emitted in the current header processing, emit the
>> entry on the removal."
>>
>> is thus partially correct.
>>
>> The entry should be removed, but not emitted-- the draft currently
>> specifies emitting things only when:
>>
>>    - The entry is indexed, and is not present in the reference set
>>    - A new entry is added
>>    - The entry is in the reference set after all operations have been
>>    processed AND it hasn't been emitted.
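
To make those three rules concrete, here is a minimal Python sketch of a decoder that follows them. It is an illustration, not the draft's algorithm: the `decode` function, the operation tuples, and the entry-count limit `max_entries` are all assumptions made for the example (the draft bounds the table by size, not by entry count), and table entries are assumed to be unique.

```python
def decode(ops, table, refset, max_entries):
    """Apply one header block and return the emitted (name, value) pairs.

    ops:    list of ("indexed", index) or ("literal", (name, value)).
    table:  list of (name, value) pairs, oldest first (updated in place).
    refset: set of (name, value) pairs currently referenced (updated in place).
    max_entries: hypothetical table limit, counted in entries.
    """
    emitted = []
    for kind, arg in ops:
        if kind == "indexed":
            entry = table[arg]
            if entry in refset:
                # Already referenced: the index toggles it off, nothing is emitted.
                refset.discard(entry)
            else:
                # Rule 1: indexed and not present in the reference set.
                refset.add(entry)
                emitted.append(entry)
        elif kind == "literal":
            # Rule 2: a new entry is added (assumed to be indexed and referenced).
            table.append(arg)
            refset.add(arg)
            emitted.append(arg)
            while len(table) > max_entries:
                evicted = table.pop(0)
                # Removal drops the reference but does NOT emit the entry.
                refset.discard(evicted)
    # Rule 3: still referenced after all operations and not emitted yet.
    for entry in table:
        if entry in refset and entry not in emitted:
            emitted.append(entry)
    return emitted
```

With a two-entry table holding alpha,bravo (referenced) and charlie,delta, decoding a single literal echo,foxtrot evicts alpha,bravo, and only echo,foxtrot comes out.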
>>
>>
>>
> The problem here is that we have to track the removal of common headers
> either way (in either the decoder or the encoder).
>
> For example, suppose we have a header table like this:
>
> #0 alpha, bravo
> #1 charlie, delta
> #2 ...
> and so on
>
> And #0 is in the reference set.
>
> Now the encoder starts encoding the following header set:
>
> alpha, bravo
> echo, foxtrot
>
> If the name/value pairs in the header set are processed in this order,
> alpha,bravo is in the reference set, so it is a "common header" and nothing
> is encoded for it. Next, the encoder decides to encode echo,foxtrot as a
> literal and adds it to the header table, but that turns out to evict
> alpha,bravo from the header table.
> As a result, the header block only includes echo,foxtrot as a literal.
> If the decoder does not emit alpha,bravo on the removal, it will only emit
> echo,foxtrot.
> But if emission on removal is not the intention of the draft, we can do a
> similar thing on the encoder side. Instead of emitting the common header on
> removal on the decoder side, encode the common header on removal on
> the encoder side (which brings the entry back into the header table and
> reference set). The downside is that the bytes on the wire will potentially
> increase, because we have to encode the value as a literal anyway. Also,
> encoding the common header may cause eviction of another common header.
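
A rough Python sketch of the encoder-side alternative described above, to show how the re-encoding can cascade. Everything named here is hypothetical: `add_literals`, the `common` set (reference-set entries that are in the current header set but not yet encoded), the operation tuples, and the entry-count limit `max_entries` (assumed to be at least 1; the draft limits the table by size instead).

```python
def add_literals(new_pairs, table, refset, common, max_entries):
    """Encode new_pairs as literals; re-encode any common header they evict.

    table:  list of (name, value) pairs, oldest first (updated in place).
    refset: set of referenced pairs (updated in place).
    common: set of referenced pairs not yet encoded in this header block.
    Returns a list of ("literal", pair) operations.
    """
    ops = []
    pending = list(new_pairs)
    while pending:
        pair = pending.pop(0)
        ops.append(("literal", pair))
        table.append(pair)
        refset.add(pair)
        while len(table) > max_entries:
            evicted = table.pop(0)
            refset.discard(evicted)
            if evicted in common:
                # The decoder would silently lose this header, so queue it
                # to be sent again as a literal. It leaves `common`, so it
                # is re-encoded at most once and the loop terminates.
                common.discard(evicted)
                pending.append(evicted)
    return ops
```

In the alpha,bravo example with a two-entry table, this yields two literals (echo,foxtrot and then alpha,bravo) where the decoder-side approach needs only one. If every table entry is a common header, each re-encoded literal evicts the next one, which is the cascade mentioned above.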
>
> So for the next interop testing, which strategy is the way to go?
>
>
>> Much of the algorithm you define seems reasonable to me (there are a few
>> optimizations, but who cares right now? :) ).
>>
>>
> Yep, we all know what premature optimization causes ;)
>
> Best regards,
> Tatsuhiro Tsujikawa
>
>
>> Would you like to raise an issue so that we can track any confusion here?
>>
>> -=R
>>
>>
>>
>> On Fri, Aug 23, 2013 at 10:47 AM, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> wrote:
>>
>>> I'm trying to figure out how HPAC works.  HPAC says that it
>>> clarifies eviction and index shadowing, but I'm under the
>>> impression that HPAC is still not clear about how an entry in the
>>> reference set is handled when it is removed from the header table by
>>> eviction or substitution. This is important because, due to the
>>> differential encoding, the encoder and decoder must agree on
>>> the "common" headers, which may be removed from the header table
>>> because of eviction or substitution.
>>>
>>> After some trial and error, I came up with the following
>>> encoder/decoder procedures, which I hope conform
>>> to the HPAC draft (well, I may be completely wrong).
>>>
>>> Encoder
>>> -------
>>>
>>> 1. For each entry in the reference set, check that it is present
>>>    in the current header set. If it is not, encode it as indexed
>>>    representation and remove it from the reference set.
>>>
>>> 2. For each entry in the reference set, check that it is present
>>>    in the current header set. If it is present, mark the entry
>>>    as "common-header" and remove the matching name/value pair
>>>    from current header set (if multiple name/value pairs are
>>>    matched, only one of them is removed from the current header
>>>    set).
>>>
>>> 3. Encode the remaining name/value pairs in the current header set. For each
>>>    name/value pair:
>>>
>>> 3.1. If name/value pair is present in the header table, and the
>>>      corresponding entry in the header table is NOT in the
>>>      reference set, add the entry to the reference set and encode
>>>      it as indexed representation. Mark the entry "emitted".
>>>
>>> 3.2. If name/value pair is present in the header table, and the
>>>      corresponding entry in the header table is in the reference
>>>      set: If the entry is marked as "common-header", then this is
>>>      the 2nd occurrence of the same indexed representation. To
>>>      encode this name/value pair, we have to encode 4 indexed
>>>      representations: 2 for the 1st one (which was removed in step
>>>      2), and another 2 for the current name/value pair.
>>>      Unmark the entry as "common-header" and mark it "emitted".
>>>
>>>      If the entry is marked as "emitted", then this is also a repeated
>>>      occurrence of the same indexed representation. But this time,
>>>      we just encode 2 indexed representations.
>>>
>>> 3.3. Otherwise, the encoder encodes the name/value pair as a literal
>>>      representation.  On eviction or substitution, if the removed
>>>      entry is in the reference set, it is removed from the
>>>      reference set.
>>>
>>> 4. After the whole current header set is processed, unmark all entries in
>>>    the header table.
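
The encoder steps above can be sketched roughly as follows in Python. This is one reading of the procedure under several simplifying assumptions, not code from the draft: the function name `encode` and the tuple format are made up, `("indexed", pair)` stands in for a numeric table index, the table is limited by a hypothetical entry count `max_entries` rather than by size, table entries are unique, substitution is left out, and a literal is assumed to be added to both the header table and the reference set.

```python
def encode(headers, table, refset, max_entries):
    """Return the operations for one header set, following steps 1-4.

    headers: list of (name, value) pairs to send.
    table:   list of (name, value) pairs, oldest first (updated in place).
    refset:  set of referenced pairs (updated in place).
    """
    ops = []
    remaining = list(headers)

    # Step 1: referenced entries missing from this header set are toggled off.
    for entry in [e for e in table if e in refset and e not in remaining]:
        ops.append(("indexed", entry))
        refset.discard(entry)

    # Step 2: the rest are "common-header"; consume one matching pair each.
    common = set()
    for entry in [e for e in table if e in refset]:
        common.add(entry)
        remaining.remove(entry)

    emitted = set()
    for pair in remaining:
        if pair in table and pair not in refset:   # step 3.1
            refset.add(pair)
            ops.append(("indexed", pair))
            emitted.add(pair)
        elif pair in common:                       # step 3.2, 2nd occurrence
            ops.extend([("indexed", pair)] * 4)
            common.discard(pair)
            emitted.add(pair)
        elif pair in emitted:                      # step 3.2, later occurrences
            ops.extend([("indexed", pair)] * 2)
        else:                                      # step 3.3
            ops.append(("literal", pair))
            table.append(pair)
            refset.add(pair)
            emitted.add(pair)
            while len(table) > max_entries:
                evicted = table.pop(0)
                refset.discard(evicted)
                common.discard(evicted)
                emitted.discard(evicted)

    # Step 4: the marks live in local sets, so they disappear on return.
    return ops
```

Note that in step 3.3 an evicted "common-header" entry simply drops out of the reference set without producing any operation, which is the case the decoder-side rule below is meant to cover.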
>>>
>>> Decoder
>>> -------
>>>
>>> Decoder generally just performs what the encoder emitted.  But to
>>> handle common header gracefully with eviction, when the entry in
>>> the header table is removed from the header table due to the
>>> eviction or substitution, if the entry is in the reference set
>>> and it is not emitted in the current header processing, emit the
>>> entry on the removal.
>>>
>>> --
>>>
>>> I implemented the above encoder/decoder procedure and it seems to
>>> work.  But I'm not sure it conforms to the current draft,
>>> especially Encoder step 2 and the Decoder's header emission
>>> on eviction, because they are not described in the draft at
>>> all. There is certainly a better, correct way to go, but currently
>>> I fail to see it. How do you read the draft?
>>>
>>> Best regards,
>>>
>>> Tatsuhiro Tsujikawa
>>>
>>>
>>
>
Received on Saturday, 24 August 2013 10:06:32 UTC