- From: Martin Thomson <martin.thomson@gmail.com>
- Date: Fri, 5 Jul 2013 10:38:47 -0700
- To: Gábor Molnár <gabor.molnar@sch.bme.hu>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
In terms of simplicity, doing toggles first, then literals (w/ or w/o
changes to the table), makes a lot of sense to me.

That shifts some complexity to the encoder. An encoder will have to run
two passes over its headers. It also makes routing decisions a little
harder for intermediaries, since routing information (usually :path, but
the other :-headers need to be checked too) is no longer at the head of
the line, if we assume that :path changes request by request. I'm just
pointing out the trade-off; those costs do seem manageable.

On 5 July 2013 01:51, Gábor Molnár <gabor.molnar@sch.bme.hu> wrote:
> An important detail was left out:
> 3.3. step: if the entry was inserted, set the reference flag to true
> on it.
>
>
> 2013/7/5 Gábor Molnár <gabor.molnar@sch.bme.hu>
>>
>> This is a proposal for a seemingly minor change that could make it
>> possible to implement a streaming encoder/decoder for the compression
>> spec, and make the decoding process simpler. It would also eliminate
>> certain corner cases, like the shadowing problem.
>>
>> There has been a lot of talk recently about enforcing the memory
>> usage limits of the compression spec. There is one component,
>> however, that we don't take into account when computing the memory
>> usage of compression implementations: the Working Set. The problem is
>> that it can grow without bound, since as far as I know, HTTP does not
>> impose limits on the size of the header set. I tried to come up with
>> a decoder implementation architecture for the compression spec that
>> would not have to store the whole set in memory.
>>
>> Such a decoder would instead stream the output of the decoding
>> process, header by header. This seems to be a legitimate approach,
>> since most of the memory-conscious parsers I know are implemented as
>> streaming parsers (streaming JSON, XML, HTTP, ... parsers).
>> Gzip, the basis of the previously used header compression mechanism,
>> is a streaming compressor/decompressor as well, of course.
>>
>> It turns out that it is not possible to implement the current spec as
>> a streaming parser. The only reason is this: if an entry gets
>> inserted into the working set, it is not guaranteed that it will
>> remain there until the end of the decompression process, since it
>> could be deleted at any time. Because of this, it is not possible to
>> emit any headers until the end of the process.
>>
>> I propose a simple change that would guarantee this: in header
>> blocks, Indexed Representations should come first. This guarantees
>> that once the Indexed Representations are over, there will be no
>> deletion from the Working Set. This is the only thing that would have
>> to change; the existing decoding process can be applied as if nothing
>> had changed.
>>
>> But it is now possible to implement a streaming and - as a side
>> effect - much simpler decoder like this:
>>
>> 0. There is only one component: the Header Table. An entry in the
>>    Header Table is a name-value pair with an index (just like
>>    before), and a 'reference' flag that is not set by default.
>> 1. First phase of decoding: dealing with indexed representations.
>>    Indexed representations simply flip the 'reference' flag on the
>>    entry they reference.
>> 2. Second phase of decoding: before starting the processing of
>>    literal representations, emit every name-value pair that is
>>    flagged in the Header Table.
>> 3. Third phase of decoding: for every literal representation:
>>    1. emit the name-value pair
>>    2. insert it into the table if needed (incremental or substitution
>>       indexing with table size enforcement)
>> 4. When a new header block arrives, jump to 1.
>>
>> It is maybe not obvious at first, but this process is equivalent to
>> the current decoding process, if indexed representations come first.
>> Please point out corner cases if you find any.
>>
>> I think that the 'Indexed Representations come first' pattern is
>> something that comes naturally when implementing an encoder. Even the
>> examples in the spec can remain unchanged, since they follow this
>> pattern already.
>>
>> Regards,
>> Gábor
>
>
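The four-phase decoder quoted above can be sketched roughly as follows. This is an illustration of the idea, not the draft's normative algorithm: the class name, the pre-parsed `indexed`/`literals` inputs, and the omission of substitution indexing and table size enforcement are all simplifications of mine.

```python
class StreamingDecoder:
    def __init__(self):
        # 0. The only component: the Header Table. Each entry is a
        #    name-value pair plus a 'reference' flag, unset by default.
        self.table = []  # each entry: [name, value, referenced]

    def decode(self, indexed, literals):
        # 1. First phase: indexed representations flip the 'reference'
        #    flag on the entry they reference.
        for i in indexed:
            self.table[i][2] = not self.table[i][2]
        # 2. Second phase: emit every flagged pair before any literal,
        #    header by header (hence the generator).
        for name, value, referenced in self.table:
            if referenced:
                yield (name, value)
        # 3. Third phase: emit each literal, inserting it if requested.
        #    Per the correction earlier in the thread, an inserted entry
        #    has its reference flag set to true.
        for name, value, index_it in literals:
            yield (name, value)
            if index_it:
                self.table.append([name, value, True])
        # 4. When the next header block arrives, decode() starts at 1.


d = StreamingDecoder()
block1 = list(d.decode(indexed=[],
                       literals=[(":method", "GET", True),
                                 (":path", "/", True)]))
# block1 == [(":method", "GET"), (":path", "/")]
block2 = list(d.decode(indexed=[1],            # toggle the stale :path off
                       literals=[(":path", "/x", False)]))
# block2 == [(":method", "GET"), (":path", "/x")]
```

Note how the second block relies on the ordering guarantee: because both toggles arrive before the literal, the decoder can safely emit the still-flagged `:method` entry without waiting for the end of the block.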
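Martin's two-pass encoder point can be sketched too. This is a hypothetical helper of my own (the `encode` signature and the `referenced` index set are assumptions, not anything from the draft); it only shows why the whole header set must be scanned before the first literal is written.

```python
def encode(table, referenced, headers):
    """Toy two-pass encoder: all toggles first, then all literals.

    table:      list of (name, value) pairs currently in the header table
    referenced: set of table indices flagged from the previous block
    headers:    the complete header set for this message

    Returns (toggles, literals); purely illustrative.
    """
    index_of = {entry: i for i, entry in enumerate(table)}
    # Pass 1 must see the whole header set: an entry should end up
    # flagged iff it appears in this message, and a toggle is emitted
    # wherever that differs from the current flag.
    wanted = {index_of[h] for h in headers if h in index_of}
    toggles = sorted(wanted ^ referenced)
    # Pass 2: anything not already in the table goes out as a literal,
    # only after every toggle has been written.
    literals = [h for h in headers if h not in index_of]
    return toggles, literals


toggles, literals = encode(
    table=[(":method", "GET"), (":path", "/")],
    referenced={0, 1},
    headers=[(":method", "GET"), (":path", "/x")])
# toggles == [1] (un-reference the old :path); literals == [(":path", "/x")]
```

This also makes the routing cost concrete: an intermediary cannot see the new `:path` until the literal pass, i.e. until after all toggles in the block.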
Received on Friday, 5 July 2013 17:39:14 UTC