Re: Header Compression - streaming proposal from Gábor Molnár on 2013-07-05 (ietf-http-wg@w3.org from July to September 2013)

From: Gábor Molnár <gabor.molnar@sch.bme.hu>
Date: Fri, 05 Jul 2013 10:51:32 +0200
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-id: <CA+KJw_4zqU7jdZNs9NpfA3HbjAcnhRLgMKG0Apf_nzyK9VrkHg@mail.gmail.com>
An important detail was left out:
  3.3. step: if the entry was inserted, set the reference flag to true on
it.


2013/7/5 Gábor Molnár <gabor.molnar@sch.bme.hu>

> This a proposal for a seemingly minor change, that could make it possible
> to implement
> a streaming encoder/decoder for the compression spec, and make the
> decoding process
> simpler. It would also eliminate certain corner cases, like the shadowing
> problem.
>
> There's a lot of talk recently on enforcing the memory usage limits of the
> compression
> spec. There's one component however, that we don't take into account when
> computing
> the memory usage of compression implementations: it's the Working Set. The
> problem
> is that it can grow without bounds, since as far as I know, HTTP does not
> impose limits
> on the size of the header set. I tried to come up with a decoder
> implementation
> architecture for the compression spec that would not have to store the
> whole set in the
> memory.
>
> Such a decoder would instead stream the output of the decoding process,
> header by
> header. This seems to be a legitimate approach, since most of the
> memory-conscious
> parsers I know are implemented as streaming parsers (streaming json, xml,
> http, ... parsers). Gzip, the base of the previously used header
> compression mechanism
> is a streaming compressor/decompressor as well, of course.
>
> It turns out that it is not possible to implement the current spec as a
> streaming parser.
> The only reason is this: *if an entry gets inserted into the working set,
> it is not guaranteed
> that it will remain there until the end of the decompression process,
> since it could be
> deleted any time*. Because of this, it is not possible to emit any
> headers until the end
> of the process.
>
> I propose a simple change, that could, however, guarantee this: *in
> header blocks, Indexed
> Representations should come first*. This would guarantee that after the
> Indexed
> Representations are over, there will be no deletion from the Working Set.
> This is the only
> thing that would have to be changed. Existing decoding process can be
> applied as if nothing
> would change.
>
> But it is now possible to implement a streaming, and - as a side effect -
> much simpler
> decoder like this:
>
> 0. There's only one component: the Header Table. An entry in the Header
> Table is a
>     name-value pair with an index (just like before), and a 'reference'
> flag that is not set by
>     default.
> 1. First phase of decoding: dealing with indexed representations. Indexed
> representations
>     simply flip the 'reference' flag on the entry they reference.
> 2. Second phase of decoding: before starting the processing of literal
> representations, emit
>     every name-value pair that is flagged in the Header Table.
> 3. Third phase of decoding: for every literal representations:
>   1. emit the name-value pair
>   2. insert it in the table if needed (incremental or substitution
> indexing with table size
>       enforcement)
> 4. When a new header block arrives, jump to 1.
>
> It is maybe not obvious at first, but this process is equivalent the the
> current decoding process,
> if indexed representations come first. Please point out corner cases if
> you find any.
>
> I think that the 'Indexed Representations come first' pattern is something
> that comes naturally
> when implementing an encoder. Even examples in the spec can remain
> unchanged, since they
> follow this pattern already.
>
> Regards,
>   Gábor
>
Received on Friday, 5 July 2013 08:52:24 UTC