- From: Gábor Molnár <gabor.molnar@sch.bme.hu>
- Date: Fri, 05 Jul 2013 04:42:44 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
- Message-id: <CA+KJw_5xfvnCYM7QmtLQebPDO-fJbZz6D47mjHEWui3=fiHUoQ@mail.gmail.com>
This is a proposal for a seemingly minor change that would make it possible to implement a streaming encoder/decoder for the compression spec and would make the decoding process simpler. It would also eliminate certain corner cases, like the shadowing problem.

There has been a lot of talk recently about enforcing the memory usage limits of the compression spec. There is one component, however, that we don't take into account when computing the memory usage of compression implementations: the Working Set. The problem is that it can grow without bounds, since, as far as I know, HTTP does not impose limits on the size of the header set.

I tried to come up with a decoder implementation architecture for the compression spec that would not have to store the whole set in memory. Such a decoder would instead stream the output of the decoding process, header by header. This seems to be a legitimate approach, since most of the memory-conscious parsers I know are implemented as streaming parsers (streaming json, xml, http, ... parsers). Gzip, the base of the previously used header compression mechanism, is of course a streaming compressor/decompressor as well.

It turns out that it is not possible to implement the current spec as a streaming parser. The only reason is this: *if an entry gets inserted into the working set, it is not guaranteed that it will remain there until the end of the decompression process, since it could be deleted at any time*. Because of this, it is not possible to emit any headers until the end of the process.

I propose a simple change that would guarantee this: *in header blocks, Indexed Representations should come first*. This would guarantee that once the Indexed Representations are over, there will be no deletion from the Working Set.

This is the only thing that would have to be changed; the existing decoding process can be applied as if nothing had changed. But it now becomes possible to implement a streaming and, as a side effect, much simpler decoder like this:

0. There is only one component: the Header Table. An entry in the Header Table is a name-value pair with an index (just like before), plus a 'reference' flag that is not set by default.
1. First phase of decoding: dealing with indexed representations. Indexed representations simply flip the 'reference' flag on the entry they reference.
2. Second phase of decoding: before starting to process literal representations, emit every name-value pair that is flagged in the Header Table.
3. Third phase of decoding: for every literal representation:
   1. emit the name-value pair
   2. insert it into the table if needed (incremental or substitution indexing, with table size enforcement)
4. When a new header block arrives, jump to 1.

It is maybe not obvious at first, but this process is equivalent to the current decoding process if indexed representations come first. Please point out corner cases if you find any.

I think that the 'Indexed Representations come first' pattern is something that comes naturally when implementing an encoder. Even the examples in the spec can remain unchanged, since they already follow this pattern.
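To make the three phases concrete, here is a minimal sketch of such a decoder in Python. It is not the draft's wire format or terminology: the tuple shapes for representations, the StreamingDecoder and Entry names, and the entry-count limit are illustrative assumptions; only the phase logic from the numbered list above is taken from the proposal.

from itertools import chain
from typing import Iterable, Iterator, List, Tuple

Header = Tuple[str, str]


class Entry:
    """A Header Table entry: a name-value pair plus a 'reference' flag."""

    def __init__(self, name: str, value: str) -> None:
        self.name = name
        self.value = value
        self.referenced = False  # not set by default, as in step 0 above


class StreamingDecoder:
    def __init__(self, max_entries: int = 128) -> None:
        # Step 0: the Header Table is the only stateful component.
        self.table: List[Entry] = []
        self.max_entries = max_entries  # stand-in for the draft's octet-based limit

    def decode(self, block: Iterable[tuple]) -> Iterator[Header]:
        """Decode one header block, emitting headers as soon as possible.

        Representations are assumed to be already parsed into tuples:
          ('indexed', index)
          ('literal', name, value)                      # no indexing
          ('literal_incremental', name, value)          # incremental indexing
          ('literal_substitution', index, name, value)  # substitution indexing
        Per the proposal, all 'indexed' tuples must come first.
        """
        reps = iter(block)
        first_literal = None

        # Phase 1: indexed representations only flip the reference flag.
        for rep in reps:
            if rep[0] != 'indexed':
                first_literal = rep
                break
            entry = self.table[rep[1]]
            entry.referenced = not entry.referenced

        # Phase 2: emit every name-value pair flagged in the Header Table.
        # No later representation can delete these entries, so emitting is safe.
        for entry in self.table:
            if entry.referenced:
                yield (entry.name, entry.value)

        if first_literal is None:
            return

        # Phase 3: literal representations are emitted immediately and,
        # if requested, inserted into the table. (The mail does not say
        # whether newly indexed literals also get their reference flag set
        # for the next block; that detail is left out of this sketch.)
        for rep in chain([first_literal], reps):
            kind = rep[0]
            if kind == 'indexed':
                raise ValueError("indexed representation after a literal one")
            if kind == 'literal':
                _, name, value = rep
            elif kind == 'literal_incremental':
                _, name, value = rep
                self.table.append(Entry(name, value))
                # Crude size enforcement; the draft counts octets, not entries.
                while len(self.table) > self.max_entries:
                    self.table.pop(0)
            elif kind == 'literal_substitution':
                _, index, name, value = rep
                self.table[index] = Entry(name, value)
            else:
                raise ValueError("unknown representation: %r" % (kind,))
            yield (name, value)

Because headers are yielded as soon as they are determined, the decoder's extra memory is bounded by the Header Table alone; nothing resembling a complete Working Set has to be buffered before output.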
Regards, Gábor

Received on Friday, 5 July 2013 02:43:36 UTC