- From: Gábor Molnár <gabor.molnar@sch.bme.hu>
- Date: Mon, 08 Jul 2013 10:54:41 +0200
- To: Martin Thomson <martin.thomson@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-id: <CA+KJw_6W_J=cLP3AsVQK3GV-PuvM2MsDW7BVUeH=7=s_rk4gGw@mail.gmail.com>
2013/7/5 Martin Thomson <martin.thomson@gmail.com>

> In terms of simplicity, doing toggles first, then literals (w/ or w/o
> changes to the table), makes a lot of sense to me.
>
> That shifts some complexity to the encoder. An encoder will have to
> run two passes over its headers.

I've been thinking about this for a while: is it worth moving complexity
from the decoder to the encoder, when a resource-constrained client has
to implement both (for requests and for responses)? I think the answer
is yes, because:

 1. it *remains possible* to implement encoding with minimal resource
    usage: just use literal representations without indexing for all
    headers (see the sketch below).
 2. it *becomes possible* to implement decoding with minimal resource
    usage (streaming decoder).
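As a minimal sketch of point 1, assuming a hypothetical
write_literal_no_indexing(name, value) serializer (the draft's actual
octet layout for literal representations is not reproduced here):

    def encode_minimal(headers, write_literal_no_indexing):
        # 'write_literal_no_indexing' stands in for the draft's
        # literal-without-indexing serialization. No Header Table, no
        # Working Set, no second pass: every header is emitted as a
        # literal representation without indexing, so the block trivially
        # satisfies the "indexed representations first" ordering (it
        # contains no indexed representations at all).
        for name, value in headers:
            write_literal_no_indexing(name, value)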
> It also makes routing decisions a little harder for intermediation,
> since routing information (usually :path, but the other :-headers need
> to be checked too) are no longer at the head of the line if we assume
> that :path changes request-by-request.

I don't think it becomes harder. If an output buffer is used, and the
headers are reordered at the end, then we get back to the situation we
have now. With routing logic that can handle streaming, the routing
decision can even be made somewhat sooner than the end of the
decompression process. I agree that implementing such streaming-capable
routing logic might be more difficult and might not be worth it
(especially given that proxies are usually not memory-constrained).

> I'm just pointing out the trade-off. Those costs do seem manageable.

Thanks for the feedback!

> On 5 July 2013 01:51, Gábor Molnár <gabor.molnar@sch.bme.hu> wrote:
> > An important detail was left out:
> > 3.3. step: if the entry was inserted, set the reference flag to true
> > on it.
> >
> > 2013/7/5 Gábor Molnár <gabor.molnar@sch.bme.hu>
> >>
> >> This is a proposal for a seemingly minor change that could make it
> >> possible to implement a streaming encoder/decoder for the
> >> compression spec, and make the decoding process simpler. It would
> >> also eliminate certain corner cases, like the shadowing problem.
> >>
> >> There has been a lot of talk recently about enforcing the memory
> >> usage limits of the compression spec. There is one component,
> >> however, that we don't take into account when computing the memory
> >> usage of compression implementations: the Working Set. The problem
> >> is that it can grow without bounds, since as far as I know, HTTP
> >> does not impose limits on the size of the header set. I tried to
> >> come up with a decoder implementation architecture for the
> >> compression spec that would not have to store the whole set in
> >> memory.
> >>
> >> Such a decoder would instead stream the output of the decoding
> >> process, header by header. This seems to be a legitimate approach,
> >> since most of the memory-conscious parsers I know are implemented
> >> as streaming parsers (streaming JSON, XML, HTTP, ... parsers).
> >> Gzip, the basis of the previously used header compression
> >> mechanism, is a streaming compressor/decompressor as well, of
> >> course.
> >>
> >> It turns out that it is not possible to implement the current spec
> >> as a streaming parser. The only reason is this: if an entry gets
> >> inserted into the working set, it is not guaranteed that it will
> >> remain there until the end of the decompression process, since it
> >> could be deleted at any time. Because of this, it is not possible
> >> to emit any headers until the end of the process.
> >>
> >> I propose a simple change that could guarantee this: in header
> >> blocks, Indexed Representations should come first. This would
> >> guarantee that once the Indexed Representations are over, there
> >> will be no deletion from the Working Set. This is the only thing
> >> that would have to be changed; the existing decoding process can
> >> be applied as if nothing had changed.
> >>
> >> But it is now possible to implement a streaming and, as a side
> >> effect, much simpler decoder like this:
> >>
> >> 0. There's only one component: the Header Table. An entry in the
> >>    Header Table is a name-value pair with an index (just like
> >>    before), and a 'reference' flag that is not set by default.
> >> 1. First phase of decoding: dealing with indexed representations.
> >>    Indexed representations simply flip the 'reference' flag on the
> >>    entry they reference.
> >> 2. Second phase of decoding: before starting the processing of
> >>    literal representations, emit every name-value pair that is
> >>    flagged in the Header Table.
> >> 3. Third phase of decoding: for every literal representation:
> >>    1. emit the name-value pair
> >>    2. insert it into the table if needed (incremental or
> >>       substitution indexing with table size enforcement)
> >> 4. When a new header block arrives, jump to 1.
> >>
> >> It is maybe not obvious at first, but this process is equivalent to
> >> the current decoding process, if indexed representations come
> >> first. Please point out corner cases if you find any.
> >>
> >> I think that the 'Indexed Representations come first' pattern is
> >> something that comes naturally when implementing an encoder. Even
> >> the examples in the spec can remain unchanged, since they already
> >> follow this pattern.
> >>
> >> Regards,
> >> Gábor
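A minimal sketch of the three-phase decoder described in the quoted
proposal above (the Indexed/Literal representation classes, the emit
callback, and the entry-count cap are assumptions for illustration;
wire-format parsing and the draft's exact eviction and re-indexing rules
are elided):

    from dataclasses import dataclass
    from typing import Callable, Iterable, List, Optional, Union

    @dataclass
    class Entry:
        name: str
        value: str
        referenced: bool = False  # the 'reference' flag, unset by default

    @dataclass
    class Indexed:    # references a table entry, toggling its flag
        index: int

    @dataclass
    class Literal:    # carries a name-value pair, optionally indexed
        name: str
        value: str
        indexing: Optional[str] = None  # None, "incremental", "substitution"
        substituted: int = 0            # index replaced when substituting

    class StreamingDecoder:
        def __init__(self, emit: Callable[[str, str], None],
                     max_entries: int = 128):
            self.table: List[Entry] = []  # the Header Table: the only component
            self.emit = emit              # headers are streamed out one by one
            self.max_entries = max_entries

        def decode_block(self, reps: Iterable[Union[Indexed, Literal]]) -> None:
            in_literals = False
            for rep in reps:
                if isinstance(rep, Indexed):
                    # Phase 1: indexed representations flip the flag.
                    assert not in_literals, "indexed representations come first"
                    entry = self.table[rep.index]
                    entry.referenced = not entry.referenced
                else:
                    if not in_literals:
                        # Phase 2: the indexed representations are over, so
                        # no entry can be unflagged or deleted before being
                        # emitted; stream out every flagged pair now.
                        self._emit_flagged()
                        in_literals = True
                    # Phase 3: emit the literal immediately, inserting it
                    # with the flag set if indexed (the 3.3 correction).
                    self.emit(rep.name, rep.value)
                    if rep.indexing == "incremental":
                        if len(self.table) >= self.max_entries:  # crude cap
                            self.table.pop(0)
                        self.table.append(Entry(rep.name, rep.value, True))
                    elif rep.indexing == "substitution":
                        self.table[rep.substituted] = Entry(rep.name,
                                                            rep.value, True)
            if not in_literals:  # a block of indexed representations only
                self._emit_flagged()

        def _emit_flagged(self) -> None:
            for entry in self.table:
                if entry.referenced:
                    self.emit(entry.name, entry.value)

Headers reach the application as soon as the first literal
representation is seen, so the decoder never holds the full header set
in memory: its state is bounded by the Header Table size alone.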
Received on Monday, 8 July 2013 08:55:33 UTC