Re: Header Serialization Discussion

On Mon, Apr 15, 2013 at 8:28 AM, RUELLAN Herve
<Herve.Ruellan@crf.canon.fr> wrote:
>
>[snip]
>>
>>   - The true utility of the common prefix length mechanism is questionable.
>> Aside from the potential security risks, I question just how effective it's
>> going to be in practice. (What header fields do we expect to actually use it in
>> practice?)
>
> Common prefixes are very efficient for URLs: the paths often share some common part at their beginnings. They are also useful for other types of data such as dates and integers, but these could be optimized using typed codecs.
>

I generally prefer the typed codecs for dates and integers. I'm
struggling to see what, beyond URLs, the prefixes will be useful for,
really. I mean, I get the theory, I understand their use, but I'm just
not convinced how often it will be practical outside of the request
URI. I don't want to incur the performance hit of calculating the
longest common prefix for every text header if doing so just isn't
going to be useful. What I'm considering is writing my encoder so that
it only calculates the common prefix for the :path header, ignoring it
for everything else.
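
To make that concrete, here is a rough sketch of the kind of encoder
shortcut I mean. It assumes the prefix is computed against the last
value sent for the same header name, which is a simplification; the
names PREFIX_HEADERS and prefix_len_for are made up for illustration
and are not from either proposal.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


# Hypothetical policy: only these headers are worth the prefix scan.
PREFIX_HEADERS = frozenset({":path"})


def prefix_len_for(name: str, value: bytes, previous: dict) -> int:
    """Prefix length to signal for this header, or 0 if we skip the scan.

    `previous` maps header name -> last value sent (an assumption of this
    sketch; a real encoder would compare against its table entries).
    """
    if name not in PREFIX_HEADERS:
        return 0  # never pay for the scan on other text headers
    old = previous.get(name)
    previous[name] = value
    return 0 if old is None else common_prefix_len(old, value)
```

Every header other than :path returns immediately, so the per-header
cost of the prefix mechanism drops to a set lookup.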

>>   - The fact that items are never removed from the Name Table is concerning
>> and poses a potential security risk. Because the decompressor is forced to
>> maintain a symbol table for every header name it encounters, a malicious
>> compressor could cause overruns by sending a high number of junk header
>> names. The compressor ought to be able to treat the Name Table as an LRU,
>> and ought to be able to place strict limits on its size, just as it does with the
>> Header Table. Delta does not have this issue because its name indices are
>> already tied to the LRU.
>
> True, some kind of mechanism should be added to prevent memory overrun of the Name Table. A simple one is to limit the number of entries that can be added to the table.
>
>> With HeaderDiff in mind, I'm thinking about how we can bring HeaderDiff and
>> Delta together and roll in the additional concepts of Typed Codecs. Here's
>> what I have in mind.
>>
> [snip]
>
> My main concern with your proposal is about doing the LRU on both the encoder and the decoder.
> I'd really like to keep the decoder as simple as possible, and so keeping all the buffer management on the encoder side is a real win for me.
> In addition, the asymmetry means that the encoder is free to do whatever buffer management it wants. LRU is a very good default buffer management scheme, however I think there are cases where some clever scheme could beat it.

Well, it's not so much an LRU cache as a "least recently written"
queue. The buffer essentially consists of 128 memory slots. These are
assigned in order and rotate, with used slots deallocated and
reassigned as the buffer fills past its limit. The encoder, then,
needs to be selective about just what it decides to assign to the
buffer. So long as an implementation follows the proper assignment
order, the specific implementation does not matter.
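
A minimal sketch of that rotation, assuming the limit is simply the
slot count (a real buffer may also be bounded by bytes) and using a
made-up class name, WriteOrderBuffer:

```python
class WriteOrderBuffer:
    """Fixed set of slots assigned in write order; oldest write is reused first.

    Sketch only: the 128-slot default comes from the description above,
    everything else (names, no byte-size accounting) is an assumption.
    """

    def __init__(self, size: int = 128):
        self.slots = [None] * size
        self.next_slot = 0  # index the next stored entry will occupy

    def store(self, entry):
        """Place entry in the next slot, evicting whatever was written there."""
        index = self.next_slot
        self.slots[index] = entry
        self.next_slot = (index + 1) % len(self.slots)
        return index

    def get(self, index: int):
        """Look up an entry by slot index; reads do not affect eviction order."""
        return self.slots[index]
```

Because reads never reorder anything, the decoder only has to apply
the same store() calls in the same order as the encoder to stay in
sync. All of the policy lives in the encoder's choice of what to store.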

>
>> For ISO-8859-1 Text, the Static Huffman Code used by Delta would be used
>> for the value. If we can develop an approach to effectively handling Huffman
>> coding for arbitrary UTF-8, then we can apply Huffman coding to that as well.
>>
>> For the Number and DateTime serialization:
>>
>>   - 16-bit numbers serialize with a strict maximum of 3 bytes
>>   - 32-bit numbers serialize with a strict maximum of 5 bytes.
>>   - 64-bit numbers serialize with a strict maximum of 10 bytes.
>>   - Date Times will serialize with five bytes for the reasonably relevant future,
>> then six bytes for quite some time after that. Dates prior to the epoch cannot
>> be represented.
>>
>> In order to properly deal with the backwards compatibility concerns for
>> HTTP/1, there are several important rules for use of Typed Codecs in HTTP
>> headers:
>>
>> 1. Headers must be explicitly defined to use the new header types. All
>> existing HTTP/1 headers, then, will continue to be required to be
>> represented as ISO-8859-1 Text unless their standard definitions are
>> updated. The HTTP/2 specification would update the definition of specific
>> known headers (e.g. content-length, date, if-modified-since, etc).
>>
>> 2. Extension headers that use the typed codecs will have specific normative
>> transformations to ISO-8859-1 defined.
>>     a. UTF-8 Text will be converted to ISO-8859-1 with extended characters
>> pct-encoded
>>     b. Numbers will be converted to their ASCII equivalent values.
>>     c. Date Times will be converted to their HTTP-Date equivalent values.
>>     d. Binary fields will be Base64-encoded.
>>
>> 3. There will be no normative transformation from ISO-8859-1 values into the
>> typed codecs. Implementations are free to apply transformation where
>> those impls determine it is appropriate, but it will be perfectly legal for an
>> implementation to pass a text value through even if it is known that a given
>> header type has a typed codec equivalent (for instance, Content-Length may
>> come through as a number or a text value, either will be valid). This means
>> that when translating from HTTP/1 -> HTTP/2, receiving implementations
>> need to be prepared to handle either value form.
>
> I really like your approach to typed codecs. And while it has been ruled out for the first implementation draft, I think it's useful to pursue as an improvement for a future version of the header compression mechanism.
> I'm willing to do experimentation on it.

Excellent. Once we have the basic header encoding down in the spec I
plan on writing up a more formal I-D proposal with spec language for
the typed codecs. However, I think that this particular aspect is
going to remain fairly stable now.
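
For anyone who wants to experiment in the meantime: the byte limits
quoted above for numbers (3, 5 and 10 bytes) are what you get from a
base-128 variable-length integer, 7 payload bits per byte with the
high bit as a continuation flag. The sketch below assumes that
encoding, and assumes dates are carried as seconds since the epoch;
neither detail is spelled out above, so treat both as illustrative.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("negative values (e.g. pre-epoch dates) not representable")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, continuation bit clear
            return bytes(out)


def decode_uvarint(data: bytes) -> int:
    """Decode a value produced by encode_uvarint."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value
        shift += 7
    raise ValueError("truncated varint")
```

Under those assumptions a timestamp from this month fits in five
bytes, and the size only grows to six once the value reaches 2**35.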

- James

>
> Hervé.
>
>> This approach combines what I feel are the best concepts from HeaderDiff
>> and Delta and provides simple, straightforward header serialization and state
>> management. Obviously, lots of testing is required, however.
>>
>> As always, thoughts/opinions/gripes are appreciated.
>>
>> - James
>

Received on Monday, 15 April 2013 15:55:48 UTC