RE: Header Serialization Discussion from RUELLAN Herve on 2013-04-15 (ietf-http-wg@w3.org from April to June 2013)

From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Date: Mon, 15 Apr 2013 15:28:36 +0000
To: James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <6C71876BDCCD01488E70A2399529D5E51640F0D1@ADELE.crf.canon.fr>


> -----Original Message-----
> From: James M Snell [mailto:jasnell@gmail.com]
> Sent: Sunday, 14 April 2013 00:01
> To: ietf-http-wg@w3.org
> Subject: Header Serialization Discussion
> 
> Ok... so I've implemented HeaderDiff serialization [1] in java [2]...
> just the serialization so far, will do deserialization early next week. The
> implementation is very rough and definitely needs improvement /
> optimization but it's functional enough to start from.
> 
> [1] http://tools.ietf.org/html/draft-ruellan-headerdiff-00
> [2] https://github.com/jasnell/http2/tree/master/src/snell/http2/headers/headerdiff
> 
> I know that Roberto is working on refinements to his own concept of the
> delta header serialization and hopefully he'll be sharing those thoughts soon,
> but I wanted to get my current thoughts captured for discussion.
> 
> The HeaderDiff approach is straightforward and effective and makes efficient
> use of available bits. Its variable-length integer syntax is a bit complicated
> with the notion of bit-prefix lengths but once you've got it it's easy to
> understand what's going on. The implementation is a bit less complicated
> than delta, which is good and I like that there's no "Header Group" notion.
> Headers are either on or off per request. There are several concerns,
> however.
> 
>   - The true utility of the common prefix length mechanism is questionable.
> Aside from the potential security risks, I question just how effective it's
> going to be in practice. (What header fields do we expect to actually use it in
> practice?)

Common prefixes are very efficient for URLs: paths often share a common leading part. They are also useful for other types of data, such as dates and integers, but these could be optimized using typed codecs.
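
To illustrate the URL case, here is a minimal Python sketch of prefix-based encoding against a previously seen value. The function names and the (prefix length, suffix) pair are assumptions made for illustration; they are not the HeaderDiff wire format.

```python
from typing import Tuple


def common_prefix_len(previous: str, current: str) -> int:
    """Count the leading characters shared by both strings."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


def encode_with_prefix(previous: str, current: str) -> Tuple[int, str]:
    """Represent `current` as (shared prefix length, remaining suffix)."""
    n = common_prefix_len(previous, current)
    return n, current[n:]


def decode_with_prefix(previous: str, prefix_len: int, suffix: str) -> str:
    """Rebuild the value from the reference value and the encoded pair."""
    return previous[:prefix_len] + suffix


# Two paths that share the "/static/images/" prefix (15 characters).
prev_path = "/static/images/logo.png"
curr_path = "/static/images/banner.png"
prefix_len, suffix = encode_with_prefix(prev_path, curr_path)
```

Only the 10-character suffix "banner.png" plus a small integer would need to be sent here, instead of the full 25-character path.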

>   - The fact that items are never removed from the Name Table is concerning
> and poses a potential security risk. If the decompressor is forced to
> maintain a symbol table for every header name it encounters, a malicious
> compressor could cause overruns by sending a high number of junk header
> names. The compressor ought to be able to treat the Name Table as an LRU,
> and ought to be able to place strict limits on its size, just as it does with the
> Header Table. Delta does not have this issue because its name indices are
> already tied to the LRU.

True, some kind of mechanism should be added to prevent memory overrun of the Name Table. The simplest one is to cap the number of entries that can be added to the table.
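
A rough Python sketch of such a cap follows. The class name, the `max_entries` parameter, and the behaviour once the table is full (the name is simply not indexed, so it would have to be sent as a literal) are all assumptions for illustration, not something the draft specifies.

```python
from typing import Dict, List, Optional


class BoundedNameTable:
    """Name table that stops accepting new names once it is full."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._names: List[str] = []
        self._index_by_name: Dict[str, int] = {}

    def add(self, name: str) -> Optional[int]:
        """Return the index of `name`, or None if the table is full."""
        existing = self._index_by_name.get(name)
        if existing is not None:
            return existing
        if len(self._names) >= self.max_entries:
            # Assumed policy: refuse the entry; the caller sends a literal.
            return None
        index = len(self._names)
        self._names.append(name)
        self._index_by_name[name] = index
        return index

    def __len__(self) -> int:
        return len(self._names)


table = BoundedNameTable(max_entries=2)
first = table.add("accept")       # indexed as 0
second = table.add("user-agent")  # indexed as 1
junk = table.add("x-junk-1")      # table full: not indexed
```

With this policy a flood of junk header names costs the decoder at most `max_entries` entries of memory.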

> With HeaderDiff in mind, I'm thinking about how we can bring HeaderDiff and
> Delta together and roll in the additional concepts of Typed Codecs. Here's
> what I have in mind.
> 
[snip]

My main concern with your proposal is that it performs LRU management on both the encoder and the decoder.
I'd really like to keep the decoder as simple as possible, so keeping all the buffer management on the encoder side is a real win for me.
In addition, the asymmetry leaves the encoder free to use whatever buffer management it wants. LRU is a very good default scheme; however, I think there are cases where a cleverer scheme could beat it.
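
The asymmetry can be sketched as follows, in Python. The encoder owns the replacement policy (LRU here) and tells the decoder which slot to overwrite; the decoder only obeys. The class names and the ("store" / "ref", slot, value) instruction tuple are hypothetical and chosen for illustration; they are not the HeaderDiff encoding.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

Instruction = Tuple[str, int, Optional[str]]


class LruEncoder:
    """Encoder that picks table slots using an LRU policy of its own."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: "OrderedDict[str, int]" = OrderedDict()  # oldest first

    def encode(self, value: str) -> Instruction:
        if value in self.slots:
            self.slots.move_to_end(value)  # mark as most recently used
            return ("ref", self.slots[value], None)
        if len(self.slots) < self.size:
            slot = len(self.slots)  # table not yet full: next free slot
        else:
            _, slot = self.slots.popitem(last=False)  # evict the LRU entry
        self.slots[value] = slot
        return ("store", slot, value)


class SimpleDecoder:
    """Decoder with no policy: it writes where it is told to write."""

    def __init__(self, size: int) -> None:
        self.table: List[Optional[str]] = [None] * size

    def decode(self, instruction: Instruction) -> Optional[str]:
        op, slot, value = instruction
        if op == "store":
            self.table[slot] = value
        return self.table[slot]


encoder = LruEncoder(size=2)
decoder = SimpleDecoder(size=2)
values = ["a", "b", "a", "c", "b"]
decoded = [decoder.decode(encoder.encode(v)) for v in values]
```

Swapping `LruEncoder` for an encoder with a different eviction policy would require no change to `SimpleDecoder`, which is the point of keeping the management on one side.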

> For ISO-8859-1 Text, the Static Huffman Code used by Delta would be used
> for the value. If we can develop an approach to effectively handling Huffman
> coding for arbitrary UTF-8, then we can apply Huffman coding to that as well.
> 
> For the Number and DateTime serialization:
> 
>   - 16-bit numbers serialize with a strict maximum of 3 bytes
>   - 32-bit numbers serialize with a strict maximum of 5 bytes.
>   - 64-bit numbers serialize with a strict maximum of 10 bytes.
>   - Date Times will serialize with five bytes for the reasonably relevant future,
> then six bytes for quite some time after that. Dates prior to the epoch cannot
> be represented.
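
The byte limits quoted above are what one gets from a variable-length integer carrying 7 payload bits per byte: 16, 32 and 64 bits need at most 3, 5 and 10 bytes. The Python sketch below uses the common little-endian base-128 layout; whether the proposal uses exactly this layout is an assumption, as are the function names.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_uvarint(data: bytes) -> int:
    """Decode a single integer produced by encode_uvarint."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


max_16 = len(encode_uvarint(2**16 - 1))  # 3 bytes
max_32 = len(encode_uvarint(2**32 - 1))  # 5 bytes
max_64 = len(encode_uvarint(2**64 - 1))  # 10 bytes
```

If a Date Time is further assumed to be a count of seconds since the Unix epoch, current values (between 2^30 and 2^31) take five bytes, and six bytes are only needed from 2^35 seconds onward.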
> 
> In order to properly deal with the backwards compatibility concerns for
> HTTP/1, there are several important rules for use of Typed Codecs in HTTP
> headers:
> 
> 1. Headers must be explicitly defined to use the new header types. All
> existing HTTP/1 headers, then, will continue to be required to be
> represented as ISO-8859-1 Text unless their standard definitions are
> updated. The HTTP/2 specification would update the definition of specific
> known headers (e.g. content-length, date, if-modified-since, etc).
> 
> 2. Extension headers that use the typed codecs will have specific normative
> transformations to ISO-8859-1 defined.
>     a. UTF-8 Text will be converted to ISO-8859-1 with extended characters
> pct-encoded
>     b. Numbers will be converted to their ASCII equivalent values.
>     c. Date Times will be converted to their HTTP-Date equivalent values.
>     d. Binary fields will be Base64-encoded.
> 
> 3. There will be no normative transformation from ISO-8859-1 values into the
> typed codecs. Implementations are free to apply transformation where
> those impls determine it is appropriate, but it will be perfectly legal for an
> implementation to pass a text value through even if it is known that a given
> header type has a typed codec equivalent (for instance, Content-Length may
> come through as a number or a text value, either will be valid). This means
> that when translating from HTTP/1 -> HTTP/2, receiving implementations
> need to be prepared to handle either value form.
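
The four transformations listed in rule 2 can be sketched in Python as below. The function names are hypothetical. Two details are my own assumptions: "extended characters" is read as characters outside ISO-8859-1, percent-encoded as their UTF-8 bytes, and a Date Time is taken to be seconds since the Unix epoch.

```python
import base64
from email.utils import formatdate
from urllib.parse import quote


def text_to_latin1(value: str) -> str:
    """Keep ISO-8859-1 characters, pct-encode everything else (as UTF-8)."""
    out = []
    for ch in value:
        if ord(ch) <= 0xFF:
            out.append(ch)
        else:
            out.append(quote(ch, safe=""))
    return "".join(out)


def number_to_text(value: int) -> str:
    """Render a number as its ASCII decimal form."""
    return str(value)


def datetime_to_text(epoch_seconds: int) -> str:
    """Render seconds since the epoch as an HTTP-date in GMT."""
    return formatdate(epoch_seconds, usegmt=True)


def binary_to_text(value: bytes) -> str:
    """Render binary data as Base64 text."""
    return base64.b64encode(value).decode("ascii")


text_example = text_to_latin1("caf\u00e9 \u20ac5")  # "café %E2%82%AC5"
date_example = datetime_to_text(0)  # "Thu, 01 Jan 1970 00:00:00 GMT"
```

Going the other way (text to typed value) is deliberately left out, in line with rule 3.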

I really like your approach to typed codecs. Although it has been ruled out for the first implementation draft, I think it is worth pursuing as an improvement for a later version of the header compression mechanism.
I'm willing to experiment with it.

Hervé.

> This approach combines what I feel are the best concepts from HeaderDiff
> and Delta and provides simple, straightforward header serialization and state
> management. Obviously, lots of testing is required, however.
> 
> As always, thoughts/opinions/gripes are appreciated.
> 
> - James
Received on Monday, 15 April 2013 15:29:13 UTC