Re: Alternative Header Compression Update.. from James M Snell on 2013-07-10 (ietf-http-wg@w3.org from July to September 2013)

From: James M Snell <jasnell@gmail.com>
Date: Wed, 10 Jul 2013 09:05:53 -0700
To: Michael Sweet <msweet@apple.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7Rbey4Y1QxqG3WAkMPVwkACwa0Oasr3SUynr_qJwoOWimvw@mail.gmail.com>
On Wed, Jul 10, 2013 at 6:09 AM, Michael Sweet <msweet@apple.com> wrote:
> James,
>
[snip]
>
> Actually, I think the typed encoding will probably yield enough savings to offset any increase in size, for example the date header going from 29 octets to ~6 in the variable integer encoding (or 8/11 for the RFC 2579 encoding - see below).
>

The type codecs definitely save quite a bit with highly variable
header fields that tend to get sent as literals often (date,
last-modified, content-length, etc.). For fairly static header fields,
the difference is minimal for any individual header. Where you start
to see the gap widen is over long-running connections, where the
average number of bits on the wire trends higher. It's worth it,
however, IMHO.
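
To put rough numbers on the date case, here is a minimal sketch that compares the textual HTTP date with the same instant sent as milliseconds since the UNIX epoch in a variable-length integer. The uvarint() helper is hypothetical and assumes 7-bit groups with a continuation bit; the encoding actually defined in the draft may differ.

```python
import calendar
from email.utils import formatdate

def uvarint(n):
    """Encode a non-negative int as 7-bit groups, low group first.

    Assumption: this layout is illustrative, not copied from the draft.
    """
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more groups follow
        else:
            out.append(group)         # final group, continuation bit clear
            return bytes(out)

# The instant this message was sent: 10 Jul 2013 16:05:53 UTC
seconds = calendar.timegm((2013, 7, 10, 16, 5, 53))
http_date = formatdate(seconds, usegmt=True)   # textual form of the Date header
typed = uvarint(seconds * 1000)                # milliseconds since the UNIX epoch

print(len(http_date), "octets as text,", len(typed), "octets as a uvarint")
```

Under that assumed encoding the saving comes from the value alone, before any indexing of the header name is taken into account.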

> I like the 256-entry single header table approach.  Of course, I have some feedback... :)
>
> 1. Would be nice if you could just include the Unsigned Variable Length Integer Syntax section from the other header compression draft wholesale so this draft stands on its own. Add a notice at the beginning "(This is copied from draft-ietf-httpbis-header-compression-NN)" so people know it is the same encoding.  Then the reference to it becomes informative.
>
> 2. Would also be nice to use the same figure format as the compression and http2 drafts... (see below)
>

Will do both in the next iteration.

> 3. Representing timestamps as milliseconds since the traditional UNIX epoch is problematic since it requires support for large integers (at least 42 bits to get us to the traditional 2038 end year, more if you want to keep going past then...) and AFAIK isn't widely used in standards for actual representation of a date/time. RFC 2579 defines a DateAndTime format that is 8 (UTC) or 11 (local time) octets long and is easy to map to/from typical OS APIs without the use of large integers. Granted, it doesn't give you more than 10ths of seconds, but I think that should be enough for HTTP. (We use this format in IPP - I'd rename "Timestamp" to "DateAndTime" if you decide to make this change...)
>

Millisecond precision has been on the HTTP wish lists of many
application developers for a very long time, including mine, and I
believe the additional requirements are worth it. That said, I've been
considering an alternative approach that is based on a single-byte era
prefix. This would encode the timestamp in two parts: an 8-bit prefix
followed by a uvarint <= (2^32)-1. For now, I just picked a format
that would work, with the intent of revisiting it once typed codecs
come up for formal discussion after the August interop event.

> 4. I'm not super-keen on grouping the headers into 4 bins ahead of time, since that increases encoder storage requirements.  Also, there isn't a way to just replace the value for an existing indexed header in your current draft.  Perhaps a hybrid approach where the indexed representation can have 1-to-64 indexes and the others encode a single name/value?  Something like this:
>

A small amount of buffering is required with the grouping approach,
but the encoder controls how much. An encoder that chooses less
buffering would see a bit more encoding overhead. An encoder could
choose to encode each header individually, without grouping, at the
cost of one additional octet per header.

The Indexed Literal Replacement can be used to replace just the value
of an existing header. For instance, suppose I have an existing
entry at position #1 with name="foo", value=1. To keep the same
name but replace the value with 2, I would send:

  C0 01 20 01 02

The first octet identifies this as an Indexed Literal Replacement
group with one item.
The second octet identifies the index position being replaced.
The third octet specifies that the value is an Integer, with the five
least-significant bits set to zero, indicating that a name index
reference is provided by the fourth octet.
The fourth octet is the name index reference. We're pointing at index
#01 (the same index that's being replaced).
The fifth octet provides the new value.

Because the name resolution is done before the replacement, the name
is reused and just the value is replaced.
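
As a rough illustration of that ordering, the five octets can be applied to a one-entry table as sketched here. The function name and the bit masks are assumptions inferred from this single example (top two bits for the group type, low six bits for the item count minus one, top three bits of the third octet for the value type); the draft itself is the authority on the real layout.

```python
def apply_indexed_literal_replacement(table, octets):
    """Apply a one-item Indexed Literal Replacement to `table`.

    `table` maps index -> (name, value). Hypothetical helper; the bit
    layouts checked below are guesses based on the example C0 01 20 01 02.
    """
    group, index, type_octet, name_ref, value = octets
    # Assumption: 11xxxxxx marks this group type, low six bits = items - 1.
    assert group & 0xC0 == 0xC0 and (group & 0x3F) + 1 == 1
    # Assumption: top three bits 001 mean Integer; low five bits of zero
    # mean the name is given by an index reference in the next octet.
    assert type_octet >> 5 == 0b001 and type_octet & 0x1F == 0
    # Resolve the name first, then overwrite the entry, so an entry that
    # references itself keeps its name and only the value changes.
    name = table[name_ref][0]
    table[index] = (name, value)
    return table

table = {1: ("foo", 1)}
apply_indexed_literal_replacement(table, bytes.fromhex("C001200102"))
print(table)
```

If the replacement were applied before the name lookup, the reference to index #01 would have nothing left to resolve against, which is why the ordering matters.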

- James
Received on Wednesday, 10 July 2013 16:06:41 UTC