
RE: Type codecs within hpack

From: Mike Bishop <Michael.Bishop@microsoft.com>
Date: Fri, 23 Aug 2013 21:32:35 +0000
To: James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <c57ffb4bb9374e75826e117210194d35@BY2PR03MB025.namprd03.prod.outlook.com>
Integer and Timestamp are defined with the same leading three bits as well.  I assume that's a typo.

-----Original Message-----
From: James M Snell [mailto:jasnell@gmail.com] 
Sent: Friday, August 23, 2013 9:00 AM
To: ietf-http-wg@w3.org
Subject: Type codecs within hpack

With the assumption that hpack is what we'll ultimately end up sticking with for header encoding, I wanted to take a moment to illustrate how the binary type codecs would look within the hpack encoding... (note, this email is **NOT** discussing the alternative header compression I describe in my separate I-D.. this is talking about applying the value type encodings to hpack).

It would be very helpful if the various implementers would give some kind of indication about whether they'd be willing to implement these encodings and, if so, when.

First, a quick note: The type codecs would be defined independently of the compression mechanism itself... that is, just as Roberto's been wanting, we can separate header value type details out of hpack entirely so that hpack can deal specifically with header compression.

Specifically, hpack would be updated such that it would not define a specific encoding for header field values. As far as the basic hpack mechanism is concerned, the value encoding would be an opaque octet sequence.

Literal Header without Indexing (Indexed Name):

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |    Index (5+)     |
   +---+---+---+-------------------+
   |      Value Encoding (8+)      |
   +-------------------------------+

Literal Header without Indexing (New Name):

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |      Value Encoding (8+)      |
   +-------------------------------+

Literal Header with Incremental Indexing (Indexed Name):

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Index (5+)     |
   +---+---+---+-------------------+
   |      Value Encoding (8+)      |
   +-------------------------------+

Literal Header with Incremental Indexing (New Name):

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |      Value Encoding (8+)      |
   +-------------------------------+

Literal Header with Substitution Indexing (Indexed Name):

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |      Index (6+)       |
   +---+---+-----------------------+
   |    Substituted Index (8+)     |
   +-------------------------------+
   |      Value Encoding (8+)      |
   +-------------------------------+

Literal Header with Substitution Indexing (New Name):

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |           0           |
   +---+---+-----------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |    Substituted Index (8+)     |
   +-------------------------------+
   |      Value Encoding (8+)      |
   +-------------------------------+


There are five possible value encodings which manifest two basic patterns... The value types are:

1. UTF-8
2. Legacy
3. Opaque
4. Integer
5. Timestamp

On the wire these look like:

UTF-8
     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Integer

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Timestamp

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Legacy
     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Opaque
     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+


Notice that the UTF-8, Legacy and Opaque type encodings are identical other than the leading three bits. Likewise, the Integer and Timestamp encodings are identical other than the leading three bits.

For UTF-8, Legacy and Opaque, the Value Length is encoded as an integer with a 5-bit prefix.

For Integer, the Value is encoded as an integer with a 5-bit prefix. Negative or fractional values cannot be represented. Theoretically there is no upper limit to this encoding; however, it would likely be good to recommend that only values up to 64 bits are encoded.

For Timestamp, the number of milliseconds since the epoch is encoded as an integer with a 5-bit prefix. Dates before the epoch cannot be represented.
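
To make the two patterns concrete, here is a minimal Python sketch of the five value encodings. It is only an illustration under stated assumptions: the "integer with an N-bit prefix" is taken to be the scheme later standardized in RFC 7541, Section 5.1 (the hpack draft current at the time of this message may differ in detail), the Timestamp tag is taken to be 010 as drawn in the combined diagrams, and all function and constant names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Leading three bits of each value type (bit 0 in the diagrams is the most
# significant bit). TAG_TIMESTAMP = 0b010 is an assumption taken from the
# combined hpack diagrams.
TAG_UTF8 = 0b000
TAG_INTEGER = 0b001
TAG_TIMESTAMP = 0b010
TAG_LEGACY = 0b100
TAG_OPAQUE = 0b111

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_prefixed_int(value: int, tag: int, prefix_bits: int = 5) -> bytes:
    """Encode a non-negative integer behind a type tag, RFC 7541 style (assumed)."""
    if value < 0:
        raise ValueError("negative values cannot be represented")
    limit = (1 << prefix_bits) - 1            # 31 for a 5-bit prefix
    first = tag << prefix_bits
    if value < limit:
        return bytes([first | value])         # fits entirely in the prefix
    out = bytearray([first | limit])          # prefix saturated
    value -= limit
    while value >= 128:
        out.append((value & 0x7F) | 0x80)     # 7 payload bits plus continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_length_prefixed(tag: int, payload: bytes) -> bytes:
    """UTF-8, Legacy and Opaque: tagged Value Length (5+), then the octets."""
    return encode_prefixed_int(len(payload), tag) + payload


def encode_integer(number: int) -> bytes:
    """Integer: the value itself is the prefixed integer."""
    return encode_prefixed_int(number, TAG_INTEGER)


def encode_timestamp(moment: datetime) -> bytes:
    """Timestamp: whole milliseconds since the epoch (moment must be timezone-aware)."""
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return encode_prefixed_int(millis, TAG_TIMESTAMP)
```

Under these assumptions, encode_integer(1337) yields the three octets 3F 9A 0A, and the Date of this thread's reply (23 Aug 2013 21:32:35 UTC) encodes as a Timestamp in 7 octets, compared with 29 octets for the equivalent HTTP-date string.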

If we put these types together with hpack, this is what we end up with:

Literal Header without Indexing (Indexed Name), UTF-8 Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header without Indexing (Indexed Name), Integer Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header without Indexing (Indexed Name), Timestamp Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header without Indexing (Indexed Name), Legacy Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header without Indexing (Indexed Name), Opaque Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+

Literal Header without Indexing (New Name), UTF-8 Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header without Indexing (New Name), Integer Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header without Indexing (New Name), Timestamp Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header without Indexing (New Name), Legacy Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header without Indexing (New Name), Opaque Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+

Literal Header with Incremental Indexing (Indexed Name), UTF-8 Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Incremental Indexing (Indexed Name), Integer Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Incremental Indexing (Indexed Name), Timestamp Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Incremental Indexing (Indexed Name), Legacy Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Incremental Indexing (Indexed Name), Opaque Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Index (5+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+

Literal Header with Incremental Indexing (New Name), UTF-8 Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Incremental Indexing (New Name), Integer Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Incremental Indexing (New Name), Timestamp Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Incremental Indexing (New Name), Legacy Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+


Literal Header with Incremental Indexing (New Name), Opaque Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |         0         |
   +---+---+---+-------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+


Literal Header with Substitution Indexing (Indexed Name), UTF-8 Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |      Index (6+)       |
   +---+---+-----------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Substitution Indexing (Indexed Name), Integer Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |      Index (6+)       |
   +---+---+-----------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Substitution Indexing (Indexed Name), Timestamp Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |      Index (6+)       |
   +---+---+-----------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Substitution Indexing (Indexed Name), Legacy Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |      Index (6+)       |
   +---+---+-----------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Substitution Indexing (Indexed Name), Opaque Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |      Index (6+)       |
   +---+---+-----------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+


Literal Header with Substitution Indexing (New Name), UTF-8 Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |           0           |
   +---+---+-----------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Substitution Indexing (New Name), Integer Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |           0           |
   +---+---+-----------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Substitution Indexing (New Name), Timestamp Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |           0           |
   +---+---+-----------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 0 | 1 | 0 |    Value (5+)     |
   +---+---+---+---+---+---+---+---+

Literal Header with Substitution Indexing (New Name), Legacy Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |           0           |
   +---+---+-----------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 0 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value String (Length Octets)  |
   +-------------------------------+

Literal Header with Substitution Indexing (New Name), Opaque Value

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 |           0           |
   +---+---+-----------------------+
   |       Name Length (8+)        |
   +-------------------------------+
   |  Name String (Length octets)  |
   +-------------------------------+
   |    Substituted Index (8+)     |
   +---+---+---+---+---+---+---+---+
   | 1 | 1 | 1 | Value Length (5+) |
   +---+---+---+-------------------+
   | Value Octets (Length Octets)  |
   +-------------------------------+
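
As a rough illustration of how the layouts above compose, the following Python sketch builds two of them: the hpack representation only writes its own prefix and name, then appends the value encoding as opaque octets. The helper names are hypothetical, and the prefixed-integer routine again assumes the RFC 7541, Section 5.1 scheme rather than whatever the current draft specifies.

```python
def prefixed_int(value: int, high_bits: int, prefix_bits: int) -> bytes:
    """Integer with an N-bit prefix; high_bits fills the rest of the first octet."""
    limit = (1 << prefix_bits) - 1
    first = (high_bits << prefix_bits) & 0xFF
    if value < limit:
        return bytes([first | value])
    out = bytearray([first | limit])
    value -= limit
    while value >= 128:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def integer_value(number: int) -> bytes:
    """Integer value encoding: tag 001, then Value (5+)."""
    return prefixed_int(number, 0b001, 5)


def literal_without_indexing_indexed_name(index: int, value_encoding: bytes) -> bytes:
    """011 + Index (5+), then the value encoding, untouched.

    index must be non-zero, since an all-zero index field marks a New Name.
    """
    if index <= 0:
        raise ValueError("index 0 is the New Name marker")
    return prefixed_int(index, 0b011, 5) + value_encoding


def literal_incremental_new_name(name: bytes, value_encoding: bytes) -> bytes:
    """010 + 00000, Name Length (8+), Name String, then the value encoding."""
    return bytes([0b01000000]) + prefixed_int(len(name), 0, 8) + name + value_encoding


# A hypothetical header at index 3 carrying the integer 10 takes two octets.
example = literal_without_indexing_indexed_name(3, integer_value(10))
```

The point of the sketch is that neither literal_* function inspects value_encoding; any of the five value types could be passed in unchanged.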

For backwards compatibility with HTTP/1, the following conversion rules would apply when translating an HTTP/2 request into HTTP/1:

1. UTF-8 value encodings are converted to ASCII, with all non-ASCII codepoints pct-encoded.
2. Legacy value encodings are passed through without any translation.
3. Opaque values are Base64 encoded.
4. Integer values are converted into the ASCII representation of the number (e.g. 1234 = "1234").
5. Timestamp values are converted to the HTTP-date equivalent string, with millisecond precision lost.
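
The five conversion rules could be sketched in Python roughly as follows. This is a non-normative illustration: the function name, the string labels for the types, and the choice to represent decoded values as str, bytes or int are all assumptions made for the example.

```python
import base64
from email.utils import formatdate


def to_http1_value(kind: str, value) -> bytes:
    """Translate one decoded HTTP/2 typed value into HTTP/1 header octets."""
    if kind == "utf-8":
        # value is a str; ASCII passes through, each non-ASCII code point
        # is pct-encoded octet by octet from its UTF-8 form.
        return "".join(
            ch if ord(ch) < 0x80
            else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
            for ch in value
        ).encode("ascii")
    if kind == "legacy":
        return value                                   # bytes, passed through untouched
    if kind == "opaque":
        return base64.b64encode(value)                 # bytes in, Base64 text out
    if kind == "integer":
        return str(value).encode("ascii")              # 1234 becomes "1234"
    if kind == "timestamp":
        # value is milliseconds since the epoch; HTTP-date has one-second
        # resolution, so the millisecond part is dropped.
        return formatdate(value // 1000, usegmt=True).encode("ascii")
    raise ValueError(f"unknown value type: {kind}")
```

For example, to_http1_value("timestamp", 1377293555123) gives "Fri, 23 Aug 2013 21:32:35 GMT", with the trailing 123 milliseconds lost as rule 5 describes.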

There is no normative translation from HTTP/1 to HTTP/2. All HTTP/1 header field values would be passed without translation using the Legacy Value Encoding.

Header fields would have to be specifically defined at the http layer to use the new value encodings. The definitions of existing HTTP/1 header fields remain the same unless those headers are specifically redefined.

Some of the existing standard headers that we know can be redefined include:

* :status (integer)
* :host (legacy or utf-8)
* :path (legacy or utf-8)
* content-length (legacy or integer)
* date (legacy or timestamp)
* max-forwards (legacy or integer)
* retry-after (legacy, integer or timestamp)
* if-modified-since (legacy or timestamp)
* if-unmodified-since (legacy or timestamp)
* last-modified (legacy or timestamp)
* age (legacy or integer)
* expires (legacy, integer or timestamp)
* etag (legacy or opaque... an opaque etag is always "strong", never "weak")

There is a question about whether we really need to have any distinction between UTF-8, Legacy and Opaque. First, it's important to note that, with the exception of the first three bits, these three types serialize exactly the same on the wire and are handled in exactly the same way by the header compression mechanism. The type distinction comes into play only when we want to use the value.
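
A decoder sketch may make that point clearer: the wire parse for UTF-8, Legacy and Opaque is one shared code path, and the tag only decides how the resulting octets are handed to the application. As before, this assumes the RFC 7541, Section 5.1 prefixed-integer scheme and the 010 Timestamp tag from the combined diagrams; the names are invented for the example.

```python
from typing import Tuple

LENGTH_PREFIXED = {0b000: "utf-8", 0b100: "legacy", 0b111: "opaque"}
NUMERIC = {0b001: "integer", 0b010: "timestamp"}   # 010 is an assumption


def decode_prefixed_int(data: bytes, pos: int, prefix_bits: int = 5) -> Tuple[int, int]:
    """Return (value, next position) for an integer with an N-bit prefix."""
    limit = (1 << prefix_bits) - 1
    value = data[pos] & limit
    pos += 1
    if value < limit:
        return value, pos
    shift = 0
    while True:
        octet = data[pos]
        pos += 1
        value += (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:                       # continuation bit clear: done
            return value, pos


def decode_value(data: bytes, pos: int = 0):
    """Return (type name, value, next position) for one value encoding."""
    tag = data[pos] >> 5
    number, pos = decode_prefixed_int(data, pos)
    if tag in NUMERIC:
        return NUMERIC[tag], number, pos
    if tag not in LENGTH_PREFIXED:
        raise ValueError(f"unassigned type tag {tag:03b}")
    raw = data[pos:pos + number]
    if len(raw) != number:
        raise ValueError("truncated value")
    kind = LENGTH_PREFIXED[tag]
    # Identical parse up to here; only the interpretation differs by type.
    value = raw.decode("utf-8") if kind == "utf-8" else raw
    return kind, value, pos + number
```

In this sketch a UTF-8 value comes back as text while Legacy and Opaque values stay as raw octets, which is the only place the three types diverge.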

Allowing for UTF-8 gives developers the ability to use extended characters. For instance, we could use IRIs directly within our requests without any translation to ASCII URIs. That means passing :host with an IDN rather than its punycode equivalent, and passing :path with extended characters without being required to do additional pct-encoding.
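
To give a feel for the difference, this small Python comparison shows a host and a path in their direct UTF-8 form next to the ASCII-only forms HTTP/1 requires today. The host and path are made-up examples, and Python's built-in "idna" codec (IDNA 2003) stands in for whatever IDN processing an implementation would really use.

```python
from urllib.parse import quote

host = "b\u00fccher.example"              # hypothetical IDN host
path = "/caf\u00e9"                       # hypothetical path with an extended character

utf8_host = host.encode("utf-8")          # what a UTF-8 typed :host would carry
punycode_host = host.encode("idna")       # ASCII-compatible form used today

utf8_path = path.encode("utf-8")          # what a UTF-8 typed :path would carry
pct_path = quote(path)                    # pct-encoded form used today

print(punycode_host, len(utf8_host), len(punycode_host))
print(pct_path, len(utf8_path), len(pct_path))
```

In both cases the UTF-8 form is shorter and needs no translation step on either side.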

Allowing for Legacy gives us a fallback for HTTP/1 compatibility. There are very few enforced rules in HTTP/1 for values, and we need to have a way of indicating that the value being passed through is untouched relative to HTTP/1 and follows all the same rules as HTTP/1.

Allowing for Opaque values gives us the opportunity to achieve highly efficient encodings for things that, in the past, haven't been encoded very efficiently. It allows us to pass raw binary sequences that ought not to be interpreted as text necessarily.. avoiding the need to apply additional base64 or hex encoding.
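
As a quick size comparison, here is what a 32-octet binary value (a SHA-256 digest of some made-up input, chosen only for illustration) costs as raw octets versus the text encodings it would otherwise need. The Opaque encoding would add only its length prefix on top of the raw octets.

```python
import base64
import hashlib

raw = hashlib.sha256(b"example body").digest()   # 32 raw octets, hypothetical payload
as_base64 = base64.b64encode(raw)                # text form via Base64
as_hex = raw.hex()                               # text form via hex

print(len(raw), len(as_base64), len(as_hex))     # 32 44 64
```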

Lastly, I know there have been some discussions around structured alternatives for timestamp encoding. Honestly, I've played around with a number of variations and kept coming to the same conclusion:
milliseconds since the epoch encoded as a variable length integer really is good enough. It keeps things simple while delivering a feature many appdevs have been asking for for a very long time.

Received on Friday, 23 August 2013 21:33:07 UTC
