Re: SPDY Header Frames from James M Snell on 2012-07-17 (ietf-http-wg@w3.org from July to September 2012)

From: James M Snell <jasnell@gmail.com>
Date: Tue, 17 Jul 2012 00:21:11 -0700
To: Mike Belshe <mike@belshe.com>
Cc: ietf-http-wg@w3.org
Message-ID: <CABP7RbcxomO3oZFpNNthj2if89LtkbKeR6D3_pYvf3drCKVtfA@mail.gmail.com>
On Tue, Jul 17, 2012 at 12:18 AM, Mike Belshe <mike@belshe.com> wrote:

> i like the direction of this.  a good blend of a registry and extension
> headers.
>
> i wasn't quite sure how you got to the 6 byte compressed header, tho.
>
>
Heh.. chalk that up to temporary idiocy on my part... and a complete lack
of either caffeine or alcohol in my system.


> mike
>
>
> On Mon, Jul 16, 2012 at 11:51 PM, James M Snell <jasnell@gmail.com> wrote:
>
>> Ok... spent some time this evening playing around with the header frame
>> syntax a bit more to see what further optimizations could be made and to
>> see if the binary-encoded header IDs made any noticeable difference in
>> size and ease of processing.
>>
>> Here's the revised structure I played around with...
>>
>> 1. Within a HEADER block, I assume two possible types of headers,
>> REGISTERED and EXTENSION. A REGISTERED header is one that would be known to
>> the registrar and assigned a numeric id and a codepage. If the codepage is
>> 0, the implication is that the header is MUST UNDERSTAND and is considered
>> one of the core headers for the basic operation of the protocol. Codepages
>> 1-9 are MUST-IGNORE... that is, if a user-agent or server comes across a
>> header on these code pages that is not understood, the header can simply be
>> ignored. Codepages 10-14 are PRIVATE USE, with Codepage 10 being reserved
>> for MUST UNDERSTAND PRIVATE USE headers. EXTENSION headers are simple name
>> value pairs essentially as they exist today.
>>
>> 2. Within extension headers, the name portion MUST be ASCII and MUST NOT
>> be longer than 255 bytes (quite generous really).
>>
>> 3. Values may be binary or character based, as indicated by a flags
>> field. Values may be up to max(int32) in length.
>>
>> 4. Registered HTTP Methods can be identified by numeric value. Extension
>> Methods can be identified by character value.
>>
>> 5. The structure for REGISTERED HEADERS is...
>>
>>   +------------------------------+
>>   |0| id (15-bit)| flags(8-bit)  |
>>   +------------------------------+
>>   | len (32-bit) |     value     |
>>   +------------------------------+
>>
>> 6. The structure for EXTENSION HEADERS is...
>>
>>   +------------------------------+
>>   |1| flags(7-bit) | namelen (8) |
>>   +------------------------------+
>>   | name | val len (32) | value  |
>>   +------------------------------+
>>
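The two layouts in items 5 and 6 can be sketched as a small Java encoder, in keeping with the Java constants used below. The following are illustrative assumptions, not part of the proposal: the class and method names, big-endian (network order) lengths as in the worked examples, and the EXTENSION prefix. The item 6 diagram shows the T bit and 7 flag bits sharing one byte. The worked "ext: foo" bytes ({-128, 1, 3, ...}) instead use a 0x80 marker byte, then a full flags byte, then the name length. The sketch follows the bytes, because the 145-byte total depends on them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HeaderEncoder {
    static final int FLAG_CHARACTER = 0x01; // value holds character data

    // REGISTERED: |0| id (15) | flags (8) | len (32) | value
    static byte[] registered(int id, int flags, byte[] value) {
        if (id < 0 || id > 0x7FFF) throw new IllegalArgumentException("id must fit in 15 bits");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((id >>> 8) & 0x7F); // top bit 0 marks a REGISTERED header
        out.write(id & 0xFF);
        out.write(flags & 0xFF);
        writeInt32(out, value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // EXTENSION, following the byte layout of the "ext: foo" example (an assumption):
    // 0x80 (T bit set) | flags (8) | namelen (8) | name | len (32) | value
    static byte[] extension(String name, int flags, byte[] value) {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) > 0x7F) throw new IllegalArgumentException("name must be ASCII");
        }
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        if (n.length > 255) throw new IllegalArgumentException("name longer than 255 bytes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x80);
        out.write(flags & 0xFF);
        out.write(n.length);
        out.write(n, 0, n.length);
        writeInt32(out, value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Big-endian 32-bit length; write(int) keeps only the low 8 bits of its argument.
    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    public static void main(String[] args) {
        byte[] host = registered(3, FLAG_CHARACTER, "www.example.org".getBytes(StandardCharsets.US_ASCII));
        byte[] ext = extension("ext", FLAG_CHARACTER, "foo".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Arrays.toString(host));
        System.out.println(Arrays.toString(ext)); // [-128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111]
    }
}
```

With these helpers the Host, Accept-Lang and "ext: foo" encodings listed in the message come out byte for byte. The name checks enforce item 2 (ASCII, at most 255 bytes).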
>> Assuming the following registered headers...
>>
>>   public static final short VERSION = 1;
>>   public static final short METHOD = 2;
>>   public static final short HOST = 3;
>>   public static final short SESSION = 4;
>>   public static final short CHARSET = 5;
>>   public static final short REQUEST_URI = 6;
>>   public static final short ACCEPT_LANG = 4097;
>>
>> And the following registered methods...
>>
>>   public static final byte GET = 1;
>>   public static final byte POST = 2;
>>   public static final byte PUT = 3;
>>   public static final byte DELETE = 4;
>>   public static final byte PATCH = 5;
>>   public static final byte HEAD = 6;
>>   public static final byte OPTIONS = 7;
>>
>> Let's assume that what we want is to encode an HTTP GET for the resource:
>>   http://www.example.org/this/is/the/request?is=it&not=beautiful
>>
>> With a session identifier of "session_key", ACCEPT_LANG = en-US and
>> default charset encoding for all character based header values is
>> "US-ASCII"... Let's also add an extension header "ext" with value "foo"...
>>
>> The Version header can be encoded as:
>>   {0, 1, 0, 0, 0, 0, 2, 2, 0}
>>
>> The GET Method header can be encoded as:
>>   {0, 2, 0, 0, 0, 0, 1, 1}
>>
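Item 4 and the GET encoding suggest how a METHOD value might be read. With flags 0, a single octet holds a registered method code. With the character flag set, the value names an extension method. This reading is inferred from the example rather than stated. The extension method used below (PROPFIND, a WebDAV method) and the class name are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class MethodValue {
    static final int FLAG_CHARACTER = 0x01;
    // Registered method codes as listed in the message.
    static final Map<Integer, String> REGISTERED = Map.of(
            1, "GET", 2, "POST", 3, "PUT", 4, "DELETE", 5, "PATCH", 6, "HEAD", 7, "OPTIONS");

    // Decodes the value of a METHOD header (registered id 2).
    static String decode(int flags, byte[] value) {
        if ((flags & FLAG_CHARACTER) != 0) {
            return new String(value, StandardCharsets.US_ASCII); // extension method by name
        }
        if (value.length != 1) throw new IllegalArgumentException("expected a 1-byte method code");
        String name = REGISTERED.get(value[0] & 0xFF);
        if (name == null) throw new IllegalArgumentException("unregistered method code " + (value[0] & 0xFF));
        return name;
    }

    public static void main(String[] args) {
        // Value part of {0, 2, 0, 0, 0, 0, 1, 1}: flags 0, length 1, code 1.
        System.out.println(decode(0, new byte[] {1})); // GET
        System.out.println(decode(FLAG_CHARACTER, "PROPFIND".getBytes(StandardCharsets.US_ASCII))); // PROPFIND
    }
}
```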
>> The Host header would be encoded as:
>>   {  0,   3,   1,  0,   0,   0,  15, 119, 119, 119,
>>     46, 101, 120, 97, 109, 112, 108, 101,  46, 111,
>>    114, 103}
>>
>> The Accept-Lang header would be encoded as:
>>   {16, 1, 1, 0, 0, 0, 5, 'e', 'n', '-', 'U', 'S'}
>>
>> The Extension header ext: foo would be encoded as:
>>   {-128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111}
>>
>> The entire header block is encoded into a structure of 145 bytes in
>> length:
>>
>> [8, 0, 1, 0, 0, 0, 0, 2, 2, 0, 0, 2, 0, 0, 0, 0, 1, 1, 0, 3, 1, 0, 0, 0,
>> 15, 119, 119, 119, 46, 101, 120, 97, 109, 112, 108, 101, 46, 111, 114, 103,
>> 0, 6, 1, 0, 0, 0, 40, 47, 116, 104, 105, 115, 47, 105, 115, 47, 116, 104,
>> 101, 47, 114, 101, 113, 117, 101, 115, 116, 63, 105, 115, 61, 105, 116, 38,
>> 110, 111, 116, 61, 98, 101, 97, 117, 116, 105, 102, 117, 108, 0, 5, 1, 0,
>> 0, 0, 8, 117, 115, 45, 97, 115, 99, 105, 105, 0, 4, 1, 0, 0, 0, 11, 115,
>> 101, 115, 115, 105, 111, 110, 95, 107, 101, 121, 16, 1, 1, 0, 0, 0, 5, 101,
>> 110, 45, 85, 83, -128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111]
>>
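The 145-byte total can be checked by hand. Each REGISTERED header costs 7 bytes of framing (2-byte id, flags byte, 4-byte length) plus its value. The extension header costs 3 + name + 4 + value, following its example bytes. As in the byte listing, the block opens with a single count byte (8). A sketch of that arithmetic using the values above (class and method names are hypothetical):

```java
final class BlockSize {
    // REGISTERED header: 2-byte id + 1-byte flags + 4-byte length + value.
    static int registered(int valueLen) { return 2 + 1 + 4 + valueLen; }

    // EXTENSION header as laid out in the "ext: foo" bytes:
    // marker byte + flags byte + 1-byte name length + name + 4-byte length + value.
    static int extension(int nameLen, int valueLen) { return 3 + nameLen + 4 + valueLen; }

    static int exampleBlock() {
        return 1                                                           // header count byte (8)
                + registered(2)                                            // VERSION {2, 0}
                + registered(1)                                            // METHOD, code 1 = GET
                + registered("www.example.org".length())                   // HOST
                + registered("/this/is/the/request?is=it&not=beautiful".length()) // REQUEST_URI
                + registered("us-ascii".length())                          // CHARSET
                + registered("session_key".length())                       // SESSION
                + registered("en-US".length())                             // ACCEPT_LANG
                + extension("ext".length(), "foo".length());               // ext: foo
    }

    public static void main(String[] args) {
        System.out.println(exampleBlock()); // 145
    }
}
```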
>> By comparison, the same structure encoded using the existing SPDY HEADER
>> block would require 208 bytes sans compression.
>>
>> After applying compression of the block using the SPDY dictionary, the
>> block compresses into 6 compact bytes.
>>
>> [120, 63, -29, -58, -89, -62]
>>
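The six bytes above have the shape of a bare zlib (RFC 1950) header. The bytes 120 and 63 are 0x78 0x3F, a valid header pair with the FDICT (preset dictionary) bit set. Such a header is followed by a 4-byte dictionary ID, which accounts for all six bytes. This suggests that no compressed data had been emitted yet; a deflater usually holds back small inputs until it is flushed or finished. The sketch below shows dictionary compression with the JDK's `Deflater` using `SYNC_FLUSH`, and reverses it with `Inflater`. `DICT` is a hypothetical stand-in, not the actual SPDY dictionary.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DictionaryCompression {
    // Hypothetical stand-in dictionary; NOT the real SPDY dictionary.
    static final byte[] DICT =
            "us-asciisession_keyen-USwww.example.org".getBytes(StandardCharsets.US_ASCII);

    static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(DICT); // header will carry FDICT and this dictionary's Adler-32 ID
        deflater.setInput(block);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        do { // SYNC_FLUSH pushes out all pending compressed data; repeat while the buffer fills
            n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length);
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (true) {
                int n = inflater.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (inflater.needsDictionary()) {
                    inflater.setDictionary(DICT);
                } else {
                    break; // input exhausted (the stream was flushed, not finished)
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] block = "GET www.example.org us-ascii session_key en-US".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = compress(block);
        System.out.println(packed.length + " bytes, header " + (packed[0] & 0xFF) + ", " + (packed[1] & 0xFF));
        System.out.println(Arrays.equals(decompress(packed), block)); // true
    }
}
```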
>> Assuming this structure were used within a SYN_STREAM frame,
>> unencrypted, a proxy/router that is scanning the headers to determine where
>> to route the SYN_STREAM to would need only to look at the first two bytes
>> of each header (skipping the remainder of each header via its length
>> fields) to determine whether the header is the HOST, METHOD, REQUEST_URI,
>> VERSION or SESSION identifier. This scheme should prove to be
>> significantly faster to scan and perform operations on than the current
>> all-text-key-pair model. As always, tho, your mileage may vary.
>>
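The scan described above can be sketched as a loop. It reads each header's first two bytes and, when they do not name the wanted id, uses the length fields to skip the rest, without decoding values or extension names. The layout follows the 145-byte listing rather than any spec: a single count byte, and the EXTENSION prefix 0x80, flags, name length as in the "ext: foo" bytes. `HeaderScanner` and `find` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HeaderScanner {
    // Returns the value of the first REGISTERED header with the wanted id, or null.
    static byte[] find(byte[] block, int wantedId) {
        ByteBuffer buf = ByteBuffer.wrap(block); // big-endian, like the examples
        int count = buf.get() & 0xFF;
        for (int i = 0; i < count; i++) {
            if ((buf.get(buf.position()) & 0x80) == 0) { // REGISTERED: T bit clear
                int id = buf.getShort() & 0x7FFF;
                buf.get();                                 // flags, not needed for routing
                int len = buf.getInt();
                if (id == wantedId) {
                    byte[] value = new byte[len];
                    buf.get(value);
                    return value;
                }
                buf.position(buf.position() + len);        // skip the value unread
            } else {                                       // EXTENSION: skip it entirely
                buf.get();                                 // 0x80 marker
                buf.get();                                 // flags
                int nameLen = buf.get() & 0xFF;
                buf.position(buf.position() + nameLen);
                int len = buf.getInt();
                buf.position(buf.position() + len);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two headers: ext: foo, then HOST (id 3) = "abc".
        byte[] block = {2, -128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111,
                        0, 3, 1, 0, 0, 0, 3, 97, 98, 99};
        System.out.println(new String(find(block, 3), StandardCharsets.US_ASCII)); // abc
    }
}
```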
>> /end-experiment
>>
>> - James
>>
>> On Fri, Jul 13, 2012 at 3:16 PM, James M Snell <jasnell@gmail.com> wrote:
>>
>>> This note is intended to provide some additional thoughts for discussion
>>> around the design and use of SPDY as the possible basis for HTTP/2.0. The
>>> intent is to provide fuel for discussion... comments are definitely welcome.
>>>
>>> As discussed within draft-tarreau-httpbis-network-friendly-00, and as
>>> has been mentioned several times in discussion on list, handling of headers
>>> within the current SPDY framing, and in particular the layering of HTTP/1.1
>>> messages into SPDY frames is less than optimal. There is significant wasted
>>> space, duplication, etc that -- strictly speaking -- really isn't
>>> necessary. While I recognize that the following increases the basic
>>> complexity of the protocol, it allows fairly significant optimization
>>> following the same basic lines of reasoning expressed in
>>> draft-tarreau-httpbis-network-friendly-00.
>>>
>>> Section 2.6.1 of the SPDY draft defines header blocks using the
>>> following format:
>>>
>>>    +------------------------------------+
>>>    | Number of Name/Value pairs (int32) |
>>>    +------------------------------------+
>>>    |     Length of name (int32)         |
>>>    +------------------------------------+
>>>    |           Name (string)            |
>>>    +------------------------------------+
>>>    |     Length of value  (int32)       |
>>>    +------------------------------------+
>>>    |          Value   (string)          |
>>>    +------------------------------------+
>>>    |           (repeats)                |
>>>
>>> This structure is used within SYN_STREAM and HEADERS frames.
>>>
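For comparison with the proposal that follows, here is a sketch of the existing layout as the diagram above describes it. Every header pays for its full name plus two 32-bit lengths. Encoding names and values as UTF-8 is an assumption of the sketch, as are the names `SpdyHeaderBlock` and `encode`; compression is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class SpdyHeaderBlock {
    // int32 pair count, then for each pair: int32 name length, name, int32 value length, value.
    static byte[] encode(Map<String, String> headers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, headers.size());
        for (Map.Entry<String, String> e : headers.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
            writeInt32(out, name.length);
            out.write(name, 0, name.length);
            writeInt32(out, value.length);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    // Big-endian 32-bit integer.
    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "www.example.org");
        System.out.println(encode(headers).length); // 4 + (4 + 4) + (4 + 15) = 31
    }
}
```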
>>> What I propose is the following revised structure:
>>>
>>>    +------------------------------------+
>>>    |     Number of Headers (int32)      |
>>>    +------------------------------------+
>>>    |T| Flags (7) |     Length (24)      |
>>>    +------------------------------------+
>>>    |              Data                  |
>>>    +------------------------------------+
>>>    |T| Flags (7) |     Length (24)      |
>>>    +------------------------------------+
>>>    |              Data                  |
>>>    +------------------------------------+
>>>    |             (repeats)              |
>>>
>>> T is a single bit identifying the Header Type. There are two types:
>>> REGISTERED (0) and EXTENSION (1).
>>>
>>> Flags provides flags for the specific header field. The flag 0x1
>>> indicates that the header value contains Character Data. If not set, the
>>> value is assumed to consist of raw octets. 0x2 indicates that the value is
>>> compressed.
>>>
>>> Length is an unsigned 24-bit value specifying the number of octets after
>>> the length field.
>>>
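Each header record opens with a 32-bit word that packs the T bit, seven flag bits and a 24-bit length. Below is a minimal sketch of packing and unpacking that word. It assumes big-endian bit order with T as the most significant bit, as the diagram is drawn; the class and method names are hypothetical.

```java
final class HeaderPrefix {
    static final int FLAG_CHARACTER = 0x01;  // value is character data
    static final int FLAG_COMPRESSED = 0x02; // value is compressed

    // Packs |T| Flags (7) | Length (24) | into one 32-bit word, T most significant.
    static int pack(boolean extension, int flags, int length) {
        if ((flags & ~0x7F) != 0) throw new IllegalArgumentException("flags must fit in 7 bits");
        if ((length & ~0xFFFFFF) != 0) throw new IllegalArgumentException("length must fit in 24 bits");
        return (extension ? 1 << 31 : 0) | (flags << 24) | length;
    }

    static boolean isExtension(int word) { return word < 0; } // T is the sign bit
    static int flags(int word) { return (word >>> 24) & 0x7F; }
    static int length(int word) { return word & 0xFFFFFF; }

    public static void main(String[] args) {
        int word = pack(false, FLAG_CHARACTER, 24); // prefix of the Host example below
        System.out.println(Integer.toHexString(word)); // 1000018
    }
}
```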
>>> When the T bit is NOT set, the Header field is a REGISTERED Header, the
>>> structure of which is:
>>>
>>>    +------------------------------------+
>>>    |0| Flags (7) |     Length (24)      |
>>>    +------------------------------------+
>>>    | ID | Value Length (int32) |Value...|
>>>    +------------------------------------+
>>>
>>> The ID is a 32-bit number uniquely identifying the registered field.
>>> Each is assigned by the registrar. For instance, the "Host" field could
>>> have a registered value of "1", the "Accept-Lang" field could have a
>>> registered value of "6", and so forth.
>>>
>>> The Value Length is a 32-bit value indicating the length of the value.
>>>
>>> If Flag 0x1 is set, the value is assumed to contain character data. When
>>> set, the value MUST be preceded by a single unsigned 8-bit integer
>>> identifying the character encoding utilized. The values are assigned by the
>>> registrar. For instance, US-ASCII could have a registered value of "1",
>>> while "UTF-8" could have a registered value of "2".
>>>
>>> For example:
>>>
>>>    +------------------------------------+
>>>    |0| 0000001 |     24                 |
>>>    +------------------------------------+
>>>    | 1 | 16 | 1 |    www.example.org    |
>>>    +------------------------------------+
>>>
>>> This Header record indicates a REGISTERED header containing character
>>> content, the header ID = 1, the charset used is US-ASCII and the value is
>>> "www.example.org". The header is expressed with a total of 28 bytes.
>>>
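The 28-byte total for the Host example follows from the fields. There is a 4-byte prefix, then 24 octets made up of the 4-byte ID, the 4-byte Value Length, the charset octet and the 15 characters. The diagram's Value Length of 16 suggests the charset octet is counted as part of the value, and the sketch assumes so. Charset id 1 = US-ASCII is the registrar assignment suggested in the text; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RegisteredRecord {
    static final int FLAG_CHARACTER = 0x01;
    static final int CHARSET_US_ASCII = 1; // example registrar assignment from the text

    // |0| Flags (7) | Length (24) | ID (int32) | Value Length (int32) | charset | characters
    static byte[] encodeAscii(int id, String text) {
        byte[] chars = text.getBytes(StandardCharsets.US_ASCII);
        int valueLen = 1 + chars.length; // charset octet counted in the value, as in the example
        int length = 4 + 4 + valueLen;   // octets after the prefix word
        if (length > 0xFFFFFF) throw new IllegalArgumentException("record too long for 24-bit length");
        ByteBuffer buf = ByteBuffer.allocate(4 + length); // big-endian by default
        buf.putInt((FLAG_CHARACTER << 24) | length);      // T bit clear: REGISTERED
        buf.putInt(id);
        buf.putInt(valueLen);
        buf.put((byte) CHARSET_US_ASCII);
        buf.put(chars);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] host = encodeAscii(1, "www.example.org");
        System.out.println(host.length); // 28
    }
}
```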
>>> When the T bit IS set, the Header field is an EXTENSION Header, the
>>> structure of which is:
>>>
>>>    +------------------------------------+
>>>    |1| Flags (7) |     Length (24)      |
>>>    +------------------------------------+
>>>    |      Length of name (int32)        |
>>>    +------------------------------------+
>>>    |           Name (string)            |
>>>    +------------------------------------+
>>>    |      Length of value (int32)       |
>>>    +------------------------------------+
>>>    |              Value                 |
>>>    +------------------------------------+
>>>
>>> For example, an extension header that contains raw binary data:
>>>
>>>    +------------------------------------+
>>>    |1| 0000000 |       Length (24)      |
>>>    +------------------------------------+
>>>    |                5                   |
>>>    +------------------------------------+
>>>    |              x-foo                 |
>>>    +------------------------------------+
>>>    |                4                   |
>>>    +------------------------------------+
>>>    |           {raw bytes}              |
>>>    +------------------------------------+
>>>
>>> The header is expressed with a total of 21 bytes.
>>>
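The 21-byte x-foo example works out the same way: 4 bytes of prefix, then 4 + 5 bytes of name and 4 + 4 bytes of value, so the 24-bit Length field would hold 17. The sketch sets the T bit, because EXTENSION headers are the T = 1 type. The flags value (0, raw octets), the sample bytes and the names are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ExtensionRecord {
    // |1| Flags (7) | Length (24) | name len (int32) | name | value len (int32) | value
    static byte[] encode(String name, int flags, byte[] value) {
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        int length = 4 + n.length + 4 + value.length;    // octets after the prefix word
        if (length > 0xFFFFFF) throw new IllegalArgumentException("record too long for 24-bit length");
        ByteBuffer buf = ByteBuffer.allocate(4 + length); // big-endian by default
        buf.putInt((1 << 31) | ((flags & 0x7F) << 24) | length); // T bit set: EXTENSION
        buf.putInt(n.length);
        buf.put(n);
        buf.putInt(value.length);
        buf.put(value);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        System.out.println(encode("x-foo", 0, raw).length); // 21
    }
}
```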
>>> The same flags apply. 0x1 indicates that the value is character data. If
>>> 0x1 is not set, the value contains raw octets. The key difference is that
>>> there is a 32-bit name length and variable length name field in place of
>>> the 32-bit ID field in the REGISTERED header. All other details remain the
>>> same.
>>>
>>> As is currently the case in SPDY, if a single header value contains
>>> multiple values, each can be separated using a single NUL (0) byte.
>>>
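Joining and splitting multiple values on NUL might look like the sketch below, which assumes character values in US-ASCII or UTF-8. In UTF-8 only U+0000 encodes to a zero byte, so a split cannot cut a character in half. For raw-octet values a zero byte may be genuine data, so this approach only suits character values. The names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class MultiValue {
    // Joins several character values into one header value, NUL-separated.
    static byte[] join(List<String> values) {
        for (String v : values) {
            if (v.indexOf('\0') >= 0) throw new IllegalArgumentException("value contains NUL");
        }
        return String.join("\0", values).getBytes(StandardCharsets.UTF_8);
    }

    // Splits a header value back into its parts at each NUL (0) byte.
    static List<String> split(byte[] value) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= value.length; i++) {
            if (i == value.length || value[i] == 0) {
                parts.add(new String(value, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split(join(List.of("en-US", "fr")))); // [en-US, fr]
    }
}
```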
>>> There are several advantages to this approach:
>>>
>>> 1. Commonly used header names are omitted in favor of registered, known
>>> numeric IDs, saving space and making it more efficient to scan over
>>> commonly used headers. For instance, intermediaries that route requests
>>> based on common headers such as Host could choose to ignore EXTENSION
>>> header fields entirely, and scan only for the IDs of the fields they are
>>> interested in, rather than having to parse the entire bag of header names.
>>>
>>> 2. Header values can be expressed as raw octets or character data.
>>> Currently, mechanisms within HTTP require developers to muck around with
>>> Base64 encoding or other encodings when including detail within a header.
>>> This approach would eliminate that extra step. For instance, if I wanted to
>>> have a Content-Integrity header whose value is an HMAC digest, I would be
>>> able to drop the raw bytes of the digest into the header value rather than
>>> base64 or hex encoding it into an ASCII string, saving CPU cycles and
>>> reducing the amount of data that must be transmitted.
>>>
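To put numbers on the Content-Integrity example: an HMAC-SHA256 digest is 32 raw octets, while its Base64 form is 44 characters and its hex form 64. Below is a sketch using the JDK's `javax.crypto.Mac`. The key, the body and the choice of SHA-256 are illustrative assumptions, since the message names no algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class DigestSizes {
    static byte[] hmacSha256(byte[] key, byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(body);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lowercase hex, two characters per octet.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] digest = hmacSha256("secret".getBytes(StandardCharsets.UTF_8),
                                   "message body".getBytes(StandardCharsets.UTF_8));
        System.out.println("raw octets: " + digest.length);                                       // 32
        System.out.println("base64:     " + Base64.getEncoder().encodeToString(digest).length()); // 44
        System.out.println("hex:        " + hex(digest).length());                                // 64
    }
}
```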
>>> 3. Header values that contain character data would not be limited to
>>> US-ASCII. Multiple charset encodings would be allowed... obviously this has
>>> a whole slew of issues associated with it that need to be carefully
>>> considered. The charset encoding flag could be dropped, if necessary, from
>>> this proposal.
>>>
>>> For HTTP/1.1 Compatibility, each REGISTERED Header would be mapped to a
>>> known, registered HTTP/1.1 header, allowing one-to-one translation from the
>>> optimized form to the HTTP/1.1 form. Binary values would be base64-encoded.
>>> If a particular header does not allow for Base64 encoded values under
>>> HTTP/1.1, the down-level recipient would have the option of responding with
>>> an appropriate 4xx response.
>>>
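A down-level translation along these lines might look as follows. The registry maps are hypothetical. IDs 1 and 6 follow the examples earlier in this message (Host, and Accept-Lang written out as the HTTP/1.1 Accept-Language header). ID 7 stands in for the Content-Integrity idea. The charset octets 1 = US-ASCII and 2 = UTF-8 are the values the text suggests.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;

final class Http11Downgrade {
    static final int FLAG_CHARACTER = 0x01;
    // Hypothetical registrar tables.
    static final Map<Integer, String> NAMES =
            Map.of(1, "Host", 6, "Accept-Language", 7, "Content-Integrity");
    static final Map<Integer, Charset> CHARSETS =
            Map.of(1, StandardCharsets.US_ASCII, 2, StandardCharsets.UTF_8);

    // Renders one REGISTERED header as an HTTP/1.1 header line.
    static String toHttp11(int id, int flags, byte[] value) {
        String name = NAMES.get(id);
        if (name == null) throw new IllegalArgumentException("unknown registered id " + id);
        if ((flags & FLAG_CHARACTER) == 0) {
            return name + ": " + Base64.getEncoder().encodeToString(value); // raw octets
        }
        if (value.length == 0) throw new IllegalArgumentException("missing charset octet");
        Charset cs = CHARSETS.get(value[0] & 0xFF); // first octet identifies the charset
        if (cs == null) throw new IllegalArgumentException("unknown charset id " + (value[0] & 0xFF));
        return name + ": " + new String(value, 1, value.length - 1, cs);
    }

    public static void main(String[] args) {
        byte[] host = "\u0001www.example.org".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHttp11(1, FLAG_CHARACTER, host));    // Host: www.example.org
        System.out.println(toHttp11(7, 0, new byte[] {1, 2, 3})); // Content-Integrity: AQID
    }
}
```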
>>> That's it for now. There are additional considerations to be given to
>>> the specific selection of header fields to include within the SYN_STREAM
>>> vs. follow-on HEADERS frames but that's a separate conversation. As always,
>>> feedback is welcome...
>>>
>>> - James
>>>
>>>
>>
>
Received on Tuesday, 17 July 2012 07:22:05 UTC