Re: SPDY Header Frames from Roland Zink on 2012-07-13 (ietf-http-wg@w3.org from July to September 2012)

From: Roland Zink <roland@zinks.de>
Date: Sat, 14 Jul 2012 01:36:05 +0200
To: ietf-http-wg@w3.org
Message-ID: <5000B0E5.9000103@zinks.de>
A similar binary encoding for HTTP headers is used in WAP (Wireless 
Application Protocol). The goal there was to reduce the number of bytes, 
so it optimizes more for size, but skipping over values is a bit 
more difficult. If anybody is interested, the full specification can be 
found at 
http://www.openmobilealliance.org/tech/affiliates/wap/wap-230-wsp-20010705-a.pdf. 
The header encoding is in section 8.4.

Regards,
Roland


On 14.07.2012 00:16, James M Snell wrote:
> This note is intended to provide some additional thoughts for 
> discussion around the design and use of SPDY as the possible basis for 
> HTTP/2.0. The intent is to provide fuel for discussion... comments are 
> definitely welcome.
>
> As discussed within draft-tarreau-httpbis-network-friendly-00, and as 
> has been mentioned several times in discussion on list, handling of 
> headers within the current SPDY framing, and in particular the 
> layering of HTTP/1.1 messages into SPDY frames is less than optimal. 
> There is significant wasted space, duplication, etc that -- strictly 
> speaking -- really isn't necessary. While I recognize that the 
> following increases the basic complexity of the protocol, it allows 
> fairly significant optimization following the same basic lines of 
> reasoning expressed in draft-tarreau-httpbis-network-friendly-00.
>
> Section 2.6.1 of the SPDY draft defines header blocks using the 
> following format:
>
>    +------------------------------------+
>    | Number of Name/Value pairs (int32) |
>    +------------------------------------+
>    |     Length of name (int32)         |
>    +------------------------------------+
>    |           Name (string)            |
>    +------------------------------------+
>    |     Length of value  (int32)       |
>    +------------------------------------+
>    |          Value   (string)          |
>    +------------------------------------+
>    |           (repeats)                |
>
> This structure is used within SYN_STREAM and HEADERS frames.
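
For comparison with the revised structure, here is a minimal Python sketch of the existing name/value layout shown above, before any compression is applied. It assumes network byte order and ASCII strings; the function name encode_spdy_header_block is made up for illustration.

```python
import struct

def encode_spdy_header_block(headers):
    """Serialize (name, value) pairs as: count, then len+name, len+value."""
    out = [struct.pack("!I", len(headers))]          # number of pairs (int32)
    for name, value in headers:
        name_bytes = name.encode("ascii")
        value_bytes = value.encode("ascii")
        out.append(struct.pack("!I", len(name_bytes)) + name_bytes)
        out.append(struct.pack("!I", len(value_bytes)) + value_bytes)
    return b"".join(out)

block = encode_spdy_header_block([("host", "www.example.org")])
print(len(block))  # 4 (count) + 4 + 4 (name) + 4 + 15 (value) = 31
```

Every header carries its full name plus two 4-byte lengths, which is the overhead the proposal tries to cut.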
>
> What I propose is the following revised structure:
>
>    +------------------------------------+
>    |     Number of Headers (int32)      |
>    +------------------------------------+
>    |T| Flags (7) |     Length (24)      |
>    +------------------------------------+
>    |              Data                  |
>    +------------------------------------+
>    |T| Flags (7) |     Length (24)      |
>    +------------------------------------+
>    |              Data                  |
>    +------------------------------------+
>    |             (repeats)              |
>
> T is a single bit identifying the Header Type. There are two types: 
> REGISTERED (0) and EXTENSION (1).
>
> Flags provides flags for the specific header field. The flag 0x1 
> indicates that the header value contains Character Data. If not set, 
> the value is assumed to consist of raw octets. 0x2 indicates that the 
> value is compressed.
>
> Length is an unsigned 24-bit value specifying the number of octets 
> after the length field.
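
The 32-bit record prefix described above (T bit, 7 flag bits, 24-bit length) could be packed and unpacked roughly as follows. This is only a sketch: it assumes the T bit is the most significant bit of a big-endian word, as the diagram suggests, and the names pack_prefix and unpack_prefix are hypothetical.

```python
import struct

T_REGISTERED = 0
T_EXTENSION = 1
FLAG_CHARDATA = 0x1    # value contains character data
FLAG_COMPRESSED = 0x2  # value is compressed

def pack_prefix(t, flags, length):
    """Pack T (1 bit), Flags (7 bits) and Length (24 bits) into 4 octets."""
    if t not in (0, 1) or not 0 <= flags < 0x80 or not 0 <= length < (1 << 24):
        raise ValueError("field out of range")
    return struct.pack("!I", (t << 31) | (flags << 24) | length)

def unpack_prefix(data):
    """Return (t, flags, length) from the first 4 octets of data."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 31, (word >> 24) & 0x7F, word & 0xFFFFFF

prefix = pack_prefix(T_REGISTERED, FLAG_CHARDATA, 24)
print(prefix.hex())           # 01000018
print(unpack_prefix(prefix))  # (0, 1, 24)
```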
>
> When the T bit is NOT set, the Header field is a REGISTERED Header, 
> the structure of which is:
>
>    +------------------------------------+
>    |0| Flags (7) |     Length (24)      |
>    +------------------------------------+
>    | ID | Value Length (int32) |Value...|
>    +------------------------------------+
>
> The ID is a 32-bit number uniquely identifying the registered field. 
> Each is assigned by the registrar. For instance, the "Host" field 
> could have a registered value of "1", the "Accept-Language" field could 
> have a registered value of "6", and so forth.
>
> The Value Length is a 32-bit value indicating the length of the value.
>
> If Flag 0x1 is set, the value is assumed to contain character data. 
> When set, the value MUST be preceded by a single unsigned 8-bit 
> integer identifying the character encoding utilized. The values are 
> assigned by the registrar. For instance, US-ASCII could have a 
> registered value of "1", while "UTF-8" could have a registered value 
> of "2".
>
> For example:
>
>    +------------------------------------+
>    |0| 0000001 |          24            |
>    +------------------------------------+
>    | 1 | 16 | 1 | www.example.org       |
>    +------------------------------------+
>
> This Header record indicates a REGISTERED header containing character 
> content, the header ID = 1, the charset used is US-ASCII and the value 
> is "www.example.org". The header is expressed with a total of 28 bytes.
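
The byte count of the REGISTERED example can be checked with a short Python sketch. It follows the example in counting the charset octet as part of the Value Length (16 = 1 + 15). The ID and charset numbers are the hypothetical registry values from the text, and encode_registered is a made-up name.

```python
import struct

FLAG_CHARDATA = 0x1
CHARSET_US_ASCII = 1  # hypothetical registry value, per the text
HEADER_ID_HOST = 1    # hypothetical registry value, per the text

def encode_registered(header_id, value, flags=0, charset=None):
    """Build a REGISTERED record: prefix, 32-bit ID, 32-bit value length, value."""
    if flags & FLAG_CHARDATA:
        if charset is None:
            raise ValueError("character data needs a charset octet")
        value = bytes([charset]) + value  # charset octet precedes the text
    body = struct.pack("!II", header_id, len(value)) + value
    prefix = struct.pack("!I", (flags << 24) | len(body))  # T bit left as 0
    return prefix + body

record = encode_registered(HEADER_ID_HOST, b"www.example.org",
                           FLAG_CHARDATA, CHARSET_US_ASCII)
print(len(record))  # 4 (prefix) + 4 (ID) + 4 (value length) + 16 = 28
```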
>
> When the T bit IS set, the Header field is an EXTENSION Header, the 
> structure of which is:
>
>    +------------------------------------+
>    |1| Flags (7) |     Length (24)      |
>    +------------------------------------+
>    |      Length of name (int32)        |
>    +------------------------------------+
>    |           Name (string)            |
>    +------------------------------------+
>    |      Length of value (int32)       |
>    +------------------------------------+
>    |              Value                 |
>    +------------------------------------+
>
> For example, an extension header that contains raw binary data:
>
>    +------------------------------------+
>    |1| 0000000 |       Length (24)      |
>    +------------------------------------+
>    |                 5                  |
>    +------------------------------------+
>    |               x-foo                |
>    +------------------------------------+
>    |                 4                  |
>    +------------------------------------+
>    |            {raw bytes}             |
>    +------------------------------------+
>
> The header is expressed with a total of 21 bytes.
>
> The same flags apply. 0x1 indicates that the value is character data. 
> If 0x1 is not set, the value contains raw octets. The key difference 
> is that there is a 32-bit name length and variable length name field 
> in place of the 32-bit ID field in the REGISTERED header. All other 
> details remain the same.
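
The EXTENSION example works out the same way. In this sketch the T bit is set, as the text specifies for EXTENSION headers, and is assumed to be the top bit of the prefix word; encode_extension is a hypothetical name and the four raw bytes are arbitrary.

```python
import struct

def encode_extension(name, value, flags=0):
    """Build an EXTENSION record: prefix, name length, name, value length, value."""
    name_bytes = name.encode("ascii")
    body = (struct.pack("!I", len(name_bytes)) + name_bytes
            + struct.pack("!I", len(value)) + value)
    prefix = struct.pack("!I", (1 << 31) | (flags << 24) | len(body))  # T = 1
    return prefix + body

record = encode_extension("x-foo", b"\xde\xad\xbe\xef")
print(len(record))  # 4 (prefix) + 4 + 5 (name) + 4 + 4 (value) = 21
```

The Length field in the prefix comes out as 17 here, i.e. everything after the first four octets.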
>
> As is currently the case in SPDY, if a single header value contains 
> multiple values, each can be separated using a single NUL (0) byte.
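
A minimal sketch of the NUL-separated multi-value convention, with made-up helper names:

```python
def join_values(values):
    """Combine several values for one header into a single NUL-separated value."""
    return b"\x00".join(values)

def split_values(raw):
    """Split a header value back into its individual values."""
    return raw.split(b"\x00")

combined = join_values([b"text/html", b"application/xml"])
print(split_values(combined))  # [b'text/html', b'application/xml']
```

One thing this makes visible: a raw-octet value may itself contain a 0 byte, so splitting on NUL is only unambiguous for values known not to contain one.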
>
> There are several advantages to this approach:
>
> 1. Commonly used header names are omitted in favor of registered, 
> known numeric IDs, saving space and making it more efficient to scan 
> over commonly used headers. For instance, intermediaries that route 
> requests based on common headers such as Host etc could choose to 
> ignore EXTENSION header fields entirely, and scan only for the IDs of 
> the fields they are interested in, rather than having to parse the 
> entire bag of header names.
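
To illustrate the scanning argument in point 1, here is a rough Python sketch of an intermediary that reads only the 4-byte prefix of each record, skips EXTENSION records using the Length field, and picks out the registered IDs it cares about. The layout assumptions (big-endian, T as the top bit) and the name find_registered are mine, not part of the proposal.

```python
import struct

def find_registered(block, wanted_ids):
    """Return {id: (flags, value)} for wanted REGISTERED headers in a header block."""
    (count,) = struct.unpack_from("!I", block, 0)
    offset = 4
    found = {}
    for _ in range(count):
        (word,) = struct.unpack_from("!I", block, offset)
        t, flags, length = word >> 31, (word >> 24) & 0x7F, word & 0xFFFFFF
        body = offset + 4
        if t == 0:  # REGISTERED: 32-bit ID, 32-bit value length, value
            header_id, value_len = struct.unpack_from("!II", block, body)
            if header_id in wanted_ids:
                found[header_id] = (flags, block[body + 8 : body + 8 + value_len])
        offset = body + length  # jump to the next record without parsing names
    return found

# One EXTENSION record (x-foo) followed by one REGISTERED record (ID 1).
sample = (b"\x00\x00\x00\x02"
          b"\x80\x00\x00\x11" b"\x00\x00\x00\x05x-foo" b"\x00\x00\x00\x04\xde\xad\xbe\xef"
          b"\x01\x00\x00\x18" b"\x00\x00\x00\x01" b"\x00\x00\x00\x10" b"\x01www.example.org")
hits = find_registered(sample, {1})
print(hits)  # {1: (1, b'\x01www.example.org')}
```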
>
> 2. Header values can be expressed as raw octets or character data. 
> Currently, mechanisms within HTTP require developers to muck around 
> with Base64 encoding or other encodings when including detail within a 
> header. This approach would eliminate that extra step. For instance, 
> if I wanted to have a Content-Integrity header whose value is an hmac 
> digest, I would be able to drop the raw bytes of the digest into the 
> header value rather than base64 or hex encoding it into an ASCII 
> string, saving CPU cycles and reducing the amount of data that must be 
> transmitted.
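
As a rough illustration of the size difference in point 2, the following Python snippet compares an HMAC-SHA256 digest sent as raw octets with its Base64 and hex forms. The key and message are arbitrary placeholders.

```python
import base64
import hashlib
import hmac

digest = hmac.new(b"secret-key", b"message body", hashlib.sha256).digest()

raw_len = len(digest)                    # raw octets
b64_len = len(base64.b64encode(digest))  # Base64 text
hex_len = len(digest.hex())              # hex text

print(raw_len, b64_len, hex_len)  # 32 44 64
```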
>
> 3. Header values that contain character data would not be limited to 
> US-ASCII. Multiple charset encodings would be allowed... obviously 
> this has a whole slew of issues associated with it that need to be 
> carefully considered. The charset encoding flag could be dropped, if 
> necessary, from this proposal.
>
> For HTTP/1.1 Compatibility, each REGISTERED Header would be mapped to 
> a known, registered HTTP/1.1 header, allowing one to one translation 
> from the optimized form to the HTTP/1.1 form. Binary values would be 
> base64-encoded. If a particular header does not allow for Base64 
> encoded values under HTTP/1.1, the down-level recipient would have the 
> option of responding with an appropriate 4xx response.
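
The one-to-one translation to HTTP/1.1 might look something like the sketch below. The ID and charset tables are hypothetical, built only from the example values mentioned earlier in the note, and to_http11_line is a made-up name.

```python
import base64

# Hypothetical registries, using the example values from the text.
REGISTERED_NAMES = {1: "Host", 6: "Accept-Language"}
CHARSETS = {1: "ascii", 2: "utf-8"}
FLAG_CHARDATA = 0x1

def to_http11_line(header_id, flags, value):
    """Translate one REGISTERED header into an HTTP/1.1 header line."""
    name = REGISTERED_NAMES[header_id]
    if flags & FLAG_CHARDATA:
        # First octet names the charset; the rest is the text itself.
        text = value[1:].decode(CHARSETS[value[0]])
    else:
        # Raw octets are Base64-encoded for the down-level recipient.
        text = base64.b64encode(value).decode("ascii")
    return f"{name}: {text}\r\n"

print(repr(to_http11_line(1, FLAG_CHARDATA, b"\x01www.example.org")))
# 'Host: www.example.org\r\n'
```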
>
> That's it for now. There are additional considerations to be given to 
> the specific selection of header fields to include within the 
> SYN_STREAM vs. follow-on HEADERS frames but that's a separate 
> conversation. As always, feedback is welcome...
>
> - James
>
Received on Friday, 13 July 2012 23:36:31 UTC