- From: James M Snell <jasnell@gmail.com>
- Date: Fri, 13 Jul 2012 15:16:35 -0700
- To: ietf-http-wg@w3.org
- Message-ID: <CABP7RbepWH4ahSPHDU_M_w0tRVz_RRm1FV-jM_Y72=YHCVqO0g@mail.gmail.com>
This note is intended to provide some additional thoughts for discussion around the design and use of SPDY as the possible basis for HTTP/2.0. The intent is to provide fuel for discussion; comments are definitely welcome.

As discussed within draft-tarreau-httpbis-network-friendly-00, and as has been mentioned several times in discussion on list, the handling of headers within the current SPDY framing, and in particular the layering of HTTP/1.1 messages into SPDY frames, is less than optimal. There is significant wasted space, duplication, etc., that, strictly speaking, really isn't necessary.

While I recognize that the following increases the basic complexity of the protocol, it allows fairly significant optimization following the same basic lines of reasoning expressed in draft-tarreau-httpbis-network-friendly-00.

Section 2.6.1 of the SPDY draft defines header blocks using the following format:

+------------------------------------+
| Number of Name/Value pairs (int32) |
+------------------------------------+
| Length of name (int32)             |
+------------------------------------+
| Name (string)                      |
+------------------------------------+
| Length of value (int32)            |
+------------------------------------+
| Value (string)                     |
+------------------------------------+
| (repeats)                          |

This structure is used within SYN_STREAM and HEADERS frames. What I propose is the following revised structure:

+------------------------------------+
| Number of Headers (int32)          |
+------------------------------------+
|T| Flags (7) | Length (24)          |
+------------------------------------+
| Data                               |
+------------------------------------+
|T| Flags (7) | Length (24)          |
+------------------------------------+
| Data                               |
+------------------------------------+
| (repeats)                          |

T is a single bit identifying the Header Type. There are two types: REGISTERED (0) and EXTENSION (1).

Flags provides flags for the specific header field. The flag 0x1 indicates that the header value contains character data.
If not set, the value is assumed to consist of raw octets. The flag 0x2 indicates that the value is compressed.

Length is an unsigned 24-bit value specifying the number of octets after the length field.

When the T bit is NOT set, the Header field is a REGISTERED Header, the structure of which is:

+------------------------------------+
|0| Flags (7) | Length (24)          |
+------------------------------------+
| ID | Value Length (int32) |Value...|
+------------------------------------+

The ID is a 32-bit number uniquely identifying the registered field. Each ID is assigned by the registrar. For instance, the "Host" field could have a registered value of "1", the "Accept-Language" field could have a registered value of "6", and so forth.

The Value Length is a 32-bit value indicating the length of the value.

If Flag 0x1 is set, the value is assumed to contain character data. When set, the value MUST be preceded by a single unsigned 8-bit integer identifying the character encoding utilized. These values are also assigned by the registrar. For instance, US-ASCII could have a registered value of "1", while UTF-8 could have a registered value of "2".

For example:

+------------------------------------+
|0| 0000001 | 24                     |
+------------------------------------+
| 1 | 16 | 1 | www.example.org       |
+------------------------------------+

This Header record indicates a REGISTERED header containing character content: the header ID is 1, the charset used is US-ASCII, and the value is "www.example.org". The Value Length of 16 covers the one-octet charset identifier plus the 15 octets of text. The header is expressed with a total of 28 bytes.

When the T bit IS set, the Header field is an EXTENSION Header, the structure of which is:

+------------------------------------+
|1| Flags (7) | Length (24)          |
+------------------------------------+
| Length of name (int32)             |
+------------------------------------+
| Name (string)                      |
+------------------------------------+
| Length of value (int32)            |
+------------------------------------+
| Value                              |
+------------------------------------+

For example,
an extension header that contains raw binary data:

+------------------------------------+
|1| 0000000 | Length (24)            |
+------------------------------------+
| 5                                  |
+------------------------------------+
| x-foo                              |
+------------------------------------+
| 4                                  |
+------------------------------------+
| {raw bytes}                        |
+------------------------------------+

The header is expressed with a total of 21 bytes.

The same flags apply. 0x1 indicates that the value is character data. If 0x1 is not set, the value contains raw octets. The key difference is that there is a 32-bit name length and a variable-length name field in place of the 32-bit ID field in the REGISTERED header. All other details remain the same.

As is currently the case in SPDY, if a single header value contains multiple values, each can be separated using a single NUL (0) byte.

There are several advantages to this approach:

1. Commonly used header names are omitted in favor of registered, known numeric IDs, saving space and making it more efficient to scan over commonly used headers. For instance, intermediaries that route requests based on common headers such as Host could choose to ignore EXTENSION header fields entirely and scan only for the IDs of the fields they are interested in, rather than having to parse the entire bag of header names.

2. Header values can be expressed as raw octets or character data. Currently, mechanisms within HTTP require developers to muck around with Base64 or other encodings when including binary detail within a header. This approach would eliminate that extra step. For instance, if I wanted to have a Content-Integrity header whose value is an HMAC digest, I would be able to drop the raw bytes of the digest into the header value rather than Base64 or hex encoding it into an ASCII string, saving CPU cycles and reducing the amount of data that must be transmitted.

3. Header values that contain character data would not be limited to US-ASCII.
   Multiple charset encodings would be allowed. Obviously this has a whole slew of issues associated with it that need to be carefully considered. The charset encoding flag could be dropped, if necessary, from this proposal.

For HTTP/1.1 compatibility, each REGISTERED Header would be mapped to a known, registered HTTP/1.1 header, allowing one-to-one translation from the optimized form to the HTTP/1.1 form. Binary values would be Base64-encoded. If a particular header does not allow for Base64-encoded values under HTTP/1.1, the down-level recipient would have the option of responding with an appropriate 4xx response.

That's it for now. There are additional considerations to be given to the specific selection of header fields to include within the SYN_STREAM vs. follow-on HEADERS frames, but that's a separate conversation.

As always, feedback is welcome...

- James
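
To make the REGISTERED and EXTENSION record layouts in this note a bit more concrete, here is a minimal Python sketch of how the two could be serialized. It rests on several assumptions that the note does not state: fields are packed in network (big-endian) byte order, the T bit is the most significant bit of the first 32-bit word, and the Value Length counts the one-octet charset identifier when flag 0x1 is set (which is what the 28-byte example implies). The function names are hypothetical, and the ID and charset numbers are only the illustrative values used in the examples, not real registry assignments.

```python
import struct

FLAG_CHARDATA = 0x1    # value is character data (flag from the proposal)
FLAG_COMPRESSED = 0x2  # value is compressed (flag from the proposal; unused here)


def _prefix(extension: bool, flags: int, body_len: int) -> bytes:
    """Pack T (1 bit), Flags (7 bits) and Length (24 bits) into 4 octets.

    Assumption: big-endian, with T as the most significant bit.
    """
    if not 0 <= flags < 0x80 or not 0 <= body_len < (1 << 24):
        raise ValueError("flags or length out of range")
    word = (int(extension) << 31) | (flags << 24) | body_len
    return struct.pack("!I", word)


def _tagged_value(value: bytes, charset_id):
    """Return (flags, value); prepend the charset octet for character data."""
    if charset_id is None:
        return 0, value  # raw octets, flag 0x1 not set
    return FLAG_CHARDATA, bytes([charset_id]) + value


def encode_registered(header_id: int, value: bytes, charset_id=None) -> bytes:
    """REGISTERED record: prefix, 32-bit ID, 32-bit value length, value."""
    flags, value = _tagged_value(value, charset_id)
    body = struct.pack("!II", header_id, len(value)) + value
    return _prefix(False, flags, len(body)) + body


def encode_extension(name: bytes, value: bytes, charset_id=None) -> bytes:
    """EXTENSION record: prefix, name length, name, value length, value."""
    flags, value = _tagged_value(value, charset_id)
    body = (struct.pack("!I", len(name)) + name
            + struct.pack("!I", len(value)) + value)
    return _prefix(True, flags, len(body)) + body


# Hypothetical registry values from the examples: Host = 1, US-ASCII = 1.
host = encode_registered(1, b"www.example.org", charset_id=1)
xfoo = encode_extension(b"x-foo", b"\x00\x01\x02\x03")
print(len(host), len(xfoo))  # 28 21
```

Under those assumptions the two records come out at 28 and 21 octets, matching the totals given for the Host and x-foo examples, and the 24-bit Length fields are 24 and 17 respectively.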
Received on Friday, 13 July 2012 22:17:24 UTC