- From: Mike Belshe <mike@belshe.com>
- Date: Tue, 17 Jul 2012 00:18:03 -0700
- To: James M Snell <jasnell@gmail.com>
- Cc: ietf-http-wg@w3.org
- Message-ID: <CABaLYCswjWjNsBmFgJOWSCUZ+Nxu90NyRc+H_CRym9L7QX7ysg@mail.gmail.com>
i like the direction of this. a good blend of a registry and extension
headers. i wasn't quite sure how you got to the 6 byte compressed header, tho.

mike

On Mon, Jul 16, 2012 at 11:51 PM, James M Snell <jasnell@gmail.com> wrote:

> Ok... spent some time this evening playing around with the header frame
> syntax a bit more to see what further optimizations could be made and to
> see if the binary encoded header id's made any noticeable difference in
> size and ease of processing.
>
> Here's the revised structure I played around with...
>
> 1. Within a HEADER block, I assume two possible types of headers,
> REGISTERED and EXTENSION. A REGISTERED header is one that would be known to
> the registrar and assigned a numeric id and a codepage. If the codepage is
> 0, the implication is that the header is MUST UNDERSTAND and is considered
> one of the core headers for the basic operation of the protocol. Codepages
> 1-9 are MUST-IGNORE... that is, if a user-agent or server comes across a
> header on these code pages that is not understood, the header can simply be
> ignored. Codepages 10-14 are PRIVATE USE, with Codepage 10 being reserved
> for MUST UNDERSTAND PRIVATE USE headers. EXTENSION headers are simple name
> value pairs essentially as they exist today.
>
> 2. Within extension headers, the name portion MUST be ASCII and MUST NOT
> be longer than 255 bytes (quite generous really).
>
> 3. Values may be binary or character based, as indicated by a flags field.
> Values may be up to max(int32) in length.
>
> 4. Registered HTTP Methods can be identified by numeric value. Extension
> Methods can be identified by character value.
>
> 5. The structure for REGISTERED HEADERS is...
>
> +------------------------------+
> |0| id (15-bit)| flags(8-bit)  |
> +------------------------------+
> | len (32-bit) | value         |
> +------------------------------+
>
> 6. The structure for EXTENSION HEADERS is...
>
> +------------------------------+
> |1| flags(7-bit) | namelen (8) |
> +------------------------------+
> | name | val len (32) | value  |
> +------------------------------+
>
> Assuming the following registered headers...
>
>   public static final short VERSION = 1;
>   public static final short METHOD = 2;
>   public static final short HOST = 3;
>   public static final short SESSION = 4;
>   public static final short CHARSET = 5;
>   public static final short REQUEST_URI = 6;
>   public static final short ACCEPT_LANG = 4097;
>
> And the following registered methods...
>
>   public static final byte GET = 1;
>   public static final byte POST = 2;
>   public static final byte PUT = 3;
>   public static final byte DELETE = 4;
>   public static final byte PATCH = 5;
>   public static final byte HEAD = 6;
>   public static final byte OPTIONS = 7;
>
> Let's assume that we want to encode an HTTP GET for the resource:
>   http://www.example.org/this/is/the/request?is=it&not=beautiful
>
> With a session identifier of "session_key", ACCEPT_LANG = en-US and
> default charset encoding for all character based header values is
> "US-ASCII"... Let's also add an extension header "ext" with value "foo"...
>
> The Version header can be encoded as:
>   {0, 1, 0, 0, 0, 0, 2, 2, 0}
>
> The GET Method header can be encoded as:
>   {0, 2, 0, 0, 0, 0, 1, 1}
>
> The Host header would be encoded as:
>   { 0, 3, 1, 0, 0, 0, 15, 119, 119, 119,
>   46, 101, 120, 97, 109, 112, 108, 101, 46, 111,
>   114, 103}
>
> The Accept-Lang header would be encoded as:
>   {16, 1, 1, 0, 0, 0, 5, 'e', 'n', '-', 'U', 'S'}
>
> The Extension header ext: foo would be encoded as:
>   {-128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111}
>
> The entire header block is encoded into a structure of 145 bytes in length:
>
> [8, 0, 1, 0, 0, 0, 0, 2, 2, 0, 0, 2, 0, 0, 0, 0, 1, 1, 0, 3, 1, 0, 0, 0,
> 15, 119, 119, 119, 46, 101, 120, 97, 109, 112, 108, 101, 46, 111, 114, 103,
> 0, 6, 1, 0, 0, 0, 40, 47, 116, 104, 105, 115, 47, 105, 115, 47, 116, 104,
> 101, 47, 114, 101, 113, 117, 101, 115, 116, 63, 105, 115, 61, 105, 116, 38,
> 110, 111, 116, 61, 98, 101, 97, 117, 116, 105, 102, 117, 108, 0, 5, 1, 0,
> 0, 0, 8, 117, 115, 45, 97, 115, 99, 105, 105, 0, 4, 1, 0, 0, 0, 11, 115,
> 101, 115, 115, 105, 111, 110, 95, 107, 101, 121, 16, 1, 1, 0, 0, 0, 5, 101,
> 110, 45, 85, 83, -128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111]
>
> By comparison, the same structure encoded using the existing SPDY HEADER
> block would require 208 bytes sans compression.
>
> After applying compression of the block using the SPDY dictionary, the
> block compresses into 6 compact bytes.
>
> [120, 63, -29, -58, -89, -62]
>
> Assuming this structure was used within a SYN_STREAM message,
> unencrypted, a proxy/router that is scanning the headers to determine where
> to route the SYN_STREAM to would need only to look at the first two bytes
> of each header to determine if the header is either the HOST, METHOD,
> REQUEST_URI, VERSION or SESSION identifier. This scheme should prove to be
> significantly faster to scan and perform operations on than the current
> all-text-key-pair model. As always, tho, your mileage may vary.
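A minimal sketch of the REGISTERED header layout from the 16 July message (0 bit plus 15-bit id, 8-bit flags, 32-bit length, value), written to reproduce the Host and Accept-Lang sample bytes and the two-byte id check a router might do. The class name `RegisteredHeaderSketch` and the methods `encode` and `hasId` are hypothetical, and treating flag 0x1 as "character data" is an assumption read off the sample bytes. The EXTENSION layout is left out because the diagram and the sample bytes for "ext: foo" appear to differ by one octet.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class RegisteredHeaderSketch {
    // IDs copied from the example constants in the message.
    static final short METHOD = 2;
    static final short HOST = 3;
    static final short ACCEPT_LANG = 4097;
    // Assumption: flag 0x1 marks character data, as the sample bytes suggest.
    static final int FLAG_CHARACTER_DATA = 0x1;

    /** Encodes one REGISTERED header: id (16 bits, top bit 0), flags, int32 length, value. */
    static byte[] encode(short id, int flags, byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((id >> 8) & 0x7F); // leading 0 bit marks a REGISTERED header
        out.write(id);               // write(int) keeps only the low 8 bits
        out.write(flags);
        out.write(value.length >>> 24);
        out.write(value.length >>> 16);
        out.write(value.length >>> 8);
        out.write(value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    /** Checks the first two bytes of the header at offset against a registered id. */
    static boolean hasId(byte[] block, int offset, short id) {
        int first = block[offset] & 0xFF;
        int second = block[offset + 1] & 0xFF;
        return (first & 0x80) == 0 && ((first << 8) | second) == id;
    }

    public static void main(String[] args) {
        byte[] host = encode(HOST, FLAG_CHARACTER_DATA,
                "www.example.org".getBytes(StandardCharsets.US_ASCII));
        System.out.println(host.length + " bytes, is HOST: " + hasId(host, 0, HOST));
    }
}
```

Only the id comparison touches the header bytes here; walking a whole block would also need the length field to skip from one header to the next.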
>
> /end-experiment
>
> - James
>
> On Fri, Jul 13, 2012 at 3:16 PM, James M Snell <jasnell@gmail.com> wrote:
>
>> This note is intended to provide some additional thoughts for discussion
>> around the design and use of SPDY as the possible basis for HTTP/2.0. The
>> intent is to provide fuel for discussion... comments are definitely welcome.
>>
>> As discussed within draft-tarreau-httpbis-network-friendly-00, and as has
>> been mentioned several times in discussion on list, handling of headers
>> within the current SPDY framing, and in particular the layering of HTTP/1.1
>> messages into SPDY frames is less than optimal. There is significant wasted
>> space, duplication, etc that -- strictly speaking -- really isn't
>> necessary. While I recognize that the following increases the basic
>> complexity of the protocol, it allows fairly significant optimization
>> following the same basic lines of reasoning expressed in
>> draft-tarreau-httpbis-network-friendly-00.
>>
>> Section 2.6.1 of the SPDY draft defines header blocks using the following
>> format:
>>
>> +------------------------------------+
>> | Number of Name/Value pairs (int32) |
>> +------------------------------------+
>> | Length of name (int32)             |
>> +------------------------------------+
>> | Name (string)                      |
>> +------------------------------------+
>> | Length of value (int32)            |
>> +------------------------------------+
>> | Value (string)                     |
>> +------------------------------------+
>> | (repeats)                          |
>>
>> This structure is used within SYN_STREAM and HEADERS frames.
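For reference, the name/value block quoted above can be sketched as follows. This is only an illustration of the quoted diagram, not of any SPDY implementation: the class name `SpdyNameValueBlockSketch` and its `encode` method are hypothetical, and encoding names and values as US-ASCII is an assumption made to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SpdyNameValueBlockSketch {
    /** Encodes {name, value} pairs as: int32 count, then int32 length + bytes for each name and value. */
    static byte[] encode(String[][] pairs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, pairs.length);
        for (String[] pair : pairs) {
            writeField(out, pair[0]); // name
            writeField(out, pair[1]); // value
        }
        return out.toByteArray();
    }

    private static void writeField(ByteArrayOutputStream out, String text) {
        // Assumption: US-ASCII text, chosen only for this sketch.
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        writeInt32(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24); // write(int) keeps only the low 8 bits of each shift
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    public static void main(String[] args) {
        byte[] block = encode(new String[][] {{"host", "www.example.org"}});
        System.out.println(block.length + " bytes uncompressed");
    }
}
```

In this layout every pair costs 8 octets of length fields on top of the name and value themselves, plus 4 octets for the pair count, which is the overhead the proposal is trying to trim.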
>>
>> What I propose is the following revised structure:
>>
>> +------------------------------------+
>> | Number of Headers (int32)          |
>> +------------------------------------+
>> |T| Flags (7) | Length (24)          |
>> +------------------------------------+
>> | Data                               |
>> +------------------------------------+
>> |T| Flags (7) | Length (24)          |
>> +------------------------------------+
>> | Data                               |
>> +------------------------------------+
>> | (repeats)                          |
>>
>> T is a single bit identifying the Header Type. There are two types..
>> REGISTERED (0) and EXTENSION (1)
>>
>> Flags provides flags for the specific header field. The flag 0x1
>> indicates that the header value contains Character Data. If not set, the
>> value is assumed to consist of raw octets. 0x2 indicates that the value is
>> compressed.
>>
>> Length is an unsigned 24-bit value specifying the number of octets after
>> the length field.
>>
>> When the T bit is NOT set, the Header field is a REGISTERED Header, the
>> structure of which is:
>>
>> +------------------------------------+
>> |0| Flags (7) | Length (24)          |
>> +------------------------------------+
>> | ID | Value Length (int32) |Value...|
>> +------------------------------------+
>>
>> The ID is a 32-bit number uniquely identifying the registered field. Each
>> is assigned by the registrar. For instance, the "Host" field could have a
>> registered value of "1", the "Accept-Lang" field could have a registered
>> value of "6", and so forth.
>>
>> The Value Length is a 32-bit value indicating the length of the value.
>>
>> If Flag 0x1 is set, the value is assumed to contain character data. When
>> set, the value MUST be preceded by a single unsigned 8-bit integer
>> identifying the character encoding utilized. The values are assigned by the
>> registrar. For instance, US-ASCII could have a registered value of "1",
>> while "UTF-8" could have a registered value of "2".
>>
>> For example:
>>
>> +------------------------------------+
>> |0| 0000001 | 24                     |
>> +------------------------------------+
>> | 1 | 16 | 1 | www.example.org       |
>> +------------------------------------+
>>
>> This Header record indicates a REGISTERED header containing character
>> content, the header ID = 1, the charset used is US-ASCII and the value is
>> "www.example.org". The header is expressed with a total of 28 bytes.
>>
>> When the T bit IS set, the Header field is an EXTENSION Header, the
>> structure of which is:
>>
>> +------------------------------------+
>> |1| Flags (7) | Length (24)          |
>> +------------------------------------+
>> | Length of name (int32)             |
>> +------------------------------------+
>> | Name (string)                      |
>> +------------------------------------+
>> | Length of value (int32)            |
>> +------------------------------------+
>> | Value                              |
>> +------------------------------------+
>>
>> For example.. an extension header that contains raw binary data...
>>
>> +------------------------------------+
>> |1| 0000000 | Length (24)            |
>> +------------------------------------+
>> | 5                                  |
>> +------------------------------------+
>> | x-foo                              |
>> +------------------------------------+
>> | 4                                  |
>> +------------------------------------+
>> | {raw bytes}                        |
>> +------------------------------------+
>>
>> The header is expressed with a total of 21 bytes.
>>
>> The same flags apply. 0x1 indicates that the value is character data. If
>> 0x1 is not set, the value contains raw octets. The key difference is that
>> there is a 32-bit name length and variable length name field in place of
>> the 32-bit ID field in the REGISTERED header. All other details remain the
>> same.
>>
>> As is currently the case in SPDY, if a single header value contains
>> multiple values, each can be separated using a single NUL (0) byte.
>>
>> There are several advantages to this approach:
>>
>> 1. Commonly used header names are omitted in favor of registered, known
>> numeric IDs, saving space and making it more efficient to scan over
>> commonly used headers. For instance, intermediaries that route requests
>> based on common headers such as Host etc could choose to ignore EXTENSION
>> header fields entirely, and scan only for the ID's of the fields they are
>> interested in, rather than having to parse the entire bag of header names.
>>
>> 2. Header values can be expressed as raw octets or character data.
>> Currently, mechanisms within HTTP require developers to muck around with
>> Base64 encoding or other encodings when including detail within a header.
>> This approach would eliminate that extra step. For instance, if I wanted to
>> have a Content-Integrity header whose value is an hmac digest, I would be
>> able to drop the raw bytes of the digest into the header value rather than
>> base64 or hex encoding it into an ASCII string, saving CPU cycles and
>> reducing the amount of data that must be transmitted.
>>
>> 3. Header values that contain character data would not be limited to
>> US-ASCII. Multiple charset encodings would be allowed... obviously this has
>> a whole slew of issues associated with it that need to be carefully
>> considered. The charset encoding flag could be dropped, if necessary, from
>> this proposal.
>>
>> For HTTP/1.1 Compatibility, each REGISTERED Header would be mapped to a
>> known, registered HTTP/1.1 header, allowing one to one translation from the
>> optimized form to the HTTP/1.1 form. Binary values would be base64-encoded.
>> If a particular header does not allow for Base64 encoded values under
>> HTTP/1.1, the down-level recipient would have the option of responding with
>> an appropriate 404 response.
>>
>> That's it for now. There are additional considerations to be given to the
>> specific selection of header fields to include within the SYN_STREAM vs.
>> follow-on HEADERS frames but that's a separate conversation. As always,
>> feedback is welcome...
>>
>> - James
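A minimal sketch of the 13 July layout quoted above (T bit plus 7 flag bits, 24-bit length, then either a 32-bit ID or a length-prefixed name), written to check the 28-byte and 21-byte totals given in that message. The class `ProposedHeaderSketch` and its methods are hypothetical names. It assumes the Value Length counts the charset octet, which is what the "16" in the Host example implies, and that an EXTENSION header sets the T bit as the prose states.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ProposedHeaderSketch {
    static final int FLAG_CHARACTER_DATA = 0x1;
    // Example registry value from the message ("US-ASCII could have a registered value of 1").
    static final int CHARSET_US_ASCII = 1;

    /** REGISTERED header carrying character data: ID, value length, charset octet, text. */
    static byte[] registeredText(int id, int charsetId, byte[] text) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeInt32(body, id);
        writeInt32(body, 1 + text.length); // assumption: length includes the charset octet
        body.write(charsetId);
        body.write(text, 0, text.length);
        return frame(false, FLAG_CHARACTER_DATA, body.toByteArray());
    }

    /** EXTENSION header carrying raw octets: name length, name, value length, value. */
    static byte[] extensionBinary(String name, byte[] raw) {
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeInt32(body, nameBytes.length);
        body.write(nameBytes, 0, nameBytes.length);
        writeInt32(body, raw.length);
        body.write(raw, 0, raw.length);
        return frame(true, 0, body.toByteArray());
    }

    /** Prefixes a body with the T bit, 7 flag bits, and the 24-bit count of octets that follow. */
    private static byte[] frame(boolean extension, int flags, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((extension ? 0x80 : 0) | (flags & 0x7F));
        out.write(body.length >>> 16);
        out.write(body.length >>> 8);
        out.write(body.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24); // write(int) keeps only the low 8 bits of each shift
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    public static void main(String[] args) {
        byte[] host = registeredText(1, CHARSET_US_ASCII,
                "www.example.org".getBytes(StandardCharsets.US_ASCII));
        byte[] ext = extensionBinary("x-foo", new byte[] {1, 2, 3, 4});
        System.out.println(host.length + " and " + ext.length + " bytes");
    }
}
```

Under these assumptions the two encoders give the totals quoted in the message: 4 + 4 + 4 + 1 + 15 octets for the Host example and 4 + 4 + 5 + 4 + 4 octets for x-foo with a four-octet value.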
Received on Tuesday, 17 July 2012 07:18:43 UTC