- From: Peter Occil <poccil14@gmail.com>
- Date: Sat, 25 May 2013 12:46:49 -0400
- To: "Gordon P. Hemsley" <gphemsley@gmail.com>
- Cc: WHATWG <whatwg@whatwg.org>
Sorry for not including proper discussion. These are the differences between my algorithm and the existing one.

One weakness of the existing algorithm is that its terminology can be rather technical ("sequence[s]", "while [this happens], execute the following steps: [one step only]", "Loop M"). My algorithm, on the other hand, is better seen as a logical set of steps that are intended to be easy to follow; actual implementations may differ as long as they produce the same results. (This is the same approach used in the Unicode Standard.) Accordingly, there are fewer loops and fewer "if-structures", which makes the algorithm easier to understand and follow. My algorithm is also stricter than the existing one in many respects, as explained below.

My algorithm skips only SPACE and TAB instead of all whitespace characters, because it assumes that the field value was already extracted from Content-Type according to the HTTP/HTTPbis spec (0x0C, form feed, is never considered whitespace in HTTP headers). In particular, it assumes that folding whitespace (obs-fold) was replaced with spaces (or the message with obs-fold rejected) before the Content-Type value was interpreted.

Type, subtype, and parameter names are converted to lowercase.

Type, subtype, and parameter names are checked according to the rules found in RFC6838 section 4.2, rather than RFC2045 section 1. I believe the former is the latest syntax of those names, while the latter is an older syntax.

Parameter values are checked according to the rules found in HTTPbis part 1, section 3.2.6, in the latest version [1]. In particular, it rejects parameters with unclosed or otherwise invalid quoted strings, and checks the characters in unquoted parameter values.

My algorithm treats Content-Type values with duplicate parameter names as an error (see RFC6838 section 4.3).

------------------

Also, there is a mistake: two substeps of step 13 were reversed. They should say the following instead:

8. Convert parameter to ASCII lowercase.
9. If parameters contains a mapping for parameter, return undefined.

--Peter

[1]: https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p1-messaging.html

-----Original Message-----
From: Gordon P. Hemsley
Sent: Saturday, May 25, 2013 11:55 AM
To: Peter Occil
Cc: WHATWG
Subject: Re: [whatwg] [mimesniff] Complete MIME type parsing algorithm for section 5

Peter,

The burden is on you to describe your proposals and what their purpose and benefit would be. How does this proposed algorithm differ from what is already in the spec? How is it better?

Regards,
Gordon

On Sat, May 25, 2013 at 3:58 AM, Peter Occil <poccil14@gmail.com> wrote:
> I present this draft of the complete algorithm for parsing a MIME type. I would appreciate comments.
>
> --Peter
>
> ----------------------------------------------------
>
> An ASCII alphanumeric is a byte or character in the ranges 0x41-0x5A, 0x61-0x7A, and 0x30-0x39.
> A MIME type byte is an ASCII alphanumeric or one of the following bytes: ! # $ & ^ _ . + -
> A parameter value byte is a MIME type byte or one of the following bytes: % ' * ` | ~
>
> To parse a MIME type, run the following steps:
>
> 1. Let length be the length of the byte sequence of the MIME type.
> 2. If length is less than 1, return undefined.
> 3. Let pointer be 0. Pointer is a zero-based index to the current byte in the byte sequence.
> 4. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 (TAB).
> 5. Let type be the byte string from the current byte up to but not including the next "/" byte. Advance pointer to the next "/" byte.
> 6. If the current byte isn't "/", return undefined.
> 7. Increment pointer by 1.
> 8. Let subtype be the byte string from the current byte up to but not including the next 0x20 (SPACE), 0x09 (TAB), or ";" byte. Advance pointer to the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.
> 9. If type is empty, contains a byte that isn't a MIME type byte, or doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return undefined.
> 10. If subtype is empty, contains a byte that isn't a MIME type byte, or doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return undefined.
> 11. Convert type and subtype to ASCII lowercase.
> 12. Let parameters be an empty dictionary.
> 13. Run the following substeps in a loop.
>     1. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 (TAB).
>     2. If pointer is equal to length, return type, subtype, and parameters.
>     3. If the current byte isn't ";", return undefined.
>     4. Increment pointer by 1.
>     5. If pointer is equal to length, return type, subtype, and parameters.
>     6. Let parameter be the byte string from the current byte up to but not including the next "=" byte. Advance pointer to the next "=" byte.
>     7. If parameter is empty, contains a byte that isn't a MIME type byte, or doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return undefined.
>     8. If parameters contains a mapping for parameter, return undefined.
>     9. Convert parameter to ASCII lowercase.
>     10. If the current byte isn't "=", return undefined.
>     11. Increment pointer by 1.
>     12. If the current byte equals 0x22 (quotation mark), run the following substeps:
>         1. Let value be an empty byte string.
>         2. Increment pointer by 1.
>         3. Run these substeps in a loop.
>             1. If pointer is equal to length, return type, subtype, and parameters.
>             2. If the current byte equals 0x7F or is less than 0x20, and the current byte isn't TAB (0x09), return type, subtype, and parameters.
>             3. If the current byte equals 0x22 (quotation mark), increment pointer by 1 and terminate this loop.
>             4. Otherwise, if the current byte is "\", increment pointer by 1. Then, if there is a current byte, append that byte to value.
>             5. Otherwise, append the current byte to value.
>             6. Increment pointer by 1.
>         4. Add the mapping of parameter to value to the parameters dictionary.
>     13. Otherwise, run these substeps:
>         1. Let value be the byte string from the current byte up to but not including the next 0x20 (SPACE), 0x09 (TAB), or ";" byte. Advance pointer to the next 0x20 (SPACE), 0x09 (TAB), or ";" byte.
>         2. If value is empty or contains a byte that isn't a parameter value byte, return undefined.
>         3. Add the mapping of parameter to value to the parameters dictionary.
>
> -------------------
>

-- 
Gordon P. Hemsley
me@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
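The following Python sketch shows how the draft's steps might translate into code. It is illustrative only and is not part of the original thread.

It makes these assumptions:

* Input is `bytes`.
* The result is returned as a `(type, subtype, parameters)` tuple.
* `None` stands in for "return undefined".
* Substeps 8 and 9 of step 13 follow the corrected order Peter gives at the top of his reply.

The function and helper names (`parse_mime_type`, `valid_name`, `skip_ws`, `scan_until`) are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Byte classes from the draft (stored as ints, since iterating bytes yields ints).
ALNUM = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")
MIME_TYPE_BYTES = ALNUM | frozenset(b"!#$&^_.+-")
PARAM_VALUE_BYTES = MIME_TYPE_BYTES | frozenset(b"%'*`|~")

Result = Tuple[bytes, bytes, Dict[bytes, bytes]]


def valid_name(name: bytes) -> bool:
    """Name check used for type, subtype, and parameter names (steps 9, 10, 13.7)."""
    return (0 < len(name) <= 127
            and name[0] in ALNUM
            and all(b in MIME_TYPE_BYTES for b in name))


def parse_mime_type(data: bytes) -> Optional[Result]:
    """Return (type, subtype, parameters), or None where the draft says "undefined"."""
    length = len(data)
    if length < 1:                                   # step 2
        return None

    def skip_ws(p: int) -> int:                      # skips SPACE and TAB only
        while p < length and data[p] in (0x20, 0x09):
            p += 1
        return p

    def scan_until(p: int, stops: bytes) -> int:     # index of next stop byte, or length
        while p < length and data[p] not in stops:
            p += 1
        return p

    pos = skip_ws(0)                                 # steps 3-4
    end = scan_until(pos, b"/")                      # step 5
    type_, pos = data[pos:end], end
    if pos >= length:                                # step 6: no "/" found
        return None
    pos += 1                                         # step 7
    end = scan_until(pos, b" \t;")                   # step 8
    subtype, pos = data[pos:end], end
    if not (valid_name(type_) and valid_name(subtype)):  # steps 9-10
        return None
    type_, subtype = type_.lower(), subtype.lower()  # step 11 (ASCII-only lowercase)
    params: Dict[bytes, bytes] = {}                  # step 12

    while True:                                      # step 13
        pos = skip_ws(pos)
        if pos == length:
            return type_, subtype, params
        if data[pos] != 0x3B:                        # current byte isn't ";"
            return None
        pos += 1
        if pos == length:
            return type_, subtype, params
        end = scan_until(pos, b"=")
        name, pos = data[pos:end], end
        if not valid_name(name):
            return None
        name = name.lower()                          # corrected order: lowercase first,
        if name in params:                           # then reject duplicate names
            return None
        if pos == length:                            # no "=" found
            return None
        pos += 1
        if pos < length and data[pos] == 0x22:       # quoted-string value
            value = bytearray()
            pos += 1
            while True:
                # ">=" also covers a trailing backslash, which can push pos past length
                if pos >= length:                    # unclosed: drop this parameter
                    return type_, subtype, params
                b = data[pos]
                if (b == 0x7F or b < 0x20) and b != 0x09:
                    return type_, subtype, params    # control byte: drop this parameter
                if b == 0x22:                        # closing quotation mark
                    pos += 1
                    break
                if b == 0x5C:                        # backslash: take the next byte literally
                    pos += 1
                    if pos < length:
                        value.append(data[pos])
                else:
                    value.append(b)
                pos += 1
            params[name] = bytes(value)
        else:                                        # unquoted value
            end = scan_until(pos, b" \t;")
            raw, pos = data[pos:end], end
            if not raw or not all(b in PARAM_VALUE_BYTES for b in raw):
                return None
            params[name] = raw
```

Here is what the sketch does with some sample inputs:

* `b"Text/HTML;Charset=UTF-8"` yields `(b"text", b"html", {b"charset": b"UTF-8"})`. Parameter values are not lowercased.
* `b"a/b;x=1;X=2"` yields `None`, because lowercasing before the duplicate check catches `X`. In the original order, `X` would pass the check and then silently overwrite `x`.
* `b"text/html; charset=utf-8"` also yields `None`. Taken literally, step 13 does not skip spaces after ";", so the parameter name begins with a space and fails step 13.7.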
Received on Saturday, 25 May 2013 16:47:25 UTC