- From: Gordon P. Hemsley <gphemsley@gmail.com>
- Date: Fri, 31 May 2013 13:30:56 -0400
- To: whatwg List <whatwg@whatwg.org>
Hello all,

This is a request seeking feedback and review on the MIME Sniffing algorithm to "parse a MIME type":

http://mimesniff.spec.whatwg.org/#parse-a-mime-type

After numerous iterations, I think it is in a state that accurately reflects the best current practices for interoperability.

As is common with such things, there are numerous points in this algorithm where implementations do not agree. In general, Firefox and Chrome tend to pattern together, as do IE and Opera. Safari often patterns on its own, in favor of a more literal interpretation of the various RFCs on the matter.

At times, I have had to make a decision as to which was the best approach. This usually results in half of the implementations being in violation of the spec; I hope, in those instances, the implementations in question can be updated to become interoperable with the rest.

With that being said, there are two specific points I want to raise:

(1) The more recent RFCs on the matter restrict type, subtype, and parameter names to 127 characters. No implementation actually enforces this limit, but I have included it in the algorithm (relevant points appear in red) because I think it would be better and safer for both the user and the user agent to do so.

(2) Based on my analysis of existing implementations, anything that occurs between the semicolon (and any first whitespace) and the equals sign is treated as the parameter name, including any whitespace before the equals sign. However, in order to test parameters, I have been using 'charset' (because that's the only one I'm aware of that has a Web-visible effect), and certain implementations may be sniffing specifically for the string "charset=", which would cloud the results of my testing. Any enlightenment into this issue would be much appreciated.

I also have a few general points:

* You may notice in the algorithm that I am using hybrid terminology, sometimes talking about bytes and sometimes talking about characters. This is mostly because I haven't decided/determined whether to treat a MIME type as ASCII or as UTF-8. I think there are arguments on both sides of the issue, but I'm eager to hear your opinions and advice (especially about how I might phrase the algorithm if it were written in terms of characters instead of bytes).

* One of the most controversial parts of this algorithm might be the issue of what to do when a parameter appears more than once. (The RFCs suggest that the MIME type should be treated as invalid in such a case, but no implementation actually treats it that way.) I have opted to make a later appearance of a parameter override and replace an earlier appearance of a parameter. Modulo caveat (2) above, this is only done in half the implementations; in particular, IE and Opera appear to use the first instance of the parameter as the canonical value.

* Another important point to notice is the fact that this algorithm allows parameter names to appear without values. This is useful in situations such as the "base64" option in data: URLs that use the mere presence or absence of a parameter to set its boolean value. Note, however, that a parameter that has been given an explicit value (even if that value is the empty string) does not get overridden by the later appearance of a boolean parameter of the same name.

I think those are the important points of background information you need to know in order to evaluate this algorithm.

I look forward to your response.

Regards,
Gordon

-- 
Gordon P. Hemsley
me@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
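
For readers evaluating the parameter rules described in the message above, here is a rough Python sketch of how they might fit together. It is an illustration only, not the algorithm from the spec: the function name `parse_parameters`, the constant `MAX_NAME_LENGTH`, the use of `None` to mark a valueless (boolean) parameter, the choice to skip over-long names, and the set of whitespace characters stripped are all assumptions made for this example. Quoted-string values are ignored entirely.

```python
MAX_NAME_LENGTH = 127  # limit mentioned in point (1); constant name is hypothetical


def parse_parameters(mime_type):
    """Sketch of the parameter rules from the message; not the spec's algorithm."""
    params = {}
    # Treat everything after the first semicolon as the parameter list.
    _, _, rest = mime_type.partition(";")
    for segment in rest.split(";"):
        # Drop whitespace directly after the semicolon (assumed whitespace set).
        segment = segment.lstrip(" \t\r\n")
        if not segment:
            continue
        # Everything up to the equals sign is the name, including any
        # whitespace immediately before the equals sign (point 2).
        name, equals, value = segment.partition("=")
        if len(name) > MAX_NAME_LENGTH:
            continue  # assumption: over-long names are simply skipped
        if equals:
            # An explicit value (even an empty one) replaces any earlier one.
            params[name] = value
        else:
            # A valueless parameter never overrides an explicit value.
            params.setdefault(name, None)
    return params
```

With these assumptions, `parse_parameters("text/html; charset=a; charset=b")` keeps only the later value, and `parse_parameters("text/plain;x=;x")` keeps the explicit empty string for `x` instead of replacing it with the valueless form.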
Received on Friday, 31 May 2013 17:31:43 UTC