- From: Adam Barth <w3c@adambarth.com>
- Date: Fri, 5 Jun 2009 12:55:56 -0700
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On Fri, Jun 5, 2009 at 9:14 AM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> I've already mentioned the encoding extraction algorithm, but to add
> some others: in draft-abarth-mime-sniff-01 section 3 step 3's special
> handling of very particular sequences,

For better or for worse, this is the way browsers work. I believe this
is related to historical behaviors of Apache and other HTTP servers.
Removing these rules causes binary spew to fill up the content area.

> the handling of unregistered and malformed values in step 5,

HTTP responses commonly contain these values and depend on user agents
sniffing the actual media type. Removing these values causes
compatibility problems.

> the special handling of XML types in step 6,

I believe this is to avoid step 7 applying to SVG images.

> the relevance of the implementation supporting particular types
> in step 7.

Would you prefer this step applied to all media types that begin with
"image/"? That might let us remove step 6 as well.

> In section 4 why implementations may decide to pick any number of bytes
> between 0 and 512,

This is to avoid breaking sites that use comet
<http://en.wikipedia.org/wiki/Comet_(programming)>.

> why step 3 only applies when you have at least three
> bytes and then only compares two bytes,

We could change this to be slightly tighter, but it's a bit pedantic.

> why the UTF-32 BOM is not being detected,

We measured and determined that it was not needed for compatibility. In
general, we tried to minimize the amount of sniffing.

> why step four has those bytes and not others;

That's just how browsers work. This is a point at which there is broad
interoperability already. The costs of convincing implementations to
change this table outweigh the benefits.

> in section 6 the special handling of image/svg+xml;

This is to avoid mistakenly changing the type of an image/svg+xml
resource that happens to begin with a magic number, e.g., BM. Again,
this is a place where we are able to minimize the amount of sniffing.

> in section 7 why the UTF-16 BOM is ignored.

The algorithm doesn't work for UTF-16 anyway. What would be the point of
skipping over the UTF-16 BOM? Again, this is a place where we've
minimized the amount of sniffing.

> I see no justification for having a special algorithm for the charset
> parameter;

I've added a TODO to investigate whether this algorithm is still needed.

Adam
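
[A minimal sketch, in Python, of the section 6 behaviour discussed above:
examine at most the first 512 bytes against a magic-number table, but leave a
server-supplied image/svg+xml type untouched so an SVG that happens to begin
with bytes such as "BM" is not re-typed. The function name and the abbreviated
signature table are hypothetical illustrations, not the draft's actual tables
or steps.]

    # Hypothetical, abbreviated magic-number table; the draft's table is larger.
    MAGIC_NUMBERS = [
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
        (b"BM", "image/bmp"),
    ]

    def sniff_image_type(declared_type, body):
        """Return a media type for an image response, or the declared type."""
        if declared_type == "image/svg+xml":
            # Do not sniff: an SVG document can legitimately begin with bytes
            # that collide with a raster-image signature such as "BM".
            return declared_type
        head = body[:512]  # implementations may examine fewer bytes (0..512)
        for signature, media_type in MAGIC_NUMBERS:
            if head.startswith(signature):
                return media_type
        return declared_type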
Received on Friday, 5 June 2009 19:56:54 UTC