- From: Adam Barth <w3c@adambarth.com>
- Date: Wed, 1 Apr 2009 01:18:01 -0700
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On Tue, Mar 31, 2009 at 7:44 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
> It is impossible to determine a media-type (how a recipient should
> process a given representation) by sniffing the content (the data format).

Regardless, many popular user agents override the server-provided MIME type after examining the content of HTTP responses.

> When a media type
> is not present (or is detectably incorrect), only the implementation
> doing the processing can determine an appropriate guess because that
> guess is almost always determined by the context in which the
> reference was made (not by the content).

Content sniffing algorithms in browsers largely ignore the context in which the HTTP response is being used. Specifically, the algorithm for computing the effective MIME type is a function of the HTTP response alone (both in practice and in draft-abarth-mime-sniff).

> Since the context is
> deliberately not sent on the wire, there is absolutely no way that
> accurate sniffing can be defined by HTTP.

Thankfully, we do not require "accurate" sniffing. We simply require an algorithm for determining a MIME type that is compatible with existing Web content.

> We aren't talking about
> a protocol decision regarding communication; we are talking about
> an operating default that is specific to the purpose of a given
> client and will likely be different for each one.

I disagree that we're interested in something specific to a given client. We're interested in an algorithm for determining the MIME type of an HTTP response that works with existing Web content.

For example, suppose you're implementing an image editing program; let's call it Imageshop. You'd like users of Imageshop to be able to open images specified by URL. A user asks to edit http://example.com/fancy-image. Imageshop issues an HTTP request for that URL and receives the following response:

    Content-Type: image/jpeg

    GIF89a...

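As a rough illustration of the Imageshop case, a sniffing function that depends only on the HTTP response (the declared Content-Type and the leading bytes of the body) might look like the sketch below. The function name, the short signature table, and the fallback type are assumptions made for illustration; this is not the algorithm in draft-abarth-mime-sniff, which covers far more cases.

```python
from typing import Optional

# Hypothetical, deliberately short signature table for illustration only.
# The byte prefixes themselves are the well-known magic numbers for
# GIF, PNG, and JPEG files.
IMAGE_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def effective_mime_type(content_type: Optional[str], body: bytes) -> str:
    """Return a MIME type computed from the response alone (no request context)."""
    # Drop parameters such as "; charset=..." and normalize case.
    declared = (content_type or "").split(";")[0].strip().lower()

    # Assumption: only sniff when the server declared an image type or
    # declared nothing at all; other declared types are left untouched.
    if declared.startswith("image/") or not declared:
        for magic, sniffed in IMAGE_SIGNATURES:
            if body.startswith(magic):
                return sniffed

    # Fall back to the declared type, or a generic type if none was sent.
    return declared or "application/octet-stream"


# The response from the example: labeled image/jpeg, but the body is a GIF.
print(effective_mime_type("image/jpeg", b"GIF89a..."))
```

Note that the function takes no argument describing how the response will be used, which is the point under discussion: the result is a function of the response alone.
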
Because popular user agents have historically interpreted such HTTP responses as image/gif, it is quite likely that the server intends this response to be treated as image/gif (and not as image/jpeg). If the Imageshop developers follow the existing HTTP spec, they will receive complaints from their users and be forced to reverse engineer a content sniffing algorithm that is compatible with existing Web content. If, instead, we specify a sniffing algorithm, Imageshop will interoperate with existing Web content as its users expect.

> In any case, there is no algorithm for sniffing that is anywhere
> near the same level of standardization as HTTP.

You're right that a number of popular user agents use different sniffing algorithms. I'm hoping to converge these implementations on a single sniffing algorithm. Having the HTTP spec recommend a specific algorithm will aid this process.

> The one that HTML5 is working on would barely qualify as Experimental.

I'm not sure what qualifies as "experimental," but the algorithm in draft-abarth-mime-sniff is quite similar to the algorithms that ship in Firefox and Chrome. We have a great deal of data from the Google search index and from opt-in user metrics with which to evaluate its compatibility with existing Web content.

> If the folks
> promoting such software can successfully deploy it across all HTTP
> clients, then it should be referenced.

Surely this is too high a bar. Why bother writing standards if we require all implementations to interoperate perfectly before putting pen to paper?

> Until then, it remains an
> unproven and, IMO, mistaken idea which is far more likely to
> be overcome by events than become a standard way to handle HTTP.

I don't think that content sniffing will magically disappear if we just ignore it long enough. Instead, we should shed some light on this dark corner of reality.

Adam
Received on Wednesday, 1 April 2009 08:18:51 UTC