NEW ISSUE: content sniffing from Mark Nottingham on 2009-03-30 (ietf-http-wg@w3.org from January to March 2009)

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 31 Mar 2009 10:58:38 +1100
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <8BA2E12F-D65E-443D-BB40-0BE4214FEA76@mnot.net>

[ As discussed in SF ]

Browsers now routinely 'sniff' content to determine its type,  
sometimes in conflict with the media type conveyed in the Content-Type  
header. The reasons most often cited for this practice are the  
misconfiguration of servers, the inability of content authors to  
configure servers (either for technical reasons or lack of education),  
and generally the incentives placed upon browser vendors to "just work."

Section 3.2.1 of p3-payload currently says:
> If and only if the media type is not given by a Content-Type field,  
> the recipient MAY attempt to guess the media type via inspection of  
> its content and/or the name extension(s) of the URI used to identify  
> the resource. If the media type remains unknown, the recipient  
> SHOULD treat it as type "application/octet-stream".
which effectively disallows sniffing whenever a Content-Type header is
present, despite widespread use that (apparently) isn't stopping any
time soon.
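To make the effect of the quoted text concrete, here is a minimal Python sketch of the rule as written. It is illustrative only: the function names `media_type_per_current_text` and `toy_guess` are hypothetical, and the single magic number and extension check stand in for whatever guessing a real recipient might do.

```python
from typing import Callable, Optional

OCTET_STREAM = "application/octet-stream"

# Hypothetical guesser: takes the body and the URI path and returns a
# media type, or None if it cannot tell.
Guesser = Callable[[bytes, str], Optional[str]]


def media_type_per_current_text(
    content_type: Optional[str],
    body: bytes,
    uri_path: str,
    guess: Optional[Guesser] = None,
) -> str:
    """Pick a media type following the quoted p3-payload 3.2.1 text."""
    if content_type is not None:
        # Content-Type is present: guessing is not permitted, so the
        # declared type wins even if the body looks like something else.
        return content_type
    # No Content-Type: the recipient MAY guess (so the guesser is optional).
    if guess is not None:
        guessed = guess(body, uri_path)
        if guessed is not None:
            return guessed
    # Still unknown: SHOULD treat it as application/octet-stream.
    return OCTET_STREAM


def toy_guess(body: bytes, uri_path: str) -> Optional[str]:
    """Hypothetical guesser: one magic number plus one file-name extension."""
    if body.startswith(b"%PDF-"):
        return "application/pdf"
    if uri_path.endswith(".txt"):
        return "text/plain"
    return None
```

For example, a PDF body labelled `text/plain` stays `text/plain` under this rule, which is exactly the case where browsers sniff anyway.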

The language should be updated to reflect this reality, without
encouraging sniffing except where it is necessary. Ideally, the revised
text:

   * Does not require sniffing for all uses of HTTP (i.e., a
particular implementation and/or user can "opt in" to the use of a
sniffing algorithm), since this is most commonly a problem in the
browser case, and
   * Specifically allows a user and/or content provider to opt out of  
the use of sniffing in a particular interaction, and
   * Promotes interoperability (i.e., if two implementations sniff,  
they will do so in the same way).
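The three goals above can be sketched together as a small policy check. This is a toy under stated assumptions, not the draft-abarth algorithm: the signature table holds only a few well-known magic numbers, `SniffPolicy`, `shared_sniff`, `effective_media_type` and the `opted_out` flag are hypothetical names, and how a user or content provider would actually signal an opt-out is left open, as it is in this message.

```python
from dataclasses import dataclass
from typing import Optional

OCTET_STREAM = "application/octet-stream"

# Toy signature table (NOT the draft-abarth algorithm). Every
# implementation walks it in the same order, so they all agree.
SIGNATURES = (
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def shared_sniff(body: bytes) -> Optional[str]:
    """Goal 3: one deterministic algorithm shared by all sniffers."""
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type
    return None


@dataclass
class SniffPolicy:
    # Goal 1: sniffing is opt-in. An implementation that never enables
    # it simply honours Content-Type.
    sniffing_enabled: bool = False


def effective_media_type(
    policy: SniffPolicy,
    content_type: Optional[str],
    body: bytes,
    opted_out: bool = False,
) -> str:
    """Goal 2: opted_out is a hypothetical per-interaction flag meaning the
    user or content provider asked that this response not be sniffed."""
    if policy.sniffing_enabled and not opted_out:
        sniffed = shared_sniff(body)
        if sniffed is not None:
            # A sniffed type may override the declared one.
            return sniffed
    if content_type is not None:
        return content_type
    return OCTET_STREAM
```

A non-browser client keeps the default `SniffPolicy()` and always gets the declared type; a browser enables sniffing, but a PNG body labelled `text/plain` stays `text/plain` once the interaction is opted out.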

See:
   http://tools.ietf.org/html/draft-abarth-mime-sniff
for a proposal sourced from HTML5.

Besides the issue of sniffing itself, there's also an open question of
whether the sniffing algorithm would remain in a separate document
(i.e., our work would only be to relax requirements to allow it), or
whether it would be brought into our own document.

In either case, we'd have to take a serious look at security
considerations, as well as at the impact on intermediaries, etc.

Note that this is not directly about sniffing the character encoding
(see issue #20), although the resolution may be related.


--
Mark Nottingham     http://www.mnot.net/

Received on Monday, 30 March 2009 23:59:21 UTC