- From: Adam Barth <whatwg@adambarth.com>
- Date: Sun, 11 Jan 2009 23:54:18 -0800
On Sun, Jan 11, 2009 at 6:41 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> I just noticed that section 2.7.1 of HTML5 says:
>
>   Extensions must not be used for determining resource types
>   for resources fetched over HTTP.

Extensions are bad news for content sniffing because the attacker can often choose them. For example, suppose user-uploaded content can be downloaded at:

http://example.com/download.php

In most PHP configurations, the attacker can choose whatever file extension he likes by directing the user's browser to:

http://example.com/download.php/whatever.foo

and the PHP script will still run happily.

> Now this use case (no content-type at all) was pretty common when the
> unknown type sniffer in Gecko was written, but that was years ago. Do we
> have any data on how common it is now?

Yes. We have lots of data from opt-in user metrics in Chrome. Here is a somewhat recent summary:

https://crypto.stanford.edu/~abarth/research/html5/content-sniffing/

To address your particular concern: on Web content that lacks a Content-Type (or has a bogus Content-Type such as */*), <body occurs 6899 times less often than <script, assuming I did my arithmetic correctly.

> P.S. Of course at the moment the sniffer in Gecko is used for more than
> just HTTP, and it looks like we'll need separate modes for things like HTTP
> and things like file://. I can live with that, though. For the file://
> case detection of HTML in documents with no doctype/<html>/<head> is a must.

I'm sympathetic to adding more HTML tags to the list, but I'm not sure how far down the tail we should go. In Chrome, we aimed for 99.999% compatibility, which might be a bit too far down the tail. You can see the algorithm here:

http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?view=markup

Using that threshold, we went down to <p (which is two tags less common than <body).

Adam
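For readers unfamiliar with tag-based HTML sniffing, here is a minimal Python sketch of the kind of check this message discusses: skip leading whitespace, then look for a known HTML tag at the very start of the content, requiring the tag name to be terminated by a space or ">". The tag list, its order, the function name looks_like_html, and the 512-byte window are all hypothetical choices for illustration, loosely based on the tags mentioned in this message (ending with <body and <p). This is not the code in mime_sniffer.cc, whose exact rules may differ.

```python
# Hypothetical, simplified sketch of tag-based HTML sniffing for content
# with no (or a bogus) Content-Type. NOT Chrome's actual implementation;
# the tag list and its order are illustrative assumptions.
HTML_TAG_PREFIXES = (
    b"<!doctype html", b"<script", b"<html", b"<head", b"<iframe",
    b"<h1", b"<div", b"<font", b"<table", b"<a", b"<style", b"<title",
    b"<b", b"<body", b"<br", b"<p",
)

# Assumed size of the window inspected at the start of the resource.
SNIFF_WINDOW = 512


def looks_like_html(content: bytes, max_bytes: int = SNIFF_WINDOW) -> bool:
    """Return True if the start of `content` begins with a known HTML tag."""
    # Skip leading whitespace and compare case-insensitively (ASCII only).
    head = content[:max_bytes].lstrip(b" \t\n\r\x0c").lower()
    for tag in HTML_TAG_PREFIXES:
        if head.startswith(tag):
            # Require the tag name to end here, so that "<a" does not match
            # "<applet" and "<b" does not match "<blockquote".
            terminator = head[len(tag):len(tag) + 1]
            if terminator in (b" ", b">"):
                return True
    return False
```

For example, `looks_like_html(b"  <script>alert(1)</script>")` and `looks_like_html(b"<BODY bgcolor=white>")` return True, while `looks_like_html(b"<blockquote>")` returns False. Each tag added to the list lets more legacy untyped pages render as HTML, but it can also widen the set of attacker-supplied bytes that get treated as HTML, which is the tail trade-off described in this message.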
Received on Sunday, 11 January 2009 23:54:18 UTC