- From: Adam Barth <whatwg@adambarth.com>
- Date: Mon, 12 Jan 2009 00:02:31 -0800
I should say that these figures are weighted by the number of page
loads, so if sniffing for a particular tag is needed for the digg.com
home page, it will show up as a large number. If you don't weight by
traffic, you get similar results, but with slightly different numbers.

Adam

On Sun, Jan 11, 2009 at 11:54 PM, Adam Barth <whatwg at adambarth.com> wrote:
> On Sun, Jan 11, 2009 at 6:41 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>> I just noticed that section 2.7.1 of HTML5 says:
>>
>>   Extensions must not be used for determining resource types
>>   for resources fetched over HTTP.
>
> Extensions are bad news for content sniffing because they can often be
> chosen by the attacker. For example, suppose user-uploaded content
> can be downloaded at:
>
>   http://example.com/download.php
>
> In most PHP configurations, the attacker can choose whatever file
> extension he likes by directing the user's browser to:
>
>   http://example.com/download.php/whatever.foo
>
> And the PHP script will happily run.
>
>> Now this use case (no content-type at all) was pretty common when the
>> unknown type sniffer in Gecko was written, but that was years ago. Do we
>> have any data on how common it is now?
>
> Yes. We do have lots of data from opt-in user metrics from Chrome.
> Here is a somewhat recent summary:
>
>   https://crypto.stanford.edu/~abarth/research/html5/content-sniffing/
>
> To address your particular concern, <script occurs 6899 times as often
> as <body on Web content that lacks a Content-Type (or has a bogus
> Content-Type like */*), assuming I did my arithmetic correctly.
>
>> P.S. Of course at the moment the sniffer in Gecko is used for more than
>> just HTTP, and it looks like we'll need separate modes for things like HTTP
>> and things like file://. I can live with that, though. For the file://
>> case, detection of HTML in documents with no doctype/<html>/<head> is a must.
>
> I'm sympathetic to adding more HTML tags to the list, but I'm not sure
> how far down the tail we should go. In Chrome, we went for 99.999%
> compatibility, which might be a bit far down the tail. You can see
> the algorithm here:
>
>   http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?view=markup
>
> Using that figure, we went down to <p (which is two tags less common
> than <body).
>
> Adam
>
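To make the point about attacker-chosen extensions concrete, here is a minimal Python sketch. It assumes a server that, like the PHP setups described above, runs the first path segment ending in ".php" and hands the rest of the path to the script as PATH_INFO. The helper names extension_from_url and php_script_and_path_info are hypothetical, and the splitting rule is a simplification, not PHP's actual logic. It shows that an extension-based sniffer would see "foo" for a URL whose handler is still download.php.

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlsplit


def extension_from_url(url: str) -> str:
    """Return the extension a naive extension-based sniffer would see.

    Only the last path segment is examined, and with PATH_INFO that
    segment is chosen by whoever writes the link, not by the server.
    """
    path = urlsplit(url).path
    _, ext = posixpath.splitext(path)
    return ext.lstrip(".").lower()


def php_script_and_path_info(url: str) -> tuple[str, str]:
    """Hypothetical model of a server that runs the first *.php segment.

    Returns (script that actually runs, PATH_INFO passed to it).
    """
    path = urlsplit(url).path
    segments = path.split("/")
    for i, seg in enumerate(segments):
        if seg.endswith(".php"):
            script = "/".join(segments[: i + 1])
            rest = "/".join(segments[i + 1 :])
            return script, ("/" + rest) if rest else ""
    # No PHP script in the path: the whole path names a static resource.
    return path, ""


if __name__ == "__main__":
    for u in ("http://example.com/download.php",
              "http://example.com/download.php/whatever.foo"):
        print(u, extension_from_url(u), php_script_and_path_info(u))
```

Both URLs run the same script, so the extension carries no information the server vouches for.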
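For readers who want the general shape of tag-based sniffing for content with no usable Content-Type, here is a minimal Python sketch of the idea. It is not a transcription of Chrome's mime_sniffer.cc or of the HTML5 algorithm. It assumes the sniffer looks only at the first 512 bytes, skips leading whitespace, and accepts a match only when the tag name is followed by a space or ">", so that "<p" does not match "<pre". The SNIFFABLE_TAGS tuple is a hypothetical, illustrative list that includes the tags discussed in this thread (<script, <body, <p); deciding where to cut such a list off is the tail question raised above.

```python
# Hypothetical, illustrative subset of tags; the real lists used by
# browsers and by the HTML5 draft differ.
SNIFFABLE_TAGS = (
    b"!doctype html",
    b"html",
    b"head",
    b"script",
    b"iframe",
    b"body",
    b"p",
)

# Assumed size of the window the sniffer inspects.
SNIFF_WINDOW = 512


def looks_like_html(data: bytes, max_bytes: int = SNIFF_WINDOW) -> bool:
    """Guess whether untyped content is HTML by checking its leading tag."""
    # Inspect only the start of the resource, ignoring leading whitespace.
    head = data[:max_bytes].lstrip(b" \t\r\n\x0c")
    lowered = head.lower()  # tag names are matched case-insensitively
    for tag in SNIFFABLE_TAGS:
        prefix = b"<" + tag
        if lowered.startswith(prefix):
            # Require a terminating byte so "<p" does not match "<pre".
            following = lowered[len(prefix) : len(prefix) + 1]
            if following in (b" ", b">"):
                return True
    return False


if __name__ == "__main__":
    for sample in (b"  <SCRIPT>alert(1)</script>", b"<pre>x", b"GIF89a"):
        print(sample, looks_like_html(sample))
```

With this rule, b"<pre>x" is not sniffed as HTML even though "p" is in the list.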
Received on Monday, 12 January 2009 00:02:31 UTC