[whatwg] Content type sniffing

I just noticed that section 2.7.1 of HTML5 says:

   Extensions must not be used for determining resource types
   for resources fetched over HTTP.

While I understand the reasons for this, there are certainly cases where 
this will break sites (basically those using HTTP 0.9, or later HTTP 
versions but not sending a Content-Type header).  In particular, the HTML 
sniffing in the algorithm is very limited and wouldn't identify this document:

   <body>Some text</body>

as HTML.
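To make the limitation concrete, here is a minimal sketch of a prefix-based HTML sniffer.  The prefix list (`<!DOCTYPE HTML`, `<HTML`, `<HEAD`) and the name `looks_like_html` are assumptions for illustration, not a quotation of the draft spec; the point is only that a sniffer keyed on a short list of leading tags will miss a document that begins with `<body>`.

```python
# Hypothetical prefix list: assumed for illustration, not quoted from the spec.
ASSUMED_HTML_PREFIXES = (b"<!DOCTYPE HTML", b"<HTML", b"<HEAD")
WHITESPACE = b" \t\n\r\x0c"

def looks_like_html(data: bytes) -> bool:
    """True if, after leading whitespace, data starts with a known prefix.

    Matching is ASCII case-insensitive, and the prefix must be followed by
    a space or '>' so that e.g. '<header>' is not taken as '<head'.
    """
    body = data.lstrip(WHITESPACE)
    for prefix in ASSUMED_HTML_PREFIXES:
        if body[:len(prefix)].upper() == prefix:
            # Look at the single byte right after the matched prefix.
            if body[len(prefix):len(prefix) + 1] in (b" ", b">"):
                return True
    return False

print(looks_like_html(b"<html><body>Some text</body></html>"))  # True
print(looks_like_html(b"<body>Some text</body>"))                # False
```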

Now this use case (no Content-Type header at all) was pretty common when the 
unknown-type sniffer in Gecko was written, but that was years ago.  Do 
we have any data on how common it is now?

-Boris

P.S.  Of course, at the moment the sniffer in Gecko is used for more than 
just HTTP, and it looks like we'll need separate modes for things like 
HTTP and things like file://.  I can live with that, though.  For the 
file:// case, detecting HTML in documents with no doctype, <html>, or 
<head> is a must.
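A hypothetical sketch of the mode split described above: in an HTTP mode the extension is never consulted (per the quoted requirement), while a file:// mode may use the extension and a looser content check.  The `Mode` names, the `.html`/`.htm` list, the extra `<BODY` prefix, and the `text/plain` fallback are all illustrative assumptions, not anything Gecko or the spec defines.

```python
from enum import Enum

class Mode(Enum):
    HTTP = "http"  # never consult the file extension
    FILE = "file"  # local files: extension and looser content checks allowed

# Hypothetical values, chosen only for illustration.
HTML_EXTENSIONS = (".html", ".htm")
STRICT_PREFIXES = (b"<!DOCTYPE HTML", b"<HTML", b"<HEAD")
LOOSE_PREFIXES = STRICT_PREFIXES + (b"<BODY",)

def sniff_unknown(data: bytes, name: str, mode: Mode) -> str:
    """Guess a type for a resource that arrived with no Content-Type."""
    # Extension check happens only in FILE mode.
    if mode is Mode.FILE and name.lower().endswith(HTML_EXTENSIONS):
        return "text/html"
    # Content check: FILE mode also accepts documents starting with <body>.
    prefixes = LOOSE_PREFIXES if mode is Mode.FILE else STRICT_PREFIXES
    head = data.lstrip(b" \t\n\r\x0c").upper()
    if any(head.startswith(p) for p in prefixes):
        return "text/html"
    return "text/plain"  # simplified fallback for the sketch
```

With these assumptions, `sniff_unknown(b"<body>Some text</body>", "page", Mode.HTTP)` yields `"text/plain"`, while the same bytes in `Mode.FILE` yield `"text/html"`.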

Received on Sunday, 11 January 2009 18:41:58 UTC