- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 07 Feb 2008 13:52:17 +0100
- To: "public-html@w3.org" <public-html@w3.org>
Hi,

two weeks ago I got the task (<http://www.w3.org/html/wg/tracker/actions/44>) to collect feedback from the HTTP WG with respect to the content sniffing specified in HTML5 in general, and the test cases at <http://www.hixie.ch/tests/adhoc/http/content-type/sniffing/> specifically.

The discussion thread is archived at <http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/thread.html#msg120>. There was also some discussion over here, which I have tried to include.

Below is my attempt to summarize what has been said:

Related to the test cases themselves:

1) Content-Encoding vs sniffing

The tests at <http://www.hixie.ch/tests/adhoc/http/content-type/sniffing/> are somewhat broken; cases 8 through 10 are supposed to trigger content sniffing (as per HTML5, <http://www.w3.org/TR/2008/WD-html5-20080122/#content-type-sniffing>), but don't, as the server sends the response with Content-Encoding: gzip (see <http://lists.w3.org/Archives/Public/public-html/2008Jan/0235.html>).

FF2 and FF3 beta currently do not implement sniffing in this case, matching what the spec says. Others apparently do. The fact that FF does not could be taken as an argument that sniffing in this case is not needed in order to "not break existing content".

2) Character sets vs sniffing

The spec currently requires sniffing for "text/plain; charset=iso-8859-1" and "text/plain; charset=ISO-8859-1", assuming that those servers that do send an incorrect default content type always send it with a very specific character set name. It appears that some servers sometimes ship with other defaults, thus more character sets would need to be considered (<http://lists.w3.org/Archives/Public/public-html/2008Jan/0239.html>). Where do you draw the line?

3) "illegal characters"

Some test cases, such as 16, claim that the content contains "invalid text/plain characters". At least case 16 doesn't (<http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0122.html>).

4) Other types of sniffing

HTML5 defines other types of sniffing (such as unknown -> PDF) that aren't covered by these tests, and haven't been discussed within this thread.

Related to the topic of content sniffing in general:

5) Content-Type default

It seems that in general Apache httpd is blamed for having caused the original problem (content being served with a wrong default content type instead of no content type at all). In the meantime, httpd supports a default type of "none" (<http://lists.w3.org/Archives/Public/public-html/2008Jan/0258.html>), so at least the right steps have been taken to get rid of the problem in the future.

6) Conflict with WebArch and TAG finding

The current text in HTML5 contradicts WebArch (<http://www.w3.org/TR/webarch/#error-handling>) and the TAG finding "mime respect", in particular "avoid silent recovery" (<http://www.w3.org/2001/tag/doc/mime-respect.html#silent-recovery>).

There seems to be broad agreement that it's good to document what widely deployed user agents actually do with respect to content sniffing. However, there was *no* agreement that it's HTML5's task to make that a "MUST" level requirement (<http://lists.w3.org/Archives/Public/public-html/2008Jan/0214.html>).

Also, if it's still the goal to reduce the amount of content where content sniffing takes place, then it would be useful to make it easier for an author to actually find out that content sniffing took place. Thus, user agents that do content sniffing SHOULD offer a way to (1) turn it off and/or (2) notify the user when the UA decided to override the specified content type (<http://lists.w3.org/Archives/Public/public-html/2008Jan/0260.html>).

It turns out that IE7 actually does offer (2) (see <http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>), and yes, it's also available through the UI.

BR, Julian
Received on Thursday, 7 February 2008 12:52:47 UTC