This is a very good summary.
My own preference would be to move toward a world where content
sniffing is discouraged, rather than to evolve to one where all
bad behavior from the past is codified into future law.
Forwarded message 1
Hi,
two weeks ago I got the task
(<http://www.w3.org/html/wg/tracker/actions/44>) to collect feedback
from HTTP WG with respect to the content sniffing specified in HTML5 in
general, and the test cases at
<http://www.hixie.ch/tests/adhoc/http/content-type/sniffing/> specifically.
The discussion thread is archived at
<http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/thread.html#msg120>.
There was also some discussion over here, which I have tried to include.
Below is my attempt to summarize what has been said:
Related to the test cases themselves:
1) Content-Encoding vs sniffing
The tests at
<http://www.hixie.ch/tests/adhoc/http/content-type/sniffing/> are
somewhat broken; cases 8 through 10 are supposed to trigger content
sniffing (as per HTML5,
<http://www.w3.org/TR/2008/WD-html5-20080122/#content-type-sniffing>),
but don't, as the server sends the response with Content-Encoding: gzip
(see <http://lists.w3.org/Archives/Public/public-html/2008Jan/0235.html>).
FF2 and FF3 beta currently do not implement sniffing in this case,
matching what the spec says. Others apparently do. The fact that FF does
not could be taken as an argument that sniffing in this case is not
needed to "not break existing content".
2) Character sets vs sniffing
The spec currently requires sniffing for "text/plain;
charset=iso-8859-1" and "text/plain; charset=ISO-8859-1", assuming that
those servers that do send an incorrect default content type always send
it with a very specific character set name. It appears that some servers
sometimes ship with other defaults, thus more character sets would need
to be considered
(<http://lists.w3.org/Archives/Public/public-html/2008Jan/0239.html>).
Where do you draw the line?
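To make points 1) and 2) concrete, here is a minimal sketch of the trigger condition as described above. It assumes the Content-Type value must match one of the two listed strings exactly, and that any Content-Encoding header disables sniffing. The function and constant names are hypothetical and not taken from the spec.

```python
# Hypothetical illustration of the trigger condition summarized in
# points 1) and 2). The two header values are the ones quoted above;
# exact string matching is an assumption of this sketch.
SNIFF_TRIGGER_TYPES = {
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=ISO-8859-1",
}


def triggers_sniffing(headers):
    """Return True if these response headers would trigger sniffing."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Point 1): a response sent with Content-Encoding is not sniffed.
    if "content-encoding" in normalized:
        return False
    # Point 2): only the specific default values are considered.
    return normalized.get("content-type") in SNIFF_TRIGGER_TYPES
```

Under these assumptions, a server whose default is, say, "text/plain; charset=windows-1252" would not trigger sniffing at all, which is exactly the "where do you draw the line" problem.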
3) "illegal characters"
Some test cases, such as 16, claim the content contains "invalid
text/plain characters". At least case 16 doesn't.
(<http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0122.html>)
4) other types of sniffing
HTML5 defines other types of sniffing (such as unknown -> PDF) that
aren't covered by these tests, and haven't been discussed within this
thread.
Related to the topic of content sniffing in general:
5) content-type default
It seems in general Apache httpd is blamed for having caused the
original problem (content being served with wrong default content-type
instead of no content-type at all). In the meantime, httpd supports a
default type of "none"
(<http://lists.w3.org/Archives/Public/public-html/2008Jan/0258.html>),
so at least the right steps have been taken to get rid of the problem in
the future.
6) conflict with WebArch and TAG finding
The current text in HTML5 contradicts WebArch
(<http://www.w3.org/TR/webarch/#error-handling>) and the TAG finding
"mime respect", in particular "avoid silent recovery"
(<http://www.w3.org/2001/tag/doc/mime-respect.html#silent-recovery>).
There seems to be broad agreement that it's good to document what widely
deployed user agents actually do with respect to content sniffing.
However, there was *no* agreement that it's HTML5's task to make that a
"MUST" level requirement
(<http://lists.w3.org/Archives/Public/public-html/2008Jan/0214.html>).
Also, if it's still the goal to reduce the amount of content where
content sniffing takes place, then it would be useful to make it easier
for an author to actually find out that content sniffing took place.
Thus, user agents that do content sniffing SHOULD offer a way to (1)
turn it off and/or (2) notify the user when the UA decided to override
the specified content type
(<http://lists.w3.org/Archives/Public/public-html/2008Jan/0260.html>).
It turns out that IE7 actually does offer (2) (see
<http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx#364853>), and
yes, it's also available through the UI.
BR, Julian