Re: Approved TAG finding: Authoritative Metadata from Ian Hickson on 2006-08-09 (www-tag@w3.org from August 2006)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 9 Aug 2006 01:14:06 +0000 (UTC)
To: noah_mendelsohn@us.ibm.com
Cc: Anne van Kesteren <annevk@opera.com>, www-tag@w3.org
Message-ID: <Pine.LNX.4.62.0608090045250.5340@dhalsim.dreamhost.com>
On Tue, 8 Aug 2006 noah_mendelsohn@us.ibm.com wrote:
>
> This thread has referenced the very interesting blog entry at [1], and 
> makes the case that the TAG is off base in pushing the Web community [2] 
> quite strongly to give precedence to the HTTP Content-Type header over 
> content sniffing, keying on the URI suffix, etc. [...]
> 
> [1] http://ln.hixie.ch/?start=1154950069&count=1
> [2] http://www.w3.org/2001/tag/doc/mime-respect-20060412

To give some context for my blog post:

I've been pushing for Web browser vendors to fix this for approximately 
eight years, roughly half the lifetime of HTTP so far.

I've worked hard in the Mozilla community, at Netscape, at Opera, in the 
Webkit community, in talks with Microsoft, at W3C meetings, in 
specifications, in writing test cases, and in writing documentation, over 
those eight years, trying to get this issue fixed.

When Microsoft asked me for my list of top ten bugs that I'd like fixed in 
IE7, I listed just one: HTTP content sniffing. I even included nearly a 
hundred tests to help them do this.

Over the years I have tried to get browsers to stop assuming that anything 
at the end of an <img src=""> was an image, and tried to get them to use 
the MIME type sent with the image instead of content-sniffing to determine 
the image type.

I have tried to get browsers to use the Content-Type header when following 
hyperlinks, to stop them automatically downloading and showing videos that 
are marked as text/plain.

I have tried to get browsers to obey the Content-Type headers when 
downloading files that <script> elements point to, even going as far as 
pointing out the security implications of allowing authors to download any 
random file that happens to be in a JS-like format (e.g. any JSON data) and 
read it as if it were on their domain.

I have tried to get browsers to obey the Content-Type of files when <link> 
elements are used to point to stylesheets.

I have tried to get browsers to obey the Content-Type headers when 
they handle <object> elements, going as far as including tests for this in 
the Acid2 test.

I have tried to get browsers to rely on MIME types for detecting RSS and 
Atom feeds, instead of sniffing every HTML page before displaying it.


Here is the sum total of what all the above, and all the other advocacy 
that I have done over the eight years I've been working on this, has done:

1. Mozilla, in standards mode, ignores CSS files that don't have 
Content-Type set to 'text/css'. This took many years, and I've had to 
fight to keep this in several times. It only affects a small minority of 
sites that use standards mode, but even those sites sometimes fail to 
render correctly in Mozilla because of this, while rendering fine in other 
browsers. We get bugs filed on this regularly.

2. Mozilla and Opera have limited their sniffing of content sent as 
text/plain so that instead of sniffing on all text/plain content, they 
only sniff on the majority of text/plain content.

3. That's all. Only two minor things.


This isn't because of laziness. This is because ANY BROWSER THAT ACTUALLY 
TRIES TO IMPLEMENT THESE THINGS WOULD LOSE ALL MARKET SHARE. You simply 
cannot make a vendor do something that will make them lose market share. It 
won't work. Even vendors that have the best of intentions will immediately 
revert to "buggy" behaviour when implementing the "correct" behaviour 
causes thousands of customer support calls asking why their new browser 
broke the Web.


>> I think it may be time to retire the Content-Type header, putting to 
>> sleep the myth that it is in any way authoritative, and instead have 
>> well-defined content-sniffing rules for Web content.
>
> I'm afraid I just don't get that.  I would think the right answer would 
> be:  let's not perpetuate these mistakes as new types spring up on the 
> Web.  Let's work hard to get them sourced with proper media types, so 
> that we can have a pretty clean Web that scales well, albeit with a few 
> historical warts, rather than a free for all in which there's no 
> reliable way to establish a new type, or to reliably signal its use from 
> the server.  So, the normative rule is:  use Content-Type.  The 
> accommodation is:  cheat where already deployed content requires you to.

I've done the "work hard" part. In fact, I think I might have done more 
work to get browsers to obey content types than anyone else on the planet. 
I've done the work when it comes to new types (e.g. RSS, Atom); I've done 
the work when it comes to old types (e.g. HTML, text/plain); I've done it 
for many vendors both internally and externally; I have some 97 or so test 
cases publicly available for anyone to use to test their behaviour.

There has to come a point where we realise that it doesn't work.


My intent is to make the HTML5 spec define how browsers should content 
sniff, and when they should do so, so that we can get interoperable and 
reliable content sniffing in well-defined cases. The Content-Type header 
is still useful for certain things, e.g. specifying the encoding of 
text/plain content, or making the difference between text/plain, 
text/html, and text/xml resources, and therefore won't be completely 
thrown out. It just wouldn't be the only variable any more; in the 
majority of cases, it would be largely ignored.
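
To make the idea concrete, here is a minimal sketch of a type decision in 
which Content-Type is one input among several. It is only an illustration 
under stated assumptions, not the algorithm the HTML5 spec would define: 
the function names, the "image"/"navigation" context labels, the short 
signature table, and the fallback type are all hypothetical.

```python
# Hypothetical signature table: leading bytes -> sniffed type (assumption).
IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

# Types for which the header is still trusted in this sketch, following the
# text/plain vs. text/html vs. text/xml distinction mentioned above.
TRUSTED_TEXT_TYPES = {"text/plain", "text/html", "text/xml"}


def parse_content_type(header):
    """Split a Content-Type header value into (mime_type, charset)."""
    if not header:
        return None, None
    parts = [p.strip() for p in header.split(";")]
    mime = parts[0].lower() or None
    charset = None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            charset = value.strip().strip('"').lower()
    return mime, charset


def effective_type(header, body, context="navigation"):
    """Return (type, charset) using the header as one variable among several.

    context is a hypothetical label for how the resource is being used,
    e.g. "image" for <img src=""> or "navigation" for a followed link.
    """
    mime, charset = parse_content_type(header)
    # Outside an image context, the header still decides between the text
    # types and supplies the encoding.
    if context != "image" and mime in TRUSTED_TEXT_TYPES:
        return mime, charset
    # Otherwise the declared type is largely ignored: look at the bytes.
    for signature, sniffed in IMAGE_SIGNATURES:
        if body.startswith(signature):
            return sniffed, None
    # Nothing recognised: fall back to the declared type, if there was one.
    return mime or "application/octet-stream", charset


if __name__ == "__main__":
    png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
    print(effective_type("text/plain", png, context="image"))
    print(effective_type("text/plain; charset=UTF-8", b"<html>"))
```

In this sketch a PNG served as text/plain in an image context is treated 
as image/png, while a text/plain response reached by following a link 
keeps its declared type and encoding.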

I believe this is a significantly more realistic way forward for the Web.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 9 August 2006 01:14:16 UTC