Re: review of content type rules by IETF/HTTP community

On Sun, 2007-08-19 at 18:23 -0500, Robert Burns wrote:
> On Aug 17, 2007, at 3:45 PM, Dan Connolly wrote:
> > According to a straightforward architecture for content types in the
> > Web[META], the HTTP specification should suffice and the HTML 5
> > specification need not specify another algorithm. But that architecture
> > assumes that Web publishers (server administrators and content
> > developers) reliably label content. Observing that labelling by Web
> > publishers is widely unreliable, and software that works around these
> > problems is widespread
> 
> BTW, do we have data on this?

Indeed, early review feedback [Fielding07] suggests I
should cite/excerpt more of the background that I had in mind
before sending this out for wider review.

[Fielding07]
http://lists.w3.org/Archives/Public/www-tag/2007Aug/0034.html

In particular, I had in mind these two messages:

[[
We (Microsoft) want to and plan to continue to bring our implementation
in ever-higher compliance with the standards.  We can't change our
behavior for content that exists today, though, so we will have to have
content developers opt in.
]]
 -- Chris Wilson Fri, 6 Apr 2007 15:38:19 -0700
 http://lists.w3.org/Archives/Public/public-html/2007Apr/0328.html

and...

[[
I've been pushing for Web browser vendors to fix this for approximately 
eight years, roughly half the lifetime of HTTP so far.

I've worked hard in the Mozilla community, at Netscape, at Opera, in the 
Webkit community, in talks with Microsoft, at W3C meetings, in 
specifications, in writing test cases, and in writing documentation, over 
those eight years, trying to get this issue fixed.

When Microsoft asked me for my list of top ten bugs that I'd like fixed in 
IE7, I listed just one: HTTP content sniffing. I even included nearly a 
hundred tests to help them do this.

Over the years I have tried to get browsers to stop assuming that anything 
at the end of an <img src=""> was an image, and tried to get them to use 
the MIME type sent with the image instead of content-sniffing to determine 
the image type.

I have tried to get browsers to use the Content-Type header when following 
hyperlinks, to stop them automatically downloading and showing videos that 
are marked as text/plain.

I have tried to get browsers to obey the Content-Type headers when 
downloading files that <script> elements point to, even going so far as 
to point out the security implications of allowing authors to download 
any random file that happens to be in JS-like format (e.g. any JSON 
data) and read it as if it were on their domain.

I have tried to get browsers to obey the Content-Type of files when <link> 
elements are used to point to stylesheets.

I have tried to get browsers to obey the Content-Type headers when 
they handle <object> elements, going as far as including tests for this in 
the Acid2 test.

I have tried to get browsers to rely on MIME types for detecting RSS and 
Atom feeds, instead of sniffing every HTML page before displaying it.


Here is the sum total of what all the above, and all the other advocacy 
that I have done over the eight years I've been working on this, has done:

1. Mozilla, in standards mode, ignores CSS files that don't have 
Content-Type set to 'text/css'. This took many years, and I've had to 
fight to keep this in several times. It only affects a small minority of 
sites that use standards mode, but even those sites sometimes fail to 
render correctly in Mozilla because of this, while rendering fine in other 
browsers. We get bugs filed on this regularly.

2. Mozilla and Opera have limited their sniffing of content sent as 
text/plain so that instead of sniffing on all text/plain content, they 
only sniff on the majority of text/plain content.

3. That's all. Only two minor things.


This isn't because of laziness. This is because ANY BROWSER THAT ACTUALLY 
TRIES TO IMPLEMENT THESE THINGS WOULD LOSE ALL MARKET SHARE. You simply 
cannot make a vendor do something that will make them lose market share. It 
won't work. Even vendors that have the best of intentions will immediately 
revert to "buggy" behaviour when implementing the "correct" behaviour 
causes thousands of customer support calls asking why their new browser 
broke the Web. 
]]
 --  Ian Hickson <ian@hixie.ch>
  Wed, 9 Aug 2006 01:14:06 +0000 (UTC)
  Re: Approved TAG finding: Authoritative Metadata
  http://lists.w3.org/Archives/Public/www-tag/2006Aug/0027.html
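
To make concrete the behaviour Hickson describes, here is a minimal
sketch in Python of the kind of content sniffing at issue. This is
illustrative only, not any browser's actual algorithm; the marker list
and the 512-byte window are assumptions made for the example.

    # Sketch: a sniffing client consults the declared Content-Type, but
    # overrides it when the body "looks like" HTML.
    HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body",
                    b"<script")

    def effective_type(declared: str, body: bytes) -> str:
        """Return the type a sniffing client would act on."""
        if declared == "text/plain":
            head = body[:512].lstrip().lower()
            if head.startswith(HTML_MARKERS):
                return "text/html"  # label overridden: the disputed behaviour
        return declared  # a conforming client stops here

    # A page served as text/plain gets rendered as HTML anyway:
    print(effective_type("text/plain", b"<html><body>hi</body></html>"))

A conforming client, per the TAG finding, would be the one-line version
of this function that always returns the declared type.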


Reviewing that thread, I see that [Fielding07] is consistent
with his position at the time:

[[
No, that's total speculation.  None of them have even tried to implement 
a configuration option for identifying incorrect content types as an 
error, let alone deployed it in the market.  They all just whine about 
priorities and how hard it is to change the world.  I've deployed far 
more drastic changes than that on the Web.  It has nothing to do with 
the market or the deployed technology: all it requires is a willingness 
to do the right thing instead of always copying other people's mistakes.
]]
 -- http://lists.w3.org/Archives/Public/www-tag/2006Aug/0029.html


> Do these content-type headers suffer from mislabeling on a widespread 
> basis?

I find the claim that "millions of videos are served as text/plain"
credible, as well as claims about mislabelling of RSS feeds,
CSS stylesheets, and JavaScript files.
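
Anyone wanting harder data could measure this directly: fetch a sample
of URLs and compare the declared Content-Type against a crude sniff of
the first bytes. A rough sketch in Python (the URL, the markers, and
the 512-byte window are illustrative assumptions, not a real survey
methodology):

    import urllib.request

    def declared_vs_sniffed(url: str):
        """Return (declared, sniffed) media types for one URL."""
        with urllib.request.urlopen(url) as resp:
            declared = resp.headers.get_content_type()
            head = resp.read(512).lstrip().lower()
        # Crude sniff: flag bodies that look like HTML despite the label.
        looks_html = head.startswith((b"<!doctype", b"<html"))
        sniffed = "text/html" if looks_html else declared
        return declared, sniffed

    # Hypothetical usage over a crawl list:
    # declared, sniffed = declared_vs_sniffed("http://example.com/page")
    # if declared != sniffed:
    #     print("mislabelled:", declared, "->", sniffed)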

> It seems to me the more we give up on this, the more we'll lose this 
> feature.

True.

Fielding's position is appealing, but until Microsoft is persuaded
to change its behavior for content that exists today, I don't see
any way to save it.

> We'd have to institute a new header, "really-the-content-type", and 
> make that authoritative.

Indeed, did you see Sam Ruby's suggestion?

[[
While I don't expect this to be fixed, I would like to request that 
there be some parameter (like, "application/xml; damnit") which 
indicates that I think I know what I'm doing and would appreciate being 
treated like an adult.
]]
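
Mechanically, such a flag would just be one more Content-Type
parameter, alongside things like charset. A minimal parse in Python
shows where it would live (not a full RFC 2045 parser -- no
quoted-string handling -- and the "damnit" parameter is of course
hypothetical):

    def parse_content_type(value: str):
        """Split a Content-Type value into (media type, parameters)."""
        parts = [p.strip() for p in value.split(";")]
        media_type, params = parts[0].lower(), {}
        for p in parts[1:]:
            name, _, val = p.partition("=")
            params[name.strip().lower()] = val.strip()
        return media_type, params

    print(parse_content_type("application/xml; damnit"))
    # -> ('application/xml', {'damnit': ''})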

Meanwhile, Hickson argued that we have little reason to believe
new mechanisms would not suffer the same market dynamics as present ones.
http://lists.w3.org/Archives/Public/www-tag/2006Aug/0027.html

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
