Re: NEW ISSUE: content sniffing

On Thu, Apr 2, 2009 at 7:42 PM, William A. Rowe, Jr.
<wrowe@rowe-clan.net> wrote:
> Adam Barth wrote:
>> You're ignoring the reality of existing Web content.  To interoperate
>> with existing Web content, a user agent must consider both the
>> Content-Type headers and the content when determining the media type
>> contained in a response.  To claim otherwise is fantasy.
>
> No, it's a statement of fact.

The fact is that content sniffing is required to correctly process
approximately 1% of HTTP responses on the Web.  The HTTP spec forbids
this practice.  I cannot implement a user agent that both
interoperates with the Web and follows the HTTP specification.
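
To make this concrete, here is a rough, illustrative sketch of what
sniffing buys a user agent when a server sends no usable Content-Type
at all.  The signature list and 512-byte window below are
simplifications of my own, not the exact tables from any
specification:

```python
# Illustrative sketch only: roughly how a user agent might guess a
# media type for a response that arrives with no usable Content-Type
# header.  The signature list and 512-byte window are simplified
# stand-ins, not the exact tables from any spec.

HTML_SIGNATURES = (b"<!DOCTYPE HTML", b"<HTML", b"<HEAD", b"<SCRIPT", b"<BODY")

def guess_media_type(body: bytes) -> str:
    """Guess a type for a response whose Content-Type header is missing."""
    prefix = body[:512].lstrip().upper()
    for sig in HTML_SIGNATURES:
        if prefix.startswith(sig):
            return "text/html"
    return "application/octet-stream"

# A server that omits Content-Type entirely: without sniffing, the
# user agent would have to treat this page as an opaque download.
print(guess_media_type(b"<!doctype html><p>hello</p>"))  # text/html
```

A user agent that refuses to do even this much cannot render the
responses in question at all, which is the interoperability pressure I
am describing.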

> Roy has (correctly) pointed out to you, multiple times, that your perceived
> flaw of "broken web sites" is the direct result of inaccurate Content-Type
> headers being sniffed by user agents and represented inaccurately.
>
> The inaccurate headers are a direct -result- of content sniffing by the
> handful of user agents which tolerate inappropriate server behavior.
> You perceive this to be a user agent issue.  It is a flaw in the servers.

Cause and effect are irrelevant.  My job as a user agent implementor
is to deal with the reality with which I am presented, not to
fantasize about other realities that might have been.

> There is no need to modify the spec (and I agree the reference to content
> sniffing can be dropped altogether) to describe what the UA's have foisted
> on the world.

There appear to be three roads forward:

1) Remove all references to Content-Type from the HTTP specification
and separately specify how to interpret the header.

2) Recommend against content sniffing but acknowledge that some user
agents might perform content sniffing to interoperate with legacy HTTP
servers.

3) Option (2) plus a recommended concrete sniffing algorithm that
works with existing Web content.

Leaving the spec unmodified renders the spec a dead letter.

> As soon as all browsers conform to spec, authors/administrators will correct
> their errors, because those errors will be obvious to them.

Economics will prevent this from occurring.

> Ponder for a moment; 10 years ago, 1% - 3% of content could not be properly
> rendered with the Content-Type headers provided.  Today, has that number
> grown?  No, it's probably shrunk.

This is pure speculation.

> Look Adam, as soon as you describe how, by spec, I can offer an html file
> as an example in a text/plain representation (for illustration of tags)
> in spite of inappropriate behavior by IE and the host of others, I'll
> respect your position.

In fact, this works precisely as you desire according to
draft-abarth-mime-sniff!  Simply specify "text/plain" as the
Content-Type.
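
As I read the draft (details simplified here, so treat this as a
sketch rather than the normative algorithm), a response labeled
text/plain is only ever checked for binary content; it is never
promoted to text/html.  So HTML source served as text/plain renders as
visible markup, exactly as you want:

```python
# Rough sketch of my reading of draft-abarth-mime-sniff's text/plain
# handling (simplified): the body is scanned for "binary" bytes, and
# the type either stays text/plain or falls back to
# application/octet-stream.  It is never upgraded to text/html.

def is_binary_byte(b: int) -> bool:
    # Control bytes other than tab, LF, FF, CR, and escape.
    return b <= 0x08 or b == 0x0B or 0x0E <= b <= 0x1A or 0x1C <= b <= 0x1F

def sniff_text_plain(body: bytes) -> str:
    if any(is_binary_byte(b) for b in body[:512]):
        return "application/octet-stream"
    return "text/plain"

# An HTML example page served as text/plain: the tags are displayed
# to the reader, not interpreted.
print(sniff_text_plain(b"<p>This markup is shown, not rendered.</p>"))
```

This is precisely the kind of security / compatibility boundary the
draft is careful about: sniffing never turns an author's declared
text/plain into active content.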

> But your rants are getting irritating.

I have attempted to provide technical arguments and not rants.

> If you
> demand that IE's autodeterministic features are of value, you've lost my
> attention already, because the world would not be polluted by utf7 xss
> vulnerabilities if not for such obnoxious, spec-incompliant behavior.

I'm not suggesting adopting IE's sniffing algorithm.  That algorithm
makes a poor security / compatibility trade-off, as you point out.
draft-abarth-mime-sniff makes a security / compatibility trade-off
that has been informed by careful threat modeling and extensive
empirical evaluation.  I encourage you to read
http://www.adambarth.com/papers/2009/barth-caballero-song.pdf for
further details.

Adam

Received on Friday, 3 April 2009 04:52:04 UTC