Re: NEW ISSUE: content sniffing

On Apr 1, 2009, at 11:57 PM, Adam Barth wrote:

> On Wed, Apr 1, 2009 at 5:43 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>> It's relevant to any HTTP user agent that wishes to interoperate
>>> with existing Web content.  For example, Imageshop (described
>>> earlier in this thread) is not a browser but is interested in
>>> knowing when its users expect an HTTP response to be treated as
>>> an image.
>>
>> No, it isn't relevant at all.
>
> That's an extreme point of view.  The implementors of Imageshop
> certainly want to know how to determine the MIME type of HTTP
> responses they receive from the Web.

Maybe the implementors of Imageshop will read the thread and
understand that the media type of a message is not the same
thing as the data format of a message.  The media type (what was
called MIME type ages ago) is a processing instruction supplied
by the sender.  The media type cannot be discerned by looking at
the bits.  The data format can sometimes be discerned by looking
at the bits, which is a reasonable fallback behavior depending on
the context in which the request was made.

It is impossible to sniff for a media type because any given
data format matches two or more media types.  Preferring one
over another is something a particular kind of user agent can
do in a particular context in which it has been configured to
do so.  None of those
variables have anything to do with HTTP.  HTTP is responsible
for communicating the sender's intentions.
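
A minimal sketch in Python of the distinction drawn above, offered as an
illustration only.  The PNG signature and the XML declaration are real,
but the function names and the table of candidate media types are
assumptions made for this example; the table is not exhaustive and is
not defined by HTTP.

```python
from typing import List, Optional

# Illustrative assumption: a tiny, non-exhaustive table of media types that
# a sender could legitimately intend for each detected data format.
CANDIDATE_MEDIA_TYPES = {
    "png": ["image/png", "application/octet-stream"],
    "xml": ["application/xml", "text/xml", "text/plain"],
}


def detect_data_format(body: bytes) -> Optional[str]:
    """Guess the data format from the leading bytes, or None if unknown."""
    if body.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG file signature
        return "png"
    if body.lstrip().startswith(b"<?xml"):  # XML declaration
        return "xml"
    return None


def candidate_media_types(body: bytes) -> List[str]:
    """List media types consistent with the bits; the bits cannot pick one."""
    return CANDIDATE_MEDIA_TYPES.get(detect_data_format(body), [])
```

The bits can narrow the data format, but each format still maps to
several candidates.  Choosing among them depends on the sender's intent,
which only the Content-Type field conveys.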

>> What you are talking about is error recovery from the perspective
>> of a particular type of casual user, not the protocol.
>
> Without specifying how to determine the MIME type of HTTP responses,
> implementors of user agents interested in interoperating with existing
> Web content will be forced to reverse engineer each other.  Undoubtedly,
> they will make mistakes, and we'll be in an even worse position than
> we are today.

Then fix the content metadata.  No other solution will work, period.

>> The protocol is stating the communication from the sender.  If the
>> communication is false, the protocol is being violated and that
>> violation is shown by failing to meet one of the protocol requirements.
>
> I agree that the spec should require that servers send the correct
> MIME type in the Content-Type header.  No one is disputing this.
>
>> The fact that different user agents deal with protocol violations in
>> different ways is a good thing.
>
> In this case, a diversity of sniffing algorithms is harmful both to
> interoperability and security.  We'd be much better off if all the
> user agents that require sniffing used the same algorithm.

We would be better off if none of them sniffed.  That is the most
interoperable solution.

>> The whole philosophy that protocol specs, as opposed to browser
>> implementation specs, must describe the error-handling quirks
>> of browsers is bankrupt.
>
> I'm not proposing the spec describe the error handling quirks of
> browsers.  I'm proposing that the spec contain enough detail that
> implementors of future user agents (if they are so inclined) can
> determine the MIME type of HTTP responses from the Web.

It already does contain everything that can be truly said about
determining the media type.  The only thing it doesn't define is
what the recipient should do when it detects an error or when
no type is supplied, and the reason is that the behavior
is different for every single type of recipient.  The only reason
that the HTML5 folks can pretend to answer that question is because
they currently ignore the needs of all recipients other than the
big general-purpose browsers.  IETF concerns >> WHATWG concerns.

>>  The correct way to interoperate with broken Web
>> content is to display a very large error message that explains why
>> it is broken.
>
> Sadly, such user agents will not be as popular as those that just work.

That is a matter of opinion.  I have seen no evidence to suggest
that MSIE bugs actually helped it in competing with other browsers.
I have seen plenty of evidence that MSIE is upgraded or uninstalled
on an institutional basis when its bugs create a liability.  The
same will hold true for other browsers.

>> The fact that *some* HTTP user agents, notably the
>> big browser vendors, choose not to display those errors in a usable
>> way is a design choice for those user agents.
>
> Be that as it may, as an implementor of a new user agent that would
> like to interoperate with the Web, I would like to know how to
> determine the MIME type of existing Web content.

Read the Content-Type header field and behave accordingly. If it is
obviously in error, then work around that error while informing
the user.
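
As a rough sketch of that advice in Python, under stated assumptions:
the function name, the `notify` callback, and the rule for what counts
as "obviously in error" (a text/* label on a body that starts with the
PNG signature) are all hypothetical choices for an image-oriented
recipient.  Other kinds of recipients would reasonably choose
differently.

```python
from typing import Callable, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def choose_processing(content_type: Optional[str], body: bytes,
                      notify: Callable[[str], None] = print) -> str:
    """Return the media type to process the body as, reporting any workaround."""
    # Drop parameters such as charset and normalise case.
    declared = (content_type or "").split(";", 1)[0].strip().lower()
    looks_like_png = body.startswith(PNG_SIGNATURE)

    # Assumed rule for "obviously in error": a text/* label on a PNG body.
    if declared.startswith("text/") and looks_like_png:
        notify(f"Sender labelled this response {declared!r}, but the body "
               "has a PNG signature; treating it as image/png.")
        return "image/png"

    # No type supplied: fall back on the data format and say so.
    if not declared:
        fallback = "image/png" if looks_like_png else "application/octet-stream"
        notify(f"No Content-Type was supplied; treating it as {fallback}.")
        return fallback

    # Normal case: the sender's declaration is followed without sniffing.
    return declared
```

In the normal case the declared type is returned untouched; the body is
consulted only to detect the assumed error case or a missing header, and
the user is told whenever that happens.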

>> The common browser behavior does not need to be standardized in HTTP
>> because it is not common to the vast majority of user agent
>> implementations, which far outnumber the general purpose browsers.
>
> I am not interested in imposing browser behavior on other user agents.
>  In fact, I think the spec should recommend against sniffing.
>
>> It cannot be standardized in a way that would be safe for
>> safety-critical environments such as health care, where failure to
>> display the errors could very well result in serious injury or death.
>
> These user agents are free to follow the recommended course of action
> and not perform sniffing.  If they do require sniffing, then we're
> much better off if they follow a standardized algorithm than if they
> incorrectly reverse engineer other user agents.  Being explicit has
> fewer failure modes than burying our heads in the sand.
>
>>  And I have no
>> doubt that MSIE will change its sniffing behavior shortly after a
>> few lawsuits demonstrate their stupidity.
>
> You're entitled to that opinion, but I don't see content sniffing
> going away anytime soon.

Time will tell.  I only document the technical solutions that
actually work.

....Roy

Received on Thursday, 2 April 2009 21:33:17 UTC