Re: Content sniffing, feed readers, etc. (was HTML interpreter vs. HTML user agent) from Adam Barth on 2009-05-30 (public-html@w3.org from May 2009)

From: Adam Barth <w3c@adambarth.com>
Date: Sat, 30 May 2009 13:46:56 -0700
To: Larry Masinter <masinter@adobe.com>
Cc: Ian Hickson <ian@hixie.ch>, Sam Ruby <rubys@intertwingly.net>, Anne van Kesteren <annevk@opera.com>, Maciej Stachowiak <mjs@apple.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
Message-ID: <7789133a0905301346kdc1cc71s84baa566b8f70497@mail.gmail.com>
On Sat, May 30, 2009 at 10:09 AM, Larry Masinter <masinter@adobe.com> wrote:
> Is this an "Informational" document ("Please be aware there are
> misconfigured HTTP servers and here's some good ideas how to
> deal with them") or does it have normative requirements?

My opinion is that the document should say the following:

1) User agents SHOULD NOT sniff.
2) If a user agent does sniff, then the user agent MUST use the
following sniffing algorithm.
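
As a rough illustration of this two-tier rule, here is a minimal Python sketch. The names `shared_sniff` and `effective_type` and the toy PNG-signature rule are assumptions made up for this example; they are not the draft's algorithm. The sketch shows only the shape of the requirement: a non-sniffing agent trusts the declared type, and every sniffing agent goes through the same single function.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DEFAULT_TYPE = "application/octet-stream"


def shared_sniff(declared: Optional[str], body: bytes) -> str:
    """Stand-in for the one algorithm every sniffing agent would share.

    Toy rule (an assumption for illustration only): a PNG signature
    overrides a missing or generic declared type; otherwise the declared
    type wins.
    """
    if declared in (None, DEFAULT_TYPE) and body.startswith(PNG_SIGNATURE):
        return "image/png"
    return declared or DEFAULT_TYPE


def effective_type(declared: Optional[str], body: bytes, sniffs: bool) -> str:
    """Return the type a user agent would act on."""
    if not sniffs:
        # Rule (1): the agent does not sniff and trusts Content-Type as sent.
        return declared or DEFAULT_TYPE
    # Rule (2): an agent that sniffs must use the shared algorithm,
    # never a private heuristic of its own.
    return shared_sniff(declared, body)


if __name__ == "__main__":
    png = PNG_SIGNATURE + b"...image data..."
    print(effective_type(None, png, sniffs=False))
    print(effective_type(None, png, sniffs=True))
```

With one shared function, two sniffing agents cannot disagree about the same response, which is the interoperability benefit rule (2) is after.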

One can argue that this is a contrary-to-duty imperative, but I don't
think that's too big of an issue.  You're welcome to debate the
semantics of "user agent" etc, but I'm not sure how to contribute to
that discussion.

> If it has normative requirements, what categories of agents deal
> with them, and are the requirements mandatory (MUST) or
> RECOMMENDED (SHOULD) or allowed (MAY)?

User agents that wish to interact with existing web content.  Other
user agents are free to not sniff, as in (1).

> Do the normative requirements apply to *all* HTTP agents or
> are they only for clients? For proxies, gateways, security
> scanners? The current Internet Draft talks about "browsers",
> does it in fact apply to other HTTP clients, only to
> HTTP User Agents, only to HTTP User Agents that are also
> HTML User Agents?

I'll correct the draft to consistently refer to user agents.

> How does the HyperText Markup Language technical specification
> make normative reference to this specification?

That's a question for Ian, but I suspect he'll add a reference, as he
does for other RFCs, etc.

> For example, if it applies to RSS feed readers, then those
> who implement RSS feed readers can judge whether the proposed
> specification is appropriate for them. As it stands, the
> document only calls out applicability to "browsers".

I welcome feedback from RSS feed reader implementors (and others).

> Also, whether the scope of the Internet Draft applies
> to web user agents that are not HTML5 conformant needs
> to be clarified.

I believe the draft will not make reference to HTML 5.

> it seems quite possible
> that HTTP server configurations could be updated as
> quickly as new HTML5 agents introduced.

I see this as unlikely, but I know of no empirical way to resolve this
question except to try.

> For example, perhaps the document should advised following
> the algorithms only when it is clearly necessary
> (SHOULD NOT perform content sniffing EXCEPT when
> necessary because of continued misconfiguration of
> HTTP servers),

How would one determine that sniffing is necessary?  Is that not just
sniffing for sniffing?

> In the interest of improved stability, reliability,
> security and extensibility of the Internet, I would
> like to see some back-pressure on this by a
> commitment from the browser vendors to restrict
> content-type sniffing to those HTTP requests
> which result from following links from HTML
> documents which do not contain any features
> not supported in currently widely deployed
> browsers -- which is the minimum scope necessary
> to accomplish the goals of this document. I.e.,
> if a web page contains any *new* HTML5 feature,
> then content-type sniffing would NOT work.

This proposal seems to be the opposite of stable, reliable, and
secure.  Consider a site with a home page and an article.  Let's
say that to view the article correctly, we require content sniffing.
The site itself uses no HTML 5 features and works fine in all
browsers.  Now, the advertising network that supplies ads for the home
page decides to use the <video> element in some of its ads.  Now,
suddenly, the link from the home page to the article becomes
unreliable, working sometimes and failing others.  This kind of bug is
super mysterious and unpredictable.
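
The failure mode in this scenario can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the short list of "new" tags, and the ad markup are invented for illustration, and the substring check is a crude stand-in for real feature detection. The sketch models the quoted proposal (sniff only when the referring page uses no new HTML 5 feature) and shows the same link flipping behavior depending on which ad was served.

```python
# Illustrative subset of elements treated as "new HTML 5 features".
HTML5_ONLY_TAGS = ("<video", "<audio", "<canvas")


def referrer_allows_sniffing(referrer_html: str) -> bool:
    """Model of the proposed rule: sniffing is permitted for a followed
    link only if the referring page contains no new HTML 5 feature."""
    lowered = referrer_html.lower()
    return not any(tag in lowered for tag in HTML5_ONLY_TAGS)


def home_page(ad_markup: str) -> str:
    """The site's own markup never changes; only the third-party ad does."""
    return (
        "<html><body>"
        "<a href='/article'>Read the article</a>"
        + ad_markup +
        "</body></html>"
    )


# Two ads the network might rotate between (made-up markup).
ADS = [
    "<img src='banner.png'>",
    "<video src='spot.ogv'></video>",
]

if __name__ == "__main__":
    for ad in ADS:
        allowed = referrer_allows_sniffing(home_page(ad))
        outcome = "article renders" if allowed else "article breaks"
        print(f"ad={ad!r}: sniffing allowed={allowed} ({outcome})")
```

The site author changed nothing between the two cases, yet the link to the article works in one and fails in the other.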

> This would allow backward compatibility with
> currently deployed content, but reduce the
> amount of heuristics necessary to properly
> determine reliably the behavior of the
> receiving user agent.

On the contrary, it would be almost impossible to reliably determine
the behavior of the receiving user agent.  It would be anyone's guess
whether sniffing would be invoked for a particular HTTP response.

> (I think that's a general Design Principle I
> would advocate for other areas of "error handling"
> but that's a different topic.)

I suspect other error handling would have similar problems with this approach.

> ===================================
> Since you have 5 hours to devote to HTTP-related
> issues, could you please follow up on the
> "Origin" vs. "Referer" thread?

Origin is on my work queue.  I've committed to revising the sniffing
draft by a certain date, so that has higher priority at the moment.

> The last email
> I saw from you on the topic was that you would
> respond to the technical critiques of the document
> and consider how "Referer" could be adapted
> instead. However, indications from others,
> without reference to your email

I believe I've responded to all the technical critiques.  Is there
something in particular you think I've missed?

> On the other hand, the current HTML document and
> the discussions in the W3C seem to be at odds with
> the implied direction, and instead focus on
> "renaming" the "Origin" header. This seems confusing
> and inconsistent.

I don't see these directions as at odds.  I believe we've settled the
discussion and decided not to rename the Origin header.  The open
issue at the moment, I believe, is Mozilla's extended version of the
Origin header.

> From my personal view, the feedback I continue to get
> (most recently from a large ISP) is that the Origin
> header provides so little advantage that they don't
> think they would deploy it.

That's not surprising because ISPs need not implement the Origin
header.  The Origin header is a feature to help web sites (not ISPs)
defend themselves against CSRF.
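
To make concrete why the deployment burden falls on web sites rather than ISPs, here is a minimal Python sketch of the kind of server-side check a site could perform. The function name, the trusted-origin set, and the choice to let requests without an Origin header through are all assumptions for illustration, not something specified in this thread.

```python
from typing import Mapping, Set

# Methods conventionally treated as not changing server state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def is_request_allowed(
    method: str,
    headers: Mapping[str, str],
    trusted_origins: Set[str],
) -> bool:
    """Decide whether a request passes a simple Origin-based CSRF check.

    `headers` is assumed to be a plain mapping keyed by the exact
    name "Origin"; a real server would normalize header-name case.
    """
    if method.upper() in SAFE_METHODS:
        return True
    origin = headers.get("Origin")
    if origin is None:
        # Assumption: user agents that do not send the header are let
        # through here and must be covered by some other defense
        # (for example, secret tokens in forms).
        return True
    # A state-changing request that names an untrusted origin is rejected.
    return origin in trusted_origins


if __name__ == "__main__":
    trusted = {"https://bank.example"}
    print(is_request_allowed("POST", {"Origin": "https://bank.example"}, trusted))
    print(is_request_allowed("POST", {"Origin": "https://evil.example"}, trusted))
```

Nothing in this check involves the network path between browser and server, which is why an ISP has nothing to deploy.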

> The "browser security folks" may see the header
> as harmless, but a security mechanism which purports
> to solve a security problem, but which isn't deployed,
> is not only of limited utility, it's also harmful
> if it discourages continued efforts to actually solve
> the problem it was intended to address.

I don't buy this argument.  I'm happy to consider alternative designs
for mitigating CSRF.  In fact, as an academic, I review and discuss
such proposals on a routine basis.  Do you have another proposal for
mitigating CSRF that we ought to consider?

Adam
Received on Saturday, 30 May 2009 20:47:54 UTC