RE: Content sniffing, feed readers, etc. (was HTML interpreter vs. HTML user agent)

From: Larry Masinter <masinter@adobe.com>
Date: Sat, 30 May 2009 10:09:05 -0700
To: Adam Barth <w3c@adambarth.com>
CC: Ian Hickson <ian@hixie.ch>, Sam Ruby <rubys@intertwingly.net>, Anne van Kesteren <annevk@opera.com>, Maciej Stachowiak <mjs@apple.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118CD95EC3C@nambx04.corp.adobe.com>

> Updating the I-D is high on my priority list ATM.  In fact, I have
> five hours scheduled tomorrow to do just that.  Do you have any
> specific technical points that you want to make sure I address?

The discussion here seems to focus on the scope of applicability
and the nature of the conformance requirements for different
classes of HTTP and HTML agents.

Is this an "Informational" document ("Please be aware there are
misconfigured HTTP servers and here's some good ideas how to
deal with them") or does it have normative requirements?

If it has normative requirements, which categories of agents
must follow them, and are the requirements mandatory (MUST) or
recommended (SHOULD)?
Do the normative requirements apply to *all* HTTP agents, or
only to clients? Do they apply to proxies, gateways, and security
scanners? The current Internet Draft talks about "browsers";
does it in fact apply to other HTTP clients, only to
HTTP User Agents, or only to HTTP User Agents that are also
HTML User Agents?

How does the HyperText Markup Language technical specification
make normative reference to this specification? 

The scope and normative status of the specification affects
the nature of the review -- once the proposed scope is
clear, then the applicability of the specification and its
suitability for the agents which are in-scope can be judged.

For example, if it applies to RSS feed readers, then those
who implement RSS feed readers can judge whether the proposed
specification is appropriate for them. As it stands, the
document only calls out applicability to "browsers".

However, since the intended scope of the HTML document
is all HTML agents, not just browsers, and the proposal
is to make normative reference to the content-sniffing
document, the manner in which that normative
reference will be made needs to be clarified.

Also, it needs to be clarified whether the Internet Draft
applies to web user agents that are not HTML5
conformant.

I'm not certain where I stand, personally, on whether the
document should be Informational or a Proposed Standard
with normative requirements, but I'm in favor of making the
scope as narrow as possible, as it seems quite possible
that HTTP server configurations could be updated as
quickly as new HTML5 agents are introduced.

For example, perhaps the document should advise following
the algorithms only when it is clearly necessary
(i.e., agents SHOULD NOT perform content sniffing EXCEPT
when necessary because of continued misconfiguration of
HTTP servers, and MUST NOT perform any content
sniffing except as indicated). I think it may be
reasonable to limit the scope to "browsers"
and those HTML agents that require compatibility
with existing deployed browsers.

In the interest of improved stability, reliability,
security and extensibility of the Internet, I would
like to see some back-pressure on this in the form of a
commitment from the browser vendors to restrict
content-type sniffing to those HTTP requests
that result from following links in HTML
documents that use only features supported
in currently widely deployed browsers; that
is the minimum scope necessary to accomplish
the goals of this document. I.e., if a web page
contains any *new* HTML5 feature, then
content-type sniffing would NOT be performed
for links followed from it.

This would allow backward compatibility with
currently deployed content, while reducing the
heuristics needed to reliably determine the
behavior of the receiving user agent.
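To make the proposed restriction concrete, here is a minimal Python sketch of the gate a user agent might apply before sniffing. It assumes a hypothetical `ReferringDocument` summary of the linking page and a hypothetical `LEGACY_FEATURES` set standing in for "features supported in currently widely deployed browsers"; neither comes from the Internet Draft or the HTML specification, and a real set would have to be defined there.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional

# Hypothetical, illustrative set of features assumed to be supported by
# currently widely deployed browsers. The real set would need a spec definition.
LEGACY_FEATURES: FrozenSet[str] = frozenset({"a", "img", "form", "table", "script"})


@dataclass
class ReferringDocument:
    """Hypothetical summary of the HTML document whose link was followed."""
    features_used: FrozenSet[str] = field(default_factory=frozenset)

    def uses_new_features(self) -> bool:
        # Any feature outside the legacy set counts as a "new" HTML5 feature.
        return not self.features_used <= LEGACY_FEATURES


def may_sniff(referrer: Optional[ReferringDocument]) -> bool:
    """Return True only when content-type sniffing is permitted under the proposal."""
    if referrer is None:
        # The request did not come from following a link in an HTML document,
        # so sniffing is not permitted (MUST NOT sniff except as indicated).
        return False
    # A page that uses any new HTML5 feature opts its links out of sniffing.
    return not referrer.uses_new_features()


legacy_page = ReferringDocument(frozenset({"a", "img"}))
html5_page = ReferringDocument(frozenset({"a", "video"}))
print(may_sniff(legacy_page))  # True
print(may_sniff(html5_page))   # False
print(may_sniff(None))         # False
```

The default is "no sniffing"; only links from pages restricted to the legacy feature set fall back to the compatibility heuristics, which is the narrow scope suggested above.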

(I think that's a general Design Principle I
would advocate for other areas of "error handling",
but that's a different topic.)

Since you have 5 hours to devote to HTTP-related
issues, could you please follow up on the
"Origin" vs. "Referer" thread? The last email
I saw from you on the topic said that you would
respond to the technical critiques of the document
and consider how "Referer" could be adapted
instead. However, indications from others,
without reference to your email


> I'm quite interested in the idea of recommending or requiring that
> user agents always send a Referer header (and letting them send the
> value "null" if they have nothing better to send).  This design has
> the distinct advantage of protecting Web sites that currently
> implement lenient Referer validation.  My plan is to float this idea
> with some browser security folks and see if they'd be willing to
> implement it.

On the other hand, the current HTML document and
the discussions in the W3C seem to be at odds with
the implied direction, and instead focus on
"renaming" the "Origin" header. This seems confusing
and inconsistent.

From my personal view, the feedback I continue to get
(most recently from a large ISP) is that the Origin 
header provides so little advantage that they don't 
think they would deploy it.

The "browser security folks" may see the header
as harmless, but a security mechanism that purports
to solve a security problem but isn't deployed
is not only of limited utility; it's also harmful
if it discourages continued efforts to actually solve
the problem it was intended to address. As far as
I can tell, the only service provider or server
implementer I have heard of that intends to deploy the
Origin header mechanism is Google. There were no
comments from implementors of proxies and gateways
as to whether they would commit to not stripping
or modifying Origin in the same way as they did
Referer, and no particular analysis of the resulting
security infrastructure if they did.

Received on Saturday, 30 May 2009 17:10:18 UTC

This archive was generated by hypermail 2.3.1 : Thursday, 29 October 2015 10:15:45 UTC