Re: HTML interpreter vs. HTML user agent from Sam Ruby on 2009-05-28 (public-html@w3.org from May 2009)

From: Sam Ruby <rubys@intertwingly.net>
Date: Thu, 28 May 2009 10:20:52 -0400
To: Anne van Kesteren <annevk@opera.com>
CC: Maciej Stachowiak <mjs@apple.com>, "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter <masinter@adobe.com>, HTML WG <public-html@w3.org>
Message-ID: <4A1E9DC4.6090501@intertwingly.net>

Anne van Kesteren wrote:
> On Thu, 28 May 2009 15:41:36 +0200, Sam Ruby <rubys@intertwingly.net>
> wrote:
>> I don't understand the "conformance with HTTP" part of the
>> question.  I believe that the current spec'ed behavior constitutes
>> "a willful violation of the HTTP specification, which requires that
>> the Content-Type headers be honored, despite implementation
>> experience showing that this is not practical in many cases."
> 
> Currently they completely violate HTTP. By following the rules laid
> out in HTML5 they could get much closer. (I agree that it is probably
> better for this part of HTML5 to end up with the IETF, but I still
> think it would make sense for feed readers to adhere to the rules as
> well.)

I believe that if you were to expand the scope to include feed readers
and media players and all other user agents, and get representatives of
such to actually participate in the discussion, the set of rules you
would end up with would look markedly different from the ones captured
in the current specification.

> When sniffing was discussed a while ago I remember that
> technorati.com and a feed library gsnedders was working on made their
> code much stricter. They're not browsers.

And I can identify a few products and libraries that have become more 
liberal over time.

>> The actual observed behavior of user agents designed to (primarily)
>>  process content of a certain media type (either in general, or in
>> the specific context) is to make every effort to parse the content
>> according to those rules, and only if such rules fail to produce
>> meaningful results will they investigate alternatives.
>> 
>> Browsers will first attempt to process content as HTML. Feed readers
>> will first attempt to process content as a feed. Media players will
>> first attempt to process content as media.
>> 
>> Browsers, when chasing an image tag, will make different
>> assumptions than when presented with a raw URI from the chrome.
>> 
>> All are equally "right" or "wrong".
> 
> While it is certainly true that different contexts have different
> sniffing rules, reducing that to a minimum would be good I think. Or
> are you saying the attempt is futile?

Permit me to turn that around... can we precisely identify in which 
contexts the "#content-type-sniffing:-feed-or-html" section is meant to 
apply?

If that set of rules is meant to apply only to browsers, and appears in
a document labeled as a browser behavior specification, then all
concerns go away.  If that set of rules is meant to apply to
everybody, then the discussion needs to move to the IETF, and the
content in that section will likely look markedly different once that
process is complete.

From my perspective, proper labeling or splitting out are both
acceptable outcomes.
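
To make the context question concrete, here is a rough sketch of the
behavior described earlier in this message: each kind of user agent
tries its own primary format first and only then considers
alternatives.  Everything in it is an assumption made for illustration
(the names PREFERENCES, looks_like_feed, looks_like_html and interpret,
the 512-byte window, and the crude checks themselves); none of it is
taken from the HTML5 draft or from any shipping product.

```python
import xml.etree.ElementTree as ET

def looks_like_feed(body: bytes) -> bool:
    """Hypothetical check: well-formed XML whose root is rss, feed or RDF."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # Drop any "{namespace}" prefix from the root element's tag.
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    return local_name in {"rss", "feed", "rdf"}

def looks_like_html(body: bytes) -> bool:
    """Hypothetical check: an HTML marker in the first 512 bytes."""
    head = body[:512].lower()
    return b"<html" in head or b"<!doctype html" in head

# Assumed preference order per context: the primary format comes first.
PREFERENCES = {
    "browser": [("html", looks_like_html), ("feed", looks_like_feed)],
    "feed_reader": [("feed", looks_like_feed), ("html", looks_like_html)],
}

def interpret(context: str, body: bytes) -> str:
    """Return the first format, in this context's order, that the body matches."""
    for label, check in PREFERENCES[context]:
        if check(body):
            return label
    return "unknown"

# A feed whose description embeds HTML markup matches both checks.
AMBIGUOUS = (b'<rss version="2.0"><channel><description>'
             b"<![CDATA[<html>hi</html>]]>"
             b"</description></channel></rss>")
```

With these assumed rules, interpret("browser", AMBIGUOUS) and
interpret("feed_reader", AMBIGUOUS) give different answers for the same
bytes, which is why the section needs to say which contexts it covers.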

>> None of this is meant to imply that the behavior that is being
>> settled upon by browser manufacturers isn't worth specifying or
>> standardizing.
> 
> Right.

- Sam Ruby

Received on Thursday, 28 May 2009 14:21:25 UTC