Re: HTML interpreter vs. HTML user agent

On 28 May 2009, at 14:57, Anne van Kesteren wrote:

> On Thu, 28 May 2009 15:41:36 +0200, Sam Ruby  
> <rubys@intertwingly.net> wrote:
>> I don't understand the "conformance with HTTP" part of the  
>> question.  I
>> believe that the current spec'ed behavior constitutes "a willful
>> violation of the HTTP specification, which requires that the
>> Content-Type headers be honored, despite implementation experience
>> showing that this is not practical in many cases."
>
> Currently they completely violate HTTP. By following the rules laid  
> out in HTML5 they could get much closer. (I agree that it is  
> probably better for this part of HTML5 to end up with the IETF, but  
> I still think it would make sense for feed readers to adhere to the  
> rules as well.)
>
> When sniffing was discussed a while ago I remember that  
> technorati.com and a feed library gsnedders was working on made  
> their code much stricter. They're not browsers.

I can't find any reference to Technorati in the archives, but what
SimplePie does (which is what I (sorta) work on) matches an old draft
of what is in HTML 5 (from memory, the only differences don't affect
our detection of text/html v. feed types). Before that, we did what
Sam is describing and completely ignored the Content-Type header (that
is actually not quite true, as we used some regex like
"; \s*charset=([^;]+)" to get a charset from it). Only a small number
of feeds broke (all served as text/plain, to my knowledge).
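
For illustration only, here is a rough Python sketch of pulling a charset out of a Content-Type header with a regex along the lines of the one quoted above. It is not SimplePie's actual code (which is PHP); the function name, the case-insensitive flag, the optional whitespace after the semicolon and the quote stripping are all assumptions added for the example.

```python
import re

# Loosely modelled on the regex quoted in the message. IGNORECASE and the
# optional (rather than required) whitespace are assumptions, not SimplePie's code.
CHARSET_RE = re.compile(r";\s*charset=([^;]+)", re.IGNORECASE)


def charset_from_content_type(header):
    """Return the charset parameter of a Content-Type header, or None.

    The media type itself is ignored; only the charset parameter is read.
    """
    if header is None:
        return None
    match = CHARSET_RE.search(header)
    if match is None:
        return None
    # Assumption: trim surrounding whitespace and optional quotes.
    return match.group(1).strip().strip("\"'")


if __name__ == "__main__":
    for value in ("text/plain; charset=utf-8", "application/atom+xml"):
        print(value, "->", charset_from_content_type(value))
```

Note that this deliberately does nothing with the media type, which is why a feed served as text/plain would only start to matter once the header is actually honoured.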


--
Geoffrey Sneddon
<http://gsnedders.com/>
<http://simplepie.org/>

Received on Thursday, 28 May 2009 14:40:45 UTC