Re: Lack of sniffing of text/plain for non-binary content

On 9 Sep 2007, at 12:51, Julian Reschke wrote:

> Geoffrey Sneddon wrote:
>> Hi,
>> Currently we only sniff text/plain (under certain conditions, namely
>> when there is no Content-Encoding header and the Content-Type is
>> exactly one of "text/plain", "text/plain; charset=ISO-8859-1", or
>> "text/plain; charset=iso-8859-1") to see whether it is binary
>> content or not. However, this causes problems for the large number
>> of feeds that are served as text/plain; a notable example is
>> <http://youtube.com/rss/global/top_favorites.rss>.
>
> How do you distinguish between a feed that is served as text/plain
> because the author wants it handled as plain text, as opposed to a
> mislabeled feed?

How do you distinguish between an HTML page that has an invalid first
tag and a feed? There's no way around it. You simply have to assume
the author has made a mistake, which is almost always the case.
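
To make the argument concrete, a feed-detection heuristic for text/plain bodies might look roughly like the Python sketch below. It skips a UTF-8 BOM, whitespace, the XML declaration, comments, processing instructions and a DOCTYPE, then checks whether the first element is `rss`, `feed` or `rdf:RDF`. The function name `looks_like_feed` and the 512-byte window are illustrative assumptions, not something the spec defines.

```python
import re

# Hypothetical heuristic, not taken from any spec: does a body served as
# text/plain start like an RSS, Atom or RSS 1.0 (RDF) feed?

# Prolog material that may precede the root element: an optional UTF-8 BOM,
# then any mix of whitespace, <?...?> (XML declaration / PIs),
# <!-- comments --> and a <!DOCTYPE ...>.
_PROLOG = re.compile(
    rb"(?:\xef\xbb\xbf)?"
    rb"(?:\s+|<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>)*",
    re.DOTALL,
)

# Root element names that indicate a feed, followed by a delimiter so that
# e.g. "<feedback>" is not mistaken for "<feed>".
_FEED_ROOT = re.compile(rb"<(?:rss|feed|rdf:RDF)[\s/>]")


def looks_like_feed(body: bytes, limit: int = 512) -> bool:
    """Return True if the first `limit` bytes look like the start of a feed."""
    head = body[:limit]
    prolog = _PROLOG.match(head)  # always matches, possibly the empty string
    return _FEED_ROOT.match(head, prolog.end()) is not None
```

Like any sniffing rule it can misfire on a plain-text document that happens to begin with `<rss`, which is exactly the trade-off being argued about here.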

> Please let's not make the content sniffing situation even worse  
> than what the spec says right now.

I'm getting bug reports against the SVN code, which follows the spec,
mainly because of the issue above. I may well go back to what I did
previously. End users don't care where the problem lies. From their
point of view, a feed can be viewed in X but not in Y, so X is the
better feed reader.
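
For reference, the text/plain gate quoted at the top of this thread can be sketched in Python as follows. The exact-match Content-Type set and the Content-Encoding condition come from the description above; the 512-byte window, the BOM check and the binary byte ranges are assumptions borrowed from the later WHATWG MIME Sniffing text and may not match the 2007 draft exactly. `sniff_text_plain` is a hypothetical name.

```python
from typing import Optional

# Only these exact Content-Type strings are eligible for sniffing.
_SNIFFABLE_TEXT_PLAIN = {
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
}

# Byte-order marks (UTF-16BE, UTF-16LE, UTF-8); assumed, per later spec text.
_BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")


def _is_binary_byte(b: int) -> bool:
    # Control bytes treated as "binary data" (assumed ranges).
    return b <= 0x08 or b == 0x0B or 0x0E <= b <= 0x1A or 0x1C <= b <= 0x1F


def sniff_text_plain(content_type: str,
                     content_encoding: Optional[str],
                     body: bytes) -> str:
    """Return the sniffed type for a response labelled as text/plain."""
    # Any Content-Encoding, or a Content-Type that is not an exact match,
    # means no sniffing at all: the declared type is used as-is.
    if content_encoding or content_type not in _SNIFFABLE_TEXT_PLAIN:
        return content_type
    head = body[:512]
    if head.startswith(_BOMS):
        return "text/plain"
    if any(_is_binary_byte(b) for b in head):
        return "application/octet-stream"
    # Textual content stays text/plain, so a feed served this way
    # is never recognised as a feed.
    return "text/plain"
```

The last branch is the crux of the complaint: an RSS document labelled text/plain contains no binary bytes, so it comes out as text/plain and a spec-following reader has no opportunity to treat it as a feed.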

> Anyway, as our spec author works for Google, and YouTube is owned  
> by Google, maybe this issue can be fixed easily.

YouTube is by no means the only site that does this; it is just the
biggest site I found while looking through a list of 20 feeds (almost
a quarter of which were served with an incorrect MIME type).


- Geoffrey Sneddon

Received on Tuesday, 18 September 2007 19:50:53 UTC