Re: Lack of sniffing of text/plain for non-binary content

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Tue, 18 Sep 2007 20:50:41 +0100
Message-Id: <E70F4FC0-B8D8-4FE7-814E-8CDB09A57AE6@googlemail.com>
Cc: public-html@w3.org
To: Julian Reschke <julian.reschke@gmx.de>


On 9 Sep 2007, at 12:51, Julian Reschke wrote:

> Geoffrey Sneddon wrote:
>> Hi,
>> Currently we only sniff text/plain (under certain conditions: there
>> is no Content-Encoding header and the Content-Type is exactly one of
>> "text/plain", "text/plain; charset=ISO-8859-1", or
>> "text/plain; charset=iso-8859-1") to see whether it is binary content
>> or not. However, this poses issues for a large number of feeds that
>> are served as text/plain: a notable example of this is
>> <http://youtube.com/rss/global/top_favorites.rss>.
>
> How do you distinguish between a feed that is served as text/plain  
> because the author wants to have it handled as plain text, as  
> opposed to a mislabeled feed?

How do you distinguish between an HTML page that has an invalid first  
tag, and a feed? There's no way around it. You simply have to assume  
the author has made a mistake, which is almost always the case.
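
A minimal sketch of the kind of check under discussion, not the spec's
algorithm. It assumes the response headers arrive as a dict with
lower-cased names. The list of sniffable Content-Type values is taken
from the quoted message above; the names may_sniff and looks_like_feed,
the 512-byte limit, and the root-element pattern are hypothetical and
only for illustration.

```python
import re

# Content-Type values that trigger sniffing, as listed in the quoted message.
SNIFFABLE_TYPES = {
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
}

# Assumption: these root elements are treated as a sign of a feed
# (RSS 2.0, Atom, RSS 1.0). A real reader may use a different list.
FEED_ROOT = re.compile(rb"<(rss|feed|rdf:RDF)[\s>]")


def may_sniff(headers):
    """Return True if the response meets the quoted sniffing conditions.

    `headers` is assumed to be a dict keyed by lower-cased header name.
    """
    if "content-encoding" in headers:
        return False
    return headers.get("content-type") in SNIFFABLE_TYPES


def looks_like_feed(body, limit=512):
    """Hypothetical heuristic: search the first `limit` bytes for a feed root."""
    return FEED_ROOT.search(body[:limit]) is not None


headers = {"content-type": "text/plain"}
body = b'<?xml version="1.0"?>\n<rss version="2.0"><channel/></rss>'
treat_as_feed = may_sniff(headers) and looks_like_feed(body)
```

A check like this cannot tell a mislabeled feed from a feed deliberately
served as plain text, which is exactly the ambiguity raised in the
question; it simply assumes the label is a mistake.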

> Please let's not make the content sniffing situation even worse  
> than what the spec says right now.

I'm getting bug reports about the SVN code that follows the spec, mainly  
due to the above issue. I may well go back to what I did previously.  
End-users don't care where the problem is. From their POV, a feed can  
be viewed in x but not y, therefore x is a better feed reader.

> Anyway, as our spec author works for Google, and YouTube is owned  
> by Google, maybe this issue can be fixed easily.

YouTube is by no means the only site that does this, just the biggest  
site I could find looking through a list of 20 feeds (of which almost  
a quarter were served under an incorrect MIME type).


- Geoffrey Sneddon
Received on Tuesday, 18 September 2007 19:50:53 GMT
