Re: Lack of sniffing of text/plain for non-binary content from ryan on 2007-09-09 (public-html@w3.org from September 2007)

From: ryan <ryan@theryanking.com>
Date: Sun, 9 Sep 2007 14:50:42 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Geoffrey Sneddon <foolistbar@googlemail.com>, public-html@w3.org
Message-Id: <94CE61BD-DEB2-4CF5-B10F-F103D608B640@theryanking.com>

On Sep 9, 2007, at 4:51 AM, Julian Reschke wrote:
> Geoffrey Sneddon wrote:
>> Hi,
>> Currently we only sniff text/plain (under certain conditions: there
>> is no Content-Encoding header and the Content-Type is equal to one of
>> "text/plain", "text/plain; charset=ISO-8859-1", or
>> "text/plain; charset=iso-8859-1") to see whether it is binary content
>> or not. However, this poses issues for a large number of feeds that
>> are served as text/plain; a notable example is
>> <http://youtube.com/rss/global/top_favorites.rss>.
>
> How do you distinguish between a feed that is served as text/plain
> because the author wants it handled as plain text and a
> mislabeled feed?

As someone who works on a spider that uses feeds heavily, the only  
way we've found to make it work is to always assume that if it looks  
like a feed, it should be treated as such. Interactive user agents  
may have different constraints that lead to different solutions.
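
For illustration, a "looks like a feed" check could be sketched roughly
as below. This is a hypothetical example, not the spider's actual logic:
the function name, the 512-byte window, and the list of root elements
(RSS, Atom, RDF) are all assumptions made for the sketch.

```python
import re

# Assumed set of root element names for common feed formats
# (RSS 2.0, Atom, RSS 1.0/RDF). A real spider may check more.
FEED_ROOT = re.compile(rb"<(rss|feed|rdf:RDF)[\s>]")


def looks_like_feed(body: bytes, limit: int = 512) -> bool:
    """Return True if the first `limit` bytes contain a feed root tag.

    The declared Content-Type is deliberately ignored, so a feed
    mislabeled as text/plain is still detected.
    """
    return FEED_ROOT.search(body[:limit]) is not None
```

A check like this is only a heuristic: it would also match a plain-text
document that happens to quote a feed near its start, which is the
ambiguity raised in the question above.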

-ryan

Received on Sunday, 9 September 2007 21:50:53 UTC