- From: Sam Ruby <rubys@intertwingly.net>
- Date: Wed, 14 Apr 2010 21:53:14 -0400
- To: Ian Hickson <ian@hixie.ch>
- CC: "public-html@w3.org WG" <public-html@w3.org>
On 04/14/2010 07:08 PM, Ian Hickson wrote:
> On Wed, 14 Apr 2010, Sam Ruby wrote:
>>>
>>> I don't think that saying that if an implementation doesn't know if it
>>> created a feed before, it should not be allowed to create a feed, is a
>>> good trade-off. I think it would be ignored.
>>
>> I suggest that you actually test out how common feed aggregators react
>> when they are presented with the same feed differing only in the entry
>> ids.
>>
>> Here's a few quick links for context:
>>
>> http://tinyurl.com/y2pobd3
>> http://www.hutteman.com/weblog/2003/03/14-51.html
>> https://issues.apache.org/jira/browse/ROL-580
>
> I don't think anyone is suggesting that user agents that are able to keep
> the IDs constant should do anything but keep the IDs constant.
>
>>> Basically making this a MUST would lead to implementations having to
>>> violate the spec to do anything useful. When we require that
>>> implementations violate the spec, we lead to them ignoring the spec
>>> even when it's not necessary.
>>
>> Based on my experience with feeds (predating Atom), this part of the
>> spec will not be ignored. Users will write bug reports against the
>> software that implements the algorithm.
>
> If a feed producer has to invent an ID from nothing, and doesn't know what
> ID it used in the past, yet the spec uses "MUST" here, how exactly can it
> do anything _but_ ignore the spec?

I would like to repeat my suggestion: I suggest that you actually test out
how common feed aggregators react when they are presented with the same
feed differing only in the entry ids.

I am the author/maintainer of one such aggregator, namely Planet Venus. It
is used in a number of places, such as Planet HTML5 and Planet Mozilla.

A typical setup fetches feeds every hour. I'll describe how it will
process feeds with changing IDs.

Once an hour it will attempt to fetch the feed. If the ETag and/or
Last-Modified HTTP headers are set up and have not changed, it will do
nothing at all for that feed.
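As an aside, the "do nothing if the validators match" check above can be sketched roughly as follows. This is a hypothetical illustration, not Planet Venus's actual code; the function name and the cache dictionary shape are my own invention:

```python
def needs_processing(cached, response_headers):
    """Return True only if the feed may have changed since the last fetch.

    `cached` holds the ETag / Last-Modified values saved from the previous
    fetch; `response_headers` are the headers from the current fetch.
    (Hypothetical names -- not the real Planet Venus API.)
    """
    etag = response_headers.get("ETag")
    modified = response_headers.get("Last-Modified")
    # If either validator is present and unchanged, skip the feed entirely.
    if etag is not None and etag == cached.get("etag"):
        return False
    if modified is not None and modified == cached.get("last_modified"):
        return False
    return True
```

In practice an aggregator would send the saved values back as If-None-Match / If-Modified-Since request headers and let the server answer 304 Not Modified, which amounts to the same decision being made server-side.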
If either has changed for any reason, it will process every entry in that
feed. If the ID for any entry is one that hasn't been seen before, that
entry will be added to the page. Where that entry is added depends on the
updated date as specified in the feed entry itself.

If that updated date stays the same, what you will generally see is that
at the second hour, each entry gets duplicated. On the next hour, you will
see another copy of each entry. This repeats each subsequent hour until
the end result is a single page with the first entry repeated enough times
that it pushes all other entries off of the page. This cycle will restart
whenever a truly new entry is processed.

That's if the updated date is the same each time. If the updated date
varies too (typically representing "now"), what you will see instead is
that each hour all of the entries from this one feed will advance to the
top, pushing every other feed that the planet is subscribed to down, and
possibly off of the page.

If you like, I can go further and describe how feeds which are ill-formed
and/or non-conforming are handled by this software, and describe scenarios
where the feed has no recognizable IDs or even dates. Alternately, the
software is fully open source, and you can inspect it yourself or even try
it out. I can also describe options that I have provided to people who
administer such software for dealing with pathological feeds.

I also suspect that others here have access to tools like Google Reader
and the like, and can run their own experiments. Or have access to the
authors of such software and can ask their opinions. I encourage everybody
here to do so.

- Sam Ruby
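P.S. The duplication cascade described above is easy to reproduce in miniature. The sketch below is hypothetical (it is not Planet Venus's code; `merge` and `unstable_feed` are invented names), but it shows the core behavior: an aggregator that de-duplicates by entry ID, fed by a producer that invents fresh IDs on every fetch:

```python
import uuid

def merge(page, feed_entries):
    """Add any entry whose ID has not been seen before to the page."""
    seen = {entry["id"] for entry in page}
    for entry in feed_entries:
        if entry["id"] not in seen:
            page.append(entry)
    return page

def unstable_feed():
    # A producer that regenerates entry IDs each time it serves the feed --
    # the failure mode under discussion.
    return [{"id": str(uuid.uuid4()), "title": t} for t in ("a", "b")]

page = []
for hour in range(3):          # three hourly polls
    merge(page, unstable_feed())

# Two "real" entries have become six page entries after three polls.
print(len(page))  # 6
```

Had the producer kept its IDs stable, `merge` would have left the page at two entries no matter how many times it polled.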
Received on Thursday, 15 April 2010 01:53:46 UTC