
Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals

From: Sam Ruby <rubys@intertwingly.net>
Date: Wed, 14 Apr 2010 21:53:14 -0400
Message-ID: <4BC6718A.3040909@intertwingly.net>
To: Ian Hickson <ian@hixie.ch>
CC: "public-html@w3.org WG" <public-html@w3.org>
On 04/14/2010 07:08 PM, Ian Hickson wrote:
> On Wed, 14 Apr 2010, Sam Ruby wrote:
>>>
>>> I don't think that saying that if an implementation doesn't know if it
>>> created a feed before, it should not be allowed to create a feed, is a
>>> good trade-off. I think it would be ignored.
>>
>> I suggest that you actually test out how common feed aggregators react
>> when they are presented with the same feed differing only in the entry
>> ids.
>>
>> Here's a few quick links for context:
>>
>> http://tinyurl.com/y2pobd3
>> http://www.hutteman.com/weblog/2003/03/14-51.html
>> https://issues.apache.org/jira/browse/ROL-580
>
> I don't think anyone is suggesting that user agents that are able to keep
> the IDs constant should do anything but keep the IDs constant.
>
>>> Basically making this a MUST would lead to implementations having to
>>> violate the spec to do anything useful. When we require that
>>> implementations violate the spec, we lead to them ignoring the spec
>>> even when it's not necessary.
>>
>> Based on my experience with feeds (predating Atom), this part of the
>> spec will not be ignored.  Users will write bug reports against the
>> software that implements the algorithm.
>
> If a feed producer has to invent an ID from nothing, and doesn't know what
> ID it used in the past, yet the spec uses "MUST" here, how exactly can it
> do anything _but_ ignore the spec?

I would like to repeat my suggestion:

   I suggest that you actually test out how common feed aggregators
   react when they are presented with the same feed differing only in
   the entry ids.

I am the author/maintainer of one such aggregator, namely Planet Venus. 
It is used in a number of places, such as Planet HTML5 and Planet 
Mozilla.  A typical setup fetches feeds every hour.  I'll describe how 
it processes feeds with changing IDs.

Once an hour it will attempt to fetch the feed.  If the ETag and/or 
Last-Modified HTTP headers are set up and have not changed, it will do 
nothing at all for that feed.  If either has changed for any reason, it 
will process every entry in that feed.

If the ID for any entry is one that hasn't been seen before, that entry 
will be added to the page.  Where that entry is placed depends on the 
updated date specified in the feed entry itself.  If that updated date 
is the same each time, what you will generally see is that at the second 
hour, each entry gets duplicated.  On the next hour, you will see 
another copy of each entry.  This repeats each subsequent hour until the 
end result is a single page with the first entry repeated enough times 
that it pushes all other entries off of the page.

This cycle will repeat if any truly new entry is processed.
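The duplication cycle above can be modeled in a few lines of Python (a toy model, not the real Planet Venus code): the planet remembers every ID it has seen, so a feed that mints fresh IDs on every fetch adds a full new copy of itself each hour.

```python
# Toy model of an aggregator that dedupes entries by id: a feed that
# regenerates its ids on every fetch adds a new copy of every post each hour.

def process_fetch(page, entries):
    """Append entries whose ids are new, then order by updated date, newest first."""
    seen = {e["id"] for e in page}
    page = page + [e for e in entries if e["id"] not in seen]
    page.sort(key=lambda e: e["updated"], reverse=True)
    return page

page = []
for hour in range(3):
    # the same two posts, with the same updated dates, but fresh ids each hour
    feed = [{"id": f"urn:uuid:{hour}-{n}", "updated": n} for n in range(2)]
    page = process_fetch(page, feed)

assert len(page) == 6   # two real posts, three copies of each
```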

That's if the updated date is the same each time.  If the updated date 
varies too (typically representing "now"), what you will see instead is 
that each hour all of the entries from this one feed will advance to the 
top, pushing all other feeds that the planet is subscribed to down, and 
possibly off of the page.
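The varying-updated-date case can be sketched the same way (again a toy model with made-up feed names): a feed that stamps "now" on every fetch always sorts above other subscriptions' stable entries.

```python
# Toy model: a misbehaving feed whose entries get fresh ids AND an updated
# date of "now" on every fetch crowds every other subscription off the top.

def merge(page, hour):
    """Add this hour's copies of the misbehaving feed, newest updated first."""
    bad = [{"id": f"urn:bad:{hour}-{n}", "updated": 100 + hour, "feed": "bad"}
           for n in range(2)]
    merged = page + bad
    merged.sort(key=lambda e: e["updated"], reverse=True)
    return merged

# one well-behaved entry from another subscription, with a stable date
page = [{"id": "urn:good:1", "updated": 50, "feed": "good"}]
for hour in range(3):
    page = merge(page, hour)

# the well-behaved entry is now below every copy of the misbehaving feed
assert page[-1]["feed"] == "good"
assert all(e["feed"] == "bad" for e in page[:-1])
```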

If you like, I can go further and describe how feeds that are ill-formed 
and/or non-conforming are handled by this software, and describe 
scenarios where the feed has no recognizable IDs or even dates. 
Alternatively, the software is fully open source and you can inspect it 
yourself or even try it out.

I can also describe options that I have provided to people who 
administer such software for dealing with pathological feeds.

I also suspect that others here have access to tools like Google Reader 
and the like, and can run their own experiments.  Or have access to the 
authors of such software and can ask their opinions.  I encourage 
everybody here to do so.

- Sam Ruby
Received on Thursday, 15 April 2010 01:53:46 GMT
