Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Sam Ruby on 2010-04-15 (public-html@w3.org from April 2010)

From: Sam Ruby <rubys@intertwingly.net>
Date: Thu, 15 Apr 2010 19:12:16 -0400
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: "Edward O'Connor" <hober0@gmail.com>, Ian Hickson <ian@hixie.ch>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <4BC79D50.9060007@intertwingly.net>
On 04/15/2010 06:25 PM, Tab Atkins Jr. wrote:
> So, summary of the discussion so far:
>
> 1. Getting good IDs on the entry level (and maybe feed level) is
> important for the good operation of many/most/all feed consumers.
>
> 2. The HTML5-defined algo works great when the <article> has an #id or
> rel=bookmark to draw the ID from.
>
> 3. When neither exist, we have a slightly more complex problem.
>
> The problem for #3 boils down to:
>
> 1. As long as the content of the <article> doesn't change, everything
> works fine.  (Assuming we change the SHOULD in the spec to a MUST.)  A
> given consumer will always generate the same ID from the same content,
> which is exactly the behavior we want.
>
> 2. If the content changes, the behavior is no longer good.  A given
> consumer may generate a different ID, though the ideal behavior in
> this case is that it have the same ID.
>
> 3. There is no way to resolve point #2.  *Any* content that doesn't
> have an embedded guid of some kind, whether it be HTML, plain text, or
> what have you, will be impossible to generate the same ID for when the
> content changes, because the only "unique" thing about it is the
> content itself, which has changed.
>
> 4. Something that will *not* happen with point #2 is a given HTML page
> generating different IDs for each article between different runs of
> the same tool.  The requirement (currently SHOULD, should be MUST)
> ensures that, given the same feed consumer, the generated IDs will be
> stable for identical content.  They can be different between different
> tools, but I think that's okay.
>
> So, possible resolutions?  There are two reasonable ones:
>
> 1. The guarantee of same-ID-for-same-content is good enough.  Authors
> won't change their pages often enough to make it too annoying, and
> users will generally receive a given feed with a single tool (or at
> least, the uniqueness of the IDs only matters within a given tool, and
> there is no significant communication between different tools).  Feed
> validators can always flag entries that they have to generate an ID
> for, and warn authors that this may create a suboptimal experience for
> their readers.
>
> 2. The guarantee of same-ID-for-same-content isn't good enough.  Any
> <article> without an #id or a rel=bookmark descendant is incapable of
> generating an Atom entry.  Feed validators can always flag entries
> that they can't generate an ID for, and warn authors that these
> articles won't show up for their readers.
>
> I believe Sam and Julian are suggesting #2.  Hober seems to be
> suggesting #2 as well.  Ian is suggesting #1, along with the
> suggestion that Atom should be fixed to make this less of a problem.
> I prefer #1, unless there is evidence that content *does* change often
> enough in sufficient numbers of feed-producing things to cause a
> problem.

I was with you until this paragraph.  I don't believe that we have been 
discussing "guarantee of same-ID-for-same-content".  The closest I 
believe that we came was what you described as "The requirement 
(currently SHOULD, should be MUST) ensures that, given the same feed 
consumer, the generated IDs will be stable for identical content."

Some of us thought that that SHOULD be a MUST.  Others disagreed.

> Either one means that, in some situations, readers will have a
> suboptimal experience.  It's simply a question of whether you prefer
> the risk of spamming readers with near-duplicate entries (with the
> benefit that more pages will be capable of generating an Atom feed),
> or the risk of readers missing out on content (less pages capable of
> generating an Atom feed).

I agree that this is a tradeoff.  I don't personally believe that we 
need to make that choice or that a standard is required, but I can see 
how each approach has its advantages in certain cases -- again on the 
presumption that there is a "guarantee of same-ID-for-same-content".

I'll even argue weakly for #1.  A guarantee of same-ID-for-same-content 
may in fact be good enough.  And for the subset of documents for which 
the feeds generated turn out to be, as you put it, suboptimal, well 
those documents are exactly the ones that would benefit from the addition 
of a #id or a rel=bookmark.  The only caution I will add is that the 
addition of this information will incur a one-time usability hit as 
every consumer of the feed rediscovers the same entries.
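
To make the tradeoff above concrete, here is a minimal sketch of one way a consumer could derive entry IDs. It is not the algorithm from the HTML5 draft: the helper name entry_id, the precedence between rel=bookmark and #id, the SHA-1 content hash used as the fallback, and the example.org URL are all assumptions chosen for illustration.

```python
import hashlib


def entry_id(page_url, article_html, element_id=None, bookmark_href=None):
    """Return an ID for one <article> (hypothetical helper, not from any spec)."""
    # Assumption: an author-supplied rel=bookmark URL is preferred when present.
    if bookmark_href:
        return bookmark_href
    # Assumption: otherwise the page URL plus the article's #id is used.
    if element_id:
        return f"{page_url}#{element_id}"
    # Assumption: with neither available, the only input left is the content,
    # so the ID is a hash of the serialized article.
    digest = hashlib.sha1(article_html.encode("utf-8")).hexdigest()
    return f"urn:sha1:{digest}"


page = "http://example.org/news"
original = "<article><h1>Launch</h1><p>We shipped.</p></article>"
edited = "<article><h1>Launch</h1><p>We shipped today.</p></article>"

first = entry_id(page, original)       # content-derived ID
rerun = entry_id(page, original)       # same tool, same content
after_edit = entry_id(page, edited)    # same article after a small edit
with_id = entry_id(page, edited, element_id="launch")  # author adds an #id
```

Under these assumptions, first and rerun are equal, after_edit differs from both, and with_id differs from every content-derived value. That last difference is the one-time hit: each consumer sees the entry as new once, after which the ID no longer depends on the content.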

I'll also add that in addition to not believing that we need to make 
that tradeoff on behalf of everyone, I believe that standardizing this 
is premature as the tradeoffs we've identified are only conjecture. 
Real-world usage will inevitably identify others that we haven't considered.

> Hixie also fears that if we opt for #2, we'll end up with feed
> consumers inventing proprietary ways to do #1 instead.

I believe that's a red herring.  Atom is openly extensible[1].  The 
ethos of the Atom WG can be described as "adopt a common core and 
innovate around it at will".  Innovations include Open Data Protocol, 
SDShare, OASIS CMIS, PubSubHubbub Core, Google GData, ... but I digress.

The topic being discussed here isn't the extensions, but producing 
something that is usable by virtue of adopting the common core defined 
by Atom.

> ~TJ

[1] http://www.atomenabled.org/developers/syndication/atom-format-spec.php#extending_atom
Received on Thursday, 15 April 2010 23:12:51 UTC