Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Tab Atkins Jr. on 2010-04-15 (public-html@w3.org from April 2010)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Thu, 15 Apr 2010 15:25:59 -0700
To: "Edward O'Connor" <hober0@gmail.com>
Cc: Sam Ruby <rubys@intertwingly.net>, Ian Hickson <ian@hixie.ch>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <t2ldd0fbad1004151525v7390f792j889a0efdcbfea7f7@mail.gmail.com>

So, summary of the discussion so far:

1. Getting good IDs on the entry level (and maybe feed level) is
important for the good operation of many/most/all feed consumers.

2. The HTML5-defined algo works great when the <article> has an #id or
rel=bookmark to draw the ID from.

3. When neither exists, we have a slightly more complex problem.

The problem for #3 boils down to:

1. As long as the content of the <article> doesn't change, everything
works fine.  (Assuming we change the SHOULD in the spec to a MUST.)  A
given consumer will always generate the same ID from the same content,
which is exactly the behavior we want.

2. If the content changes, the behavior is no longer good.  A given
consumer may generate a different ID, though ideally the entry would
keep the same ID in this case.

3. There is no way to resolve point #2.  For *any* content that doesn't
have an embedded guid of some kind, whether it be HTML, plain text, or
what have you, it is impossible to generate the same ID once the
content changes, because the only "unique" thing about it is the
content itself, which has changed.

4. Something that will *not* happen with point #2 is a given HTML page
generating different IDs for each article between different runs of
the same tool.  The requirement (currently SHOULD, should be MUST)
ensures that, given the same feed consumer, the generated IDs will be
stable for identical content.  They can be different between different
tools, but I think that's okay.
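
To make points 1, 2 and 4 concrete, here is a rough sketch of one way a
consumer could derive a fallback ID purely from content.  It is an
illustration only, not what the spec mandates: the function name, the
choice of a name-based UUID, and the idea of mixing in the page URL are
all my assumptions.

```python
import uuid


def generated_entry_id(page_url: str, article_markup: str) -> str:
    """Hypothetical fallback ID for an <article> with no #id or rel=bookmark.

    Assumption: the consumer hashes the page URL plus the article's
    serialized markup into a name-based (version 5) UUID. Any other
    deterministic function of the content would behave the same way.
    """
    name = page_url + "\n" + article_markup
    return uuid.uuid5(uuid.NAMESPACE_URL, name).urn


# Same content, same tool: the ID is stable across runs (points 1 and 4).
first_run = generated_entry_id("http://example.com/blog", "<p>Hello</p>")
second_run = generated_entry_id("http://example.com/blog", "<p>Hello</p>")

# Edited content: nothing ties the new ID to the old one (points 2 and 3).
after_edit = generated_entry_id("http://example.com/blog", "<p>Hello, world</p>")

print(first_run == second_run)   # stable for identical content
print(first_run == after_edit)   # differs once the content changes
```

A different tool using a different hash or namespace would produce
different IDs for the same page, which is the cross-tool difference
mentioned in point 4.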


So, possible resolutions?  There are two reasonable ones:

1. The guarantee of same-ID-for-same-content is good enough.  Authors
won't change their pages often enough to make it too annoying, and
users will generally receive a given feed with a single tool (or at
least, the uniqueness of the IDs only matters within a given tool, and
there is no significant communication between different tools).  Feed
validators can always flag entries that they have to generate an ID
for, and warn authors that this may create a suboptimal experience for
their readers.

2. The guarantee of same-ID-for-same-content isn't good enough.  Any
<article> without an #id or a rel=bookmark descendant is incapable of
generating an Atom entry.  Feed validators can always flag entries
that they can't generate an ID for, and warn authors that these
articles won't show up for their readers.
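
The difference between the two resolutions can be sketched as a single
flag on the ID-selection step.  This is a hedged illustration: the
function and parameter names are hypothetical, the precedence of #id
over rel=bookmark is an assumption, and the generated fallback is just
one possible deterministic scheme.

```python
import uuid
from typing import Optional


def choose_entry_id(
    page_url: str,
    element_id: Optional[str],
    bookmark_href: Optional[str],
    article_markup: str,
    allow_generated: bool,
) -> Optional[str]:
    """Pick an Atom entry ID for one <article>, or None to skip the entry.

    allow_generated=True models resolution 1 (fall back to a content-derived ID);
    allow_generated=False models resolution 2 (no author-supplied ID, no entry).
    """
    if element_id:
        # Assumed form: the page URL with the element's id as fragment.
        return page_url + "#" + element_id
    if bookmark_href:
        return bookmark_href
    if allow_generated:
        # Hypothetical fallback: deterministic, but changes with the content.
        name = page_url + "\n" + article_markup
        return uuid.uuid5(uuid.NAMESPACE_URL, name).urn
    return None


page = "http://example.com/blog"
print(choose_entry_id(page, "post-1", None, "<p>Hi</p>", allow_generated=False))
print(choose_entry_id(page, None, None, "<p>Hi</p>", allow_generated=True))
print(choose_entry_id(page, None, None, "<p>Hi</p>", allow_generated=False))
```

In either mode a validator can hook the last two branches: warn when the
generated fallback is used under resolution 1, or when None is returned
under resolution 2.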


I believe Sam and Julian are suggesting #2.  Hober seems to be
suggesting #2 as well.  Ian is suggesting #1, along with the
suggestion that Atom should be fixed to make this less of a problem.
I prefer #1, unless there is evidence that content *does* change often
enough in sufficient numbers of feed-producing things to cause a
problem.

Either one means that, in some situations, readers will have a
suboptimal experience.  It's simply a question of whether you prefer
the risk of spamming readers with near-duplicate entries (with the
benefit that more pages will be capable of generating an Atom feed),
or the risk of readers missing out on content (fewer pages capable of
generating an Atom feed).

Hixie also fears that if we opt for #2, we'll end up with feed
consumers inventing proprietary ways to do #1 instead.

~TJ

Received on Thursday, 15 April 2010 22:26:51 UTC