Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Ian Hickson on 2010-04-15 (public-html@w3.org from April 2010)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 15 Apr 2010 23:29:09 +0000 (UTC)
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: Edward O'Connor <hober0@gmail.com>, Sam Ruby <rubys@intertwingly.net>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <Pine.LNX.4.64.1004152233260.23507@ps20323.dreamhostps.com>

On Thu, 15 Apr 2010, Tab Atkins Jr. wrote:
> 
> So, possible resolutions?  There are two reasonable ones:
> 
> 1. The guarantee of same-ID-for-same-content is good enough.  Authors 
> won't change their pages often enough to make it too annoying, and users 
> will generally receive a given feed with a single tool (or at least, the 
> uniqueness of the IDs only matters within a given tool, and there is no 
> significant communication between different tools).  Feed validators can 
> always flag entries that they have to generate an ID for, and warn 
> authors that this may create a suboptimal experience for their readers.
> 
> 2. The guarantee of same-ID-for-same-content isn't good enough.  Any 
> <article> without an #id or a rel=bookmark descendant is incapable of 
> generating an Atom entry.  Feed validators can always flag entries that 
> they can't generate an ID for, and warn authors that these articles 
> won't show up for their readers.
> 
> I believe Sam and Julian are suggesting #2.  Hober seems to be 
> suggesting #2 as well.  Ian is suggesting #1, along with the suggestion 
> that Atom should be fixed to make this less of a problem.

I'm not suggesting any changes to Atom.

I'm also not suggesting #1 above. It's quite possible to have multiple 
articles with the same content that aren't the same -- for example, if I 
just post "good night" each night to my blog, and my blog doesn't have 
permalinks or dates or whatnot, then there will be lots of "duplicate" 
articles. Similarly, a blog post with comments might have several "Get 
viagra" comments from "Anonymous Casino", yet they aren't the same.
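
A rough illustration of that point (not from the spec, and content_id is a
hypothetical helper, as is the urn:sha1: prefix): if IDs are derived purely
from content, distinct posts with identical text collapse to one ID.

```python
import hashlib


def content_id(text: str) -> str:
    """Hypothetical content-derived ID: a SHA-1 digest of the article text."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return "urn:sha1:" + digest


# Three separate posts that happen to share the same text.
posts = ["good night", "good night", "good night"]
ids = [content_id(p) for p in posts]

# Number of posts, then number of distinct IDs generated for them.
print(len(posts), len(set(ids)))
```

Same-ID-for-same-content therefore cannot tell these entries apart without
some additional signal.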

I'm suggesting that when it's impossible to come up with a good consistent 
ID, the UA should do its best. It might not be able to, in which case .

> Either one means that, in some situations, readers will have a
> suboptimal experience.

Only if the HTML page is repeatedly converted, but yes, if that happens, 
e.g. if you subscribe to a dynamically updating conversion of an HTML page 
that for some reason (what reason?) can't keep track of previous IDs it 
has used, then you'll end up with a horrible experience.
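
As a minimal sketch of what "keeping track of previous IDs" could look like,
assuming the tool has writable storage: the JSON file, the function names and
the per-article key (e.g. "article-0") are all hypothetical, chosen only for
illustration.

```python
import json
import uuid
from pathlib import Path


def load_ids(path: Path) -> dict:
    """Load the key -> ID mapping saved by an earlier conversion, if any."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def entry_id(key: str, store: dict) -> str:
    """Return the ID already assigned to this key, minting one if needed."""
    if key not in store:
        store[key] = uuid.uuid4().urn  # e.g. "urn:uuid:..."
    return store[key]


def save_ids(path: Path, store: dict) -> None:
    """Persist the mapping so the next conversion reuses the same IDs."""
    path.write_text(json.dumps(store), encoding="utf-8")
```

A tool would call load_ids before converting, entry_id once per article, and
save_ids afterwards. On a read-only filesystem the save step fails, and the
IDs change on the next run.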

> It's simply a question of whether you prefer the risk of spamming 
> readers with near-duplicate entries (with the benefit that more pages 
> will be capable of generating an Atom feed), or the risk of readers 
> missing out on content (less pages capable of generating an Atom feed).

Indeed. At least if we do a best-effort attempt at outputting content in 
this rare case, users will be able to fix up the feed before using it, if 
necessary. If the data is just dropped on the floor then they'll have far 
more work.

> Hixie also fears that if we opt for #2, we'll end up with feed consumers 
> inventing proprietary ways to do #1 instead.

That seems inevitable. It's like people saying that HTML browsers should 
just refuse to render pages with invalid HTML.

It seems to me that this is a lot of ado about nothing. The cases where a 
tool can't generate consistent IDs are pretty obscure, e.g. having a 
read-only filesystem and no cloud storage, or the user switching to 
another tool or device. Sure, we have to acknowledge those cases, but 
"SHOULD" does not mean "MAY". It means "MUST except if there's a good 
reason".

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 15 April 2010 23:29:40 UTC