- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Thu, 15 Apr 2010 15:25:59 -0700
- To: "Edward O'Connor" <hober0@gmail.com>
- Cc: Sam Ruby <rubys@intertwingly.net>, Ian Hickson <ian@hixie.ch>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
So, summary of the discussion so far:

1. Getting good IDs on the entry level (and maybe feed level) is important for the good operation of many/most/all feed consumers.
2. The HTML5-defined algo works great when the <article> has an #id or rel=bookmark to draw the ID from.
3. When neither exists, we have a slightly more complex problem.

The problem for #3 boils down to:

1. As long as the content of the <article> doesn't change, everything works fine. (Assuming we change the SHOULD in the spec to a MUST.) A given consumer will always generate the same ID from the same content, which is exactly the behavior we want.
2. If the content changes, the behavior is no longer good. A given consumer may generate a different ID, though the ideal behavior in this case is that it keep the same ID.
3. There is no way to resolve point #2. *Any* content that doesn't have an embedded guid of some kind, whether it be HTML, plain text, or what have you, will be impossible to generate the same ID for when the content changes, because the only "unique" thing about it is the content itself, which has changed.
4. Something that will *not* happen with point #2 is a given HTML page generating different IDs for each article between different runs of the same tool. The requirement (currently SHOULD, should be MUST) ensures that, given the same feed consumer, the generated IDs will be stable for identical content. They can be different between different tools, but I think that's okay.

So, possible resolutions? There are two reasonable ones:

1. The guarantee of same-ID-for-same-content is good enough. Authors won't change their pages often enough to make it too annoying, and users will generally receive a given feed with a single tool (or at least, the uniqueness of the IDs only matters within a given tool, and there is no significant communication between different tools).
   Feed validators can always flag entries that they have to generate an ID for, and warn authors that this may create a suboptimal experience for their readers.

2. The guarantee of same-ID-for-same-content isn't good enough. Any <article> without an #id or a rel=bookmark descendant is incapable of generating an Atom entry. Feed validators can always flag entries that they can't generate an ID for, and warn authors that these articles won't show up for their readers.

I believe Sam and Julian are suggesting #2. Hober seems to be suggesting #2 as well. Ian is suggesting #1, along with the suggestion that Atom should be fixed to make this less of a problem. I prefer #1, unless there is evidence that content *does* change often enough, across a significant number of feed producers, to cause a problem.

Either one means that, in some situations, readers will have a suboptimal experience. It's simply a question of whether you prefer the risk of spamming readers with near-duplicate entries (with the benefit that more pages will be capable of generating an Atom feed), or the risk of readers missing out on content (fewer pages capable of generating an Atom feed). Hixie also fears that if we opt for #2, we'll end up with feed consumers inventing proprietary ways to do #1 instead.

~TJ
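To make the "same ID for same content" guarantee concrete, here is a minimal sketch, not taken from the thread or from the HTML5 spec text, of how a single consumer might pick an entry ID. The function name `entry_id`, its parameters, and the choice of a name-based (version 5) UUID for the fallback are all hypothetical; the #id and rel=bookmark branches only roughly follow the order described above, and the bookmark URL is assumed to be already absolute.

```python
import uuid
from typing import Optional

# Hypothetical fixed namespace for content-derived IDs. Any constant UUID
# works, as long as one tool always uses the same one.
CONTENT_NAMESPACE = uuid.NAMESPACE_URL


def entry_id(page_url: str, article_id: Optional[str],
             bookmark: Optional[str], content: str) -> str:
    """Pick an Atom entry ID for an <article> (illustrative sketch only)."""
    if article_id:
        # The article has an #id: use the page URL with that fragment.
        return page_url.split("#", 1)[0] + "#" + article_id
    if bookmark:
        # The article has a rel=bookmark link (assumed already absolute).
        return bookmark
    # Neither exists: derive the ID from the content itself. uuid5 is
    # deterministic, so identical content always yields the same ID, but
    # any edit to the content yields a different one. Mixing in the page
    # URL is an assumption, made so identical articles on different pages
    # don't collide.
    return uuid.uuid5(CONTENT_NAMESPACE, page_url + "\n" + content).urn
```

Calling it twice on the same unchanged article returns the same `urn:uuid:` value (point 4), while changing even one character of the content produces a new ID (point 2), which is exactly the case neither resolution can avoid.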
Received on Thursday, 15 April 2010 22:26:51 UTC