- From: Sam Ruby <rubys@intertwingly.net>
- Date: Fri, 16 Apr 2010 07:25:49 -0400
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: Ian Hickson <ian@hixie.ch>, "Edward O'Connor" <hober0@gmail.com>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
On 04/15/2010 08:17 PM, Tab Atkins Jr. wrote:
> On Thu, Apr 15, 2010 at 4:29 PM, Ian Hickson <ian@hixie.ch> wrote:
>> I'm also not suggesting #1 above. It's quite possible to have multiple
>> articles with the same content that aren't the same -- for example, if I
>> just post "good night" each night to my blog, and my blog doesn't have
>> permalinks or dates or whatnot, then there will be lots of "duplicate"
>> articles.
>
> I'm fine with those generating the same ID, and thus being treated as
> duplicates. That's such an edge-case that I can't care about it, and
> the natural resolution seems sufficient.
>
>> Similarly, a blog post with comments might have several "Get
>> viagra" comments from "Anonymous Casino", yet they aren't the same.
>
> A blog post's comments aren't serialized as entries by the Atom
> conversion algorithm, as long as the page is formatted correctly (with
> an <article> enclosing both the post and the comments).
>
> Similarly, though, if your page happened to be formatted such that
> comments were <article>s without ancestor <article>s, and you had a
> number of precisely identical comments, I'm fine with those all being
> treated as identical. In fact, that is almost certainly the best
> thing to do by default in terms of user experience.
>
>> I'm suggesting that when it's impossible to come up with a good consistent
>> ID, the UA should do its best. It might not be able to, in which case .
>
> Yeah, sure. Given a deterministic algorithm, though, a feed consumer
> *will* be able to generate a consistent ID given the same content.
> Thus making the requirement in the spec a MUST should be fine. The
> only thing that would prevent this is if it was somehow literally
> impossible for the computer to run a deterministic hashing algorithm,
> which falls under the "physical limitations" clause that allows UAs to
> break any requirement they wish.
>
>>> Either one means that, in some situations, readers will have a
>>> suboptimal experience.
>>
>> Only if the HTML page is repeatedly converted, but yes, if that happens,
>> e.g. if you subscribe to a dynamically updating conversion of an HTML page
>> that for some reason (what reason?) can't keep track of previous IDs it
>> has used, then you'll end up with a horrible experience.
>
> If you "subscribe" to a blog's front page, it should be able to
> generate an Atom feed that updates as you post new articles (which
> appear on the front page and bump older articles off). This requires
> repeatedly converting the page.
>
> However, as long as the content of each <article> hasn't changed, it's
> still possible to generate a consistent ID between conversions using
> the same tool. That seems sufficient.
>
> In the case that the article content changes regularly *and* there's
> no canonical guid to extract, then it's a horrible user experience,
> but that's precisely the case that we think is hopefully uncommon.
>
>> Indeed. At least if we do a best-effort attempt at outputting content in
>> this rare case, users will be able to fix up the feed before using it, if
>> necessary. If the data is just dropped on the floor then they'll have far
>> more work.
>
> Agreed.
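For concreteness, a minimal sketch of the content-hashing idea discussed
above might look like the following. It assumes SHA-1 over the article's
serialized text and a fragment-on-the-page-URL form for the ID; the
change proposal need not specify either of those details:

    import hashlib

    def article_id(page_url, article_text):
        # Same content in, same ID out: any consumer running the same
        # deterministic algorithm over an unchanged <article> will
        # reproduce the identical atom:id on every conversion.
        digest = hashlib.sha1(article_text.encode("utf-8")).hexdigest()
        return "%s#sha1-%s" % (page_url, digest)

    # Repeated "good night" posts collapse into one ID, which the thread
    # above seems comfortable treating as duplicates.
    assert article_id("http://example.org/", "good night") == \
           article_id("http://example.org/", "good night")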
Switching back to co-chair mode.... is there a change proposal that
captures these ideas?

From my read, this seems very similar to:

  http://lists.w3.org/Archives/Public/public-html/2010Apr/0193.html

Perhaps the change proposal would benefit from the addition of some
non-normative descriptions of the (hopefully) relatively uncommon cases
where this algorithm will produce suboptimal results, as well as
guidance (namely: add id/bookmark) on how best to avoid those cases?
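As a rough illustration of that guidance -- the dict below merely stands
in for a parsed <article>, a real converter would walk the DOM, and the
preference order is only a guess at what the prose would recommend:

    import hashlib

    def entry_id(article, page_url):
        # Prefer an author-supplied identifier so the entry's identity
        # survives edits to its content.
        if article.get("permalink"):      # from <a rel="bookmark" href="...">
            return article["permalink"]
        if article.get("id"):             # from the id="" attribute
            return page_url + "#" + article["id"]
        # Only articles with neither marker fall back to hashing the
        # content, with the instability that implies when it changes.
        digest = hashlib.sha1(article["text"].encode("utf-8")).hexdigest()
        return "%s#sha1-%s" % (page_url, digest)

With either marker present, the entry keeps a stable ID no matter how
often the page is reconverted or the content is edited.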
> ~TJ

- Sam Ruby

Received on Friday, 16 April 2010 11:26:28 UTC