- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Thu, 15 Apr 2010 17:17:43 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: "Edward O'Connor" <hober0@gmail.com>, Sam Ruby <rubys@intertwingly.net>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
On Thu, Apr 15, 2010 at 4:29 PM, Ian Hickson <ian@hixie.ch> wrote:
> I'm also not suggesting #1 above. It's quite possible to have multiple
> articles with the same content that aren't the same -- for example, if I
> just post "good night" each night to my blog, and my blog doesn't have
> permalinks or dates or whatnot, then there will be lots of "duplicate"
> articles.

I'm fine with those generating the same ID and thus being treated as duplicates. That's such an edge case that I can't bring myself to care about it, and the natural resolution seems sufficient.

> Similarly, a blog post with comments might have several "Get
> viagra" comments from "Anonymous Casino", yet they aren't the same.

A blog post's comments aren't serialized as entries by the Atom conversion algorithm, as long as the page is formatted correctly (with an <article> enclosing both the post and the comments). Similarly, though, if your page happened to be formatted such that comments were <article>s without ancestor <article>s, and you had a number of precisely identical comments, I'm fine with those all being treated as identical. In fact, that is almost certainly the best default behavior in terms of user experience.

> I'm suggesting that when it's impossible to come up with a good consistent
> ID, the UA should do its best. It might not be able to, in which case .

Yeah, sure. Given a deterministic algorithm, though, a feed consumer *will* be able to generate a consistent ID from the same content, so making the requirement in the spec a MUST should be fine. (A rough sketch of such an algorithm follows at the end of this message.) The only thing that would prevent this is if it were somehow literally impossible for the computer to run a deterministic hashing algorithm, which falls under the "physical limitations" clause that allows UAs to break any requirement they wish.

>> Either one means that, in some situations, readers will have a
>> suboptimal experience.
>
> Only if the HTML page is repeatedly converted, but yes, if that happens,
> e.g. if you subscribe to a dynamically updating conversion of an HTML page
> that for some reason (what reason?) can't keep track of previous IDs it
> has used, then you'll end up with a horrible experience.

If you "subscribe" to a blog's front page, it should be able to generate an Atom feed that updates as you post new articles (which appear on the front page and bump older articles off). This requires repeatedly converting the page. However, as long as the content of each <article> hasn't changed, the same tool can still generate a consistent ID between conversions. That seems sufficient. If the article content changes regularly *and* there's no canonical guid to extract, the user experience is indeed horrible, but that's precisely the case we hope is uncommon.

> Indeed. At least if we do a best-effort attempt at outputting content in
> this rare case, users will be able to fix up the feed before using it, if
> necessary. If the data is just dropped on the floor then they'll have far
> more work.

Agreed.

~TJ
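P.S. Here's the rough sketch I mentioned above, in Python purely for illustration. Nothing in it is the spec's conversion algorithm: the class, the choice of sha1, and the tag: URI base are all made up for the example. The point is just that hashing the content of each top-level <article> yields the same ID on every conversion of an unchanged page.

```python
# Illustrative sketch only (not the spec's conversion algorithm): derive a
# stable Atom ID for each top-level <article> by hashing its text content,
# so repeated conversions of an unchanged page emit identical IDs.
import hashlib
from html.parser import HTMLParser

class TopLevelArticles(HTMLParser):
    """Collect the text of <article> elements with no <article> ancestor."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # current <article> nesting depth
        self.chunks = []    # text of the top-level article being read
        self.articles = []  # finished top-level article texts

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article":
            self.depth -= 1
            if self.depth == 0:             # closed a top-level article
                self.articles.append("".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        if self.depth >= 1:                 # nested comments hash into their parent
            self.chunks.append(data)

def atom_ids(html, base="tag:example.com,2010:"):  # base URI is an assumption
    """Deterministic IDs: the same content always yields the same ID."""
    p = TopLevelArticles()
    p.feed(html)
    return [base + hashlib.sha1(t.encode("utf-8")).hexdigest()
            for t in p.articles]

page = "<article><h1>Post</h1><article>Get viagra!</article></article>"
print(atom_ids(page))  # identical output every time this page is converted
```

Two identical "good night" posts naturally hash to the same ID and collapse into one entry, which, as I said above, is an acceptable default.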
Received on Friday, 16 April 2010 00:18:30 UTC