- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Thu, 15 Apr 2010 17:17:43 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: "Edward O'Connor" <hober0@gmail.com>, Sam Ruby <rubys@intertwingly.net>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
On Thu, Apr 15, 2010 at 4:29 PM, Ian Hickson <ian@hixie.ch> wrote:
> I'm also not suggesting #1 above. It's quite possible to have multiple
> articles with the same content that aren't the same -- for example, if I
> just post "good night" each night to my blog, and my blog doesn't have
> permalinks or dates or whatnot, then there will be lots of "duplicate"
> articles.

I'm fine with those generating the same ID and thus being treated as duplicates. That's such an edge case that I can't bring myself to care about it, and the natural resolution seems sufficient.

> Similarly, a blog post with comments might have several "Get
> viagra" comments from "Anonymous Casino", yet they aren't the same.

A blog post's comments aren't serialized as entries by the Atom conversion algorithm, as long as the page is formatted correctly (with an <article> enclosing both the post and the comments). Similarly, though, if your page happened to be formatted such that comments were <article>s without ancestor <article>s, and you had a number of precisely identical comments, I'm fine with those all being treated as identical. In fact, that is almost certainly the best default behavior in terms of user experience.

> I'm suggesting that when it's impossible to come up with a good consistent
> ID, the UA should do its best. It might not be able to, in which case .

Yeah, sure. Given a deterministic algorithm, though, a feed consumer *will* be able to generate a consistent ID from the same content, so making the requirement in the spec a MUST should be fine. (A rough sketch of such an algorithm follows at the end of this message.) The only thing that would prevent this is if it were somehow literally impossible for the computer to run a deterministic hashing algorithm, which falls under the "physical limitations" clause that allows UAs to break any requirement they wish.

>> Either one means that, in some situations, readers will have a
>> suboptimal experience.
>
> Only if the HTML page is repeatedly converted, but yes, if that happens,
> e.g. if you subscribe to a dynamically updating conversion of an HTML page
> that for some reason (what reason?) can't keep track of previous IDs it
> has used, then you'll end up with a horrible experience.

If you "subscribe" to a blog's front page, it should be able to generate an Atom feed that updates as you post new articles (which appear on the front page and bump older articles off). This requires repeatedly converting the page. However, as long as the content of each <article> hasn't changed, the same tool can still generate a consistent ID between conversions. That seems sufficient. If the article content changes regularly *and* there's no canonical guid to extract, the user experience is indeed horrible, but that's precisely the case we hope is uncommon.

> Indeed. At least if we do a best-effort attempt at outputting content in
> this rare case, users will be able to fix up the feed before using it, if
> necessary. If the data is just dropped on the floor then they'll have far
> more work.

Agreed.

~TJ
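P.S. Here's the rough sketch I mentioned above, in Python purely for illustration. Nothing in it is the spec's conversion algorithm: the class, the choice of sha1, and the tag: URI base are all made up for the example. The point is just that hashing the content of each top-level <article> yields the same ID on every conversion of an unchanged page.

```python
# Illustrative sketch only (not the spec's conversion algorithm): derive a
# stable Atom ID for each top-level <article> by hashing its text content,
# so repeated conversions of an unchanged page emit identical IDs.
import hashlib
from html.parser import HTMLParser

class TopLevelArticles(HTMLParser):
    """Collect the text of <article> elements with no <article> ancestor."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # current <article> nesting depth
        self.chunks = []    # text of the top-level article being read
        self.articles = []  # finished top-level article texts

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article":
            self.depth -= 1
            if self.depth == 0:             # closed a top-level article
                self.articles.append("".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        if self.depth >= 1:                 # nested comments hash into their parent
            self.chunks.append(data)

def atom_ids(html, base="tag:example.com,2010:"):  # base URI is an assumption
    """Deterministic IDs: the same content always yields the same ID."""
    p = TopLevelArticles()
    p.feed(html)
    return [base + hashlib.sha1(t.encode("utf-8")).hexdigest()
            for t in p.articles]

page = "<article><h1>Post</h1><article>Get viagra!</article></article>"
print(atom_ids(page))  # identical output every time this page is converted
```

Two identical "good night" posts naturally hash to the same ID and collapse into one entry, which, as I said above, is an acceptable default.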
Received on Friday, 16 April 2010 00:18:30 UTC