Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals

On 04/15/2010 08:17 PM, Tab Atkins Jr. wrote:
> On Thu, Apr 15, 2010 at 4:29 PM, Ian Hickson <ian@hixie.ch> wrote:
>> I'm also not suggesting #1 above. It's quite possible to have multiple
>> articles with the same content that aren't the same -- for example, if I
>> just post "good night" each night to my blog, and my blog doesn't have
>> permalinks or dates or whatnot, then there will be lots of "duplicate"
>> articles.
>
> I'm fine with those generating the same ID, and thus being treated as
> duplicates.  That's such an edge-case that I can't care about it, and
> the natural resolution seems sufficient.
>
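
For concreteness, here is one way a converter could derive ids purely
from content, so that byte-identical "good night" posts collapse into a
single entry. The content_id() helper and the example URL below are
hypothetical, not taken from the spec or from any change proposal:

    import hashlib

    def content_id(page_url: str, article_text: str) -> str:
        # Hash the normalized text; the page URL scopes the id to one feed.
        digest = hashlib.sha256(article_text.strip().encode("utf-8")).hexdigest()
        return f"{page_url}#article-{digest[:16]}"

    post_a = content_id("http://example.org/blog", "good night")
    post_b = content_id("http://example.org/blog", "good night")
    assert post_a == post_b  # identical content, identical id: one entry
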
>> Similarly, a blog post with comments might have several "Get
>> viagra" comments from "Anonymous Casino", yet they aren't the same.
>
> A blog post's comments aren't serialized as entries by the Atom
> conversion algorithm, as long as the page is formatted correctly (with
> an <article> enclosing both the post and the comments).
>
> Similarly, though, if your page happened to be formatted such that
> comments were <article>s without ancestor <article>s, and you had a
> number of precisely identical comments, I'm fine with those all being
> treated as identical.  In fact, that is almost certainly the best
> thing to do by default in terms of user experience.
>
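
For reference, a standard-library sketch of the "no ancestor <article>"
rule Tab describes: only articles that aren't nested inside another
article become entries, so a post's comments fold into the post. The
class and the markup are illustrative only, not quoted from the spec:

    from html.parser import HTMLParser

    class TopLevelArticles(HTMLParser):
        """Counts <article> elements that have no <article> ancestor."""
        def __init__(self):
            super().__init__()
            self.depth = 0       # current <article> nesting depth
            self.top_level = 0   # articles that would become feed entries

        def handle_starttag(self, tag, attrs):
            if tag == "article":
                if self.depth == 0:
                    self.top_level += 1
                self.depth += 1

        def handle_endtag(self, tag):
            if tag == "article":
                self.depth -= 1

    p = TopLevelArticles()
    p.feed("<article><h1>Post</h1>"
           "<article>comment 1</article>"
           "<article>comment 2</article>"
           "</article>")
    assert p.top_level == 1   # the comments are not separate entries
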
>> I'm suggesting that when it's impossible to come up with a good consistent
>> ID, the UA should do its best. It might not be able to, in which case…
>
> Yeah, sure.  Given a deterministic algorithm, though, a feed consumer
> *will* be able to generate a consistent ID given the same content.
> Thus making the requirement in the spec a MUST should be fine.  The
> only thing that would prevent this is if it was somehow literally
> impossible for the computer to run a deterministic hashing algorithm,
> which falls under the "physical limitations" clause that allows UAs to
> break any requirement they wish.
>
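
A quick illustration of the determinism point, reusing the hypothetical
content_id() sketch from above: two independent conversions of the same
page (say, successive fetches of a blog's front page) derive the same id
for an unchanged article, which is all a MUST-level requirement would
demand of a conforming consumer:

    first_run  = content_id("http://example.org/blog", "unchanged article text")
    second_run = content_id("http://example.org/blog", "unchanged article text")
    assert first_run == second_run  # same content in, same id out, every run
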
>>> Either one means that, in some situations, readers will have a
>>> suboptimal experience.
>>
>> Only if the HTML page is repeatedly converted. But yes, if that happens,
>> e.g. if you subscribe to a dynamically updating conversion of an HTML page
>> that for some reason (what reason?) can't keep track of previous IDs it
>> has used, then you'll end up with a horrible experience.
>
> If you "subscribe" to a blog's front page, it should be able to
> generate an Atom feed that updates as you post new articles (which
> appear on the front page and bump older articles off).  This requires
> repeatedly converting the page.
>
> However, as long as the content of each <article> hasn't changed, it's
> still possible to generate a consistent ID between conversions using
> the same tool.  That seems sufficient.
>
> When the article content changes regularly *and* there's no canonical
> guid to extract, the user experience is horrible, but that's precisely
> the case we hope is uncommon.
>
>> Indeed. At least if we do a best-effort attempt at outputting content in
>> this rare case, users will be able to fix up the feed before using it, if
>> necessary. If the data is just dropped on the floor then they'll have far
>> more work.
>
> Agreed.

Switching back to co-chair mode.... is there a change proposal that 
captures these ideas?  From my read, this seems very similar to:

http://lists.w3.org/Archives/Public/public-html/2010Apr/0193.html

Perhaps the change proposal would benefit from the addition of some 
non-normative descriptions of the (hopefully) relatively uncommon cases 
where this algorithm will produce suboptimal results, as well as 
guidance (namely: add id/bookmark) on how best to avoid those cases?
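
To make that guidance concrete, here is the sort of markup such a note
might recommend, plus a deliberately crude, purely illustrative
extraction: with an explicit id and a rel="bookmark" permalink in place,
a converter can lift a stable Atom id straight from the page and never
needs a content-hash fallback:

    import re

    SNIPPET = """
    <article id="post-42">
      <h1>Title</h1>
      <a href="http://example.org/blog/post-42" rel="bookmark">permalink</a>
    </article>
    """

    # Prefer the permalink as the entry id when the author supplies one.
    match = re.search(r'href="([^"]+)"\s+rel="bookmark"', SNIPPET)
    atom_id = match.group(1) if match else None
    print(atom_id)  # -> http://example.org/blog/post-42
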

> ~TJ

- Sam Ruby
