Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Maciej Stachowiak on 2010-04-14 (public-html@w3.org from April 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Wed, 14 Apr 2010 16:20:31 -0700
To: Sam Ruby <rubys@intertwingly.net>
Cc: Ian Hickson <ian@hixie.ch>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
Message-id: <4301F316-0342-49E7-8C72-1BCB6F687C95@apple.com>

On Apr 14, 2010, at 2:59 PM, Sam Ruby wrote:

> Speaking again as only a member of the Atom community: given the  
> importance that entry ids have in enabling user agents that process  
> Atom feeds to match entries in the feed against what they have seen  
> before, I personally believe that it is actively harmful for feeds  
> to be produced by any software that can't guarantee that they  
> produce the same ID given the same input.
>
> Reading through the entire bug, and in particular comment #23, I  
> think the spec is making the wrong tradeoff, and I suggest you  
> reconsider the presumption.  I suggest that you actually test out  
> how common feed aggregators react when they are presented with the  
> same feed differing only in the entry ids.
>
> FWIW, I don't see dereferenceable as mandatory.

Speaking solely as an amateur spec lawyer:

Here is what the Atom spec says about uniqueness of IDs:

"When an Atom Document is relocated, migrated, syndicated,  
republished, exported, or imported, the content of its atom:id element  
MUST NOT change.  Put another way, an atom:id element pertains to all  
instantiations of a particular Atom entry or feed; revisions retain  
the same content in their atom:id elements.  It is suggested that the  
atom:id element be stored along with the associated resource."

<http://www.ietf.org/rfc/rfc4287.txt>

Is converting an HTML file to Atom an example of one of the covered  
actions that MUST NOT change the atom:id? It doesn't seem like it to  
me - it's definitely not a relocation, migration, syndication,
republication or export. Is it an import? It's not clear to me if
"import" means from one Atom system to another, or from any non-Atom  
format.

Personally, I would read it as import between Atom publishing systems.  
If you require import from external formats to give fixed IDs, it  
gives potentially nonsensical results. For example, creating an Atom  
post from plain text that does not contain an ID would be  
nonconforming, since you can't guarantee another Atom system would  
give the same ID if you imported it there. Or if there was a globally  
defined algorithm for creating the ID when importing from plain text,  
it would have to be based on something like a hash of the text, and  
therefore could not preserve ID across edits of the text.
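
To illustrate that last point, here is a minimal sketch of a hash-based
ID for plain text. The function name id_from_text and the "urn:sha256:"
prefix are made up for illustration; neither comes from the Atom spec or
any existing tool.

```python
import hashlib


def id_from_text(text: str) -> str:
    """Hypothetical ID scheme: an identifier built from the SHA-256 of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"urn:sha256:{digest}"


original = "Hello, world."
edited = "Hello, world!"

# Importing the same text twice gives the same ID on any system that
# uses this algorithm.
assert id_from_text(original) == id_from_text(original)

# A one-character edit gives a different ID, so a revision cannot
# retain the ID of the text it revises.
assert id_from_text(original) != id_from_text(edited)
```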

Therefore, it seems to me that "import" can't possibly refer to  
converting from another format, or it would effectively be  
nonconforming for anyone to convert from another format ever (except  
formats that already include unique IDs).

But let's assume "import" or one of the other verbs does include
conversion from other formats. Applying this to HTML:

(1) Should converting the exact same document to Atom multiple times  
with the same converter give the same atom:ids? That seems like a  
practical requirement to add: an implementation could just compute a
hash of the whole document and append a sequence number for each entry.

(2) Should converting the exact same document to Atom multiple times  
with different converters give the same atom:ids? You could handle  
this like (1) if the spec defined an exact algorithm for generating  
Atom IDs, but that would conflict with (3) below.

(3) Should converting an edited version of the same document to Atom  
multiple times with the same converter give the same atom:ids? It  
seems like this *might* be doable if the converter tracks what  
documents it has converted before, and has some way to identify that  
the edited copy is the same. It seems like "edited but the same" is  
kind of fuzzy though. What if you rename the file and make extensive
edits? How could any system tell it is "the same" without altering the
HTML original? But it seems like if you just rename and change nothing  
else, that should count as "the same", so it can't solely be based on  
the filename.

(4) Should converting an edited version of the same document to Atom  
multiple times with different converters give the same atom:ids? It  
seems that this is fundamentally impossible unless the document  
already has an embedded unique ID per entry.
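
The "hash of the whole document plus a sequence number" idea from (1)
can be sketched roughly as follows. This is only an illustration under
assumptions: the function entry_ids, the ID format, and passing the
number of entries as a parameter (instead of actually parsing the HTML)
are all hypothetical and not taken from the HTML or Atom specs. The
algorithm parameter stands in for "a different converter" in (2).

```python
import hashlib
from typing import List


def entry_ids(document: str, entry_count: int, algorithm: str = "sha256") -> List[str]:
    """Hypothetical converter step: one ID per entry, built from a hash
    of the whole document plus the entry's sequence number."""
    doc_hash = hashlib.new(algorithm, document.encode("utf-8")).hexdigest()
    return [f"urn:{algorithm}:{doc_hash}:{n}" for n in range(1, entry_count + 1)]


doc = "<article>first</article><article>second</article>"

# (1) Same document, same converter: the IDs are identical on every run.
assert entry_ids(doc, 2) == entry_ids(doc, 2)

# (2) Same document, a converter using a different algorithm: the IDs
# differ unless both converters agree on one exact algorithm.
assert entry_ids(doc, 2, "sha256") != entry_ids(doc, 2, "sha1")

# (3) Any edit to the document changes the hash, and so every entry ID.
assert entry_ids(doc, 2) != entry_ids(doc + "<!-- edited -->", 2)
```

Note that in this sketch an edit to one entry changes the IDs of all
entries in the document, since the hash covers the whole file.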

So, if we take the strongest possible interpretation, that the Atom  
spec requires all of (1)-(4), then any conversion along the lines of  
what is in the HTML spec fundamentally conflicts with the Atom spec;  
converting from HTML to Atom would be automatically nonconforming to  
Atom unless the HTML has embedded globally unique IDs. If that's the  
case, and we care about Atom conformance

If we take a looser interpretation, and, say, only (1) or only (1) and  
(2) are needed, then we could add requirements to HTML5 which would  
enforce Atom's requirements.

If we take an even more lenient interpretation and say that conversion  
from a foreign format is not covered by the Atom requirement to  
preserve IDs, we could leave the spec as-is without conflicting with  
Atom.

Do any of the Atom experts here have an opinion on which of the  
interpretations is correct?

Regards,
Maciej
Received on Wednesday, 14 April 2010 23:21:05 UTC