- From: Maciej Stachowiak <mjs@apple.com>
- Date: Wed, 14 Apr 2010 16:20:31 -0700
- To: Sam Ruby <rubys@intertwingly.net>
- Cc: Ian Hickson <ian@hixie.ch>, Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org WG" <public-html@w3.org>
On Apr 14, 2010, at 2:59 PM, Sam Ruby wrote:

> Speaking again as only a member of the Atom community: given the
> importance that entry ids have in enabling user agents that process
> Atom feeds to match entries in the feed against what they have seen
> before, I personally believe that it is actively harmful for feeds
> to be produced by any software that can't guarantee that they
> produce the same ID given the same input.
>
> Reading through the entire bug, and in particular comment #23, I
> think the spec is making the wrong tradeoff, and I suggest you
> reconsider the presumption. I suggest that you actually test out
> how common feed aggregators react when they are presented with the
> same feed differing only in the entry ids.
>
> FWIW, I don't see dereferenceable as mandatory.

Speaking solely as an amateur spec lawyer:

Here is what the Atom spec says about uniqueness of IDs:

"When an Atom Document is relocated, migrated, syndicated, republished, exported, or imported, the content of its atom:id element MUST NOT change. Put another way, an atom:id element pertains to all instantiations of a particular Atom entry or feed; revisions retain the same content in their atom:id elements. It is suggested that the atom:id element be stored along with the associated resource."

<http://www.ietf.org/rfc/rfc4287.txt>

Is converting an HTML file to Atom an example of one of the covered actions that MUST NOT change the atom:id? It doesn't seem like it to me: it's definitely not a relocation, migration, syndication, republication or export. Is it an import? It's not clear to me whether "import" means from one Atom system to another, or from any non-Atom format. Personally, I would read it as import between Atom publishing systems. If you require imports from external formats to give fixed IDs, you get potentially nonsensical results.
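To make that concrete: a converter with no stored state can only produce fixed IDs for content that carries none by deriving them from the content itself, for instance by hashing the whole document and appending a per-entry sequence number. The Python below is a minimal sketch under that assumption; the hypothetical `entry_ids` function, the choice of SHA-256, and the `urn:uuid:` form (built with the standard library's `uuid.uuid5` and its `NAMESPACE_URL` namespace) are illustrative choices, not anything the Atom or HTML5 specs prescribe.

```python
import hashlib
import uuid


def entry_ids(document, entry_count):
    """Hypothetical helper: derive one atom:id per entry from the document text.

    The result depends only on the document's bytes and each entry's
    position, so identical input always yields identical IDs.
    """
    digest = hashlib.sha256(document.encode("utf-8")).hexdigest()
    # uuid5 maps "<digest>#<index>" to a stable name-based UUID, which is
    # then written as a urn:uuid: IRI suitable for an atom:id element.
    return [
        "urn:uuid:" + str(uuid.uuid5(uuid.NAMESPACE_URL, f"{digest}#{i}"))
        for i in range(entry_count)
    ]


original = "<article><h1>Post</h1><p>Hello</p></article>"
edited = "<article><h1>Post</h1><p>Hello, world</p></article>"

# Same document, same converter: the IDs are reproducible.
print(entry_ids(original, 1) == entry_ids(original, 1))  # True
# Any edit changes the digest, so every derived ID changes as well.
print(entry_ids(original, 1) == entry_ids(edited, 1))  # False
```

Converting the same document twice gives identical IDs, but even a one-word edit produces an entirely new set, so a content hash alone cannot carry an ID across revisions.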
For example, creating an Atom post from plain text that does not contain an ID would be nonconforming, since you can't guarantee another Atom system would give the same ID if you imported it there. Or, if there were a globally defined algorithm for creating the ID when importing from plain text, it would have to be based on something like a hash of the text, and therefore could not preserve the ID across edits of the text. Therefore, it seems to me that "import" can't possibly refer to converting from another format, or it would effectively be nonconforming for anyone ever to convert from another format (except formats that already include unique IDs).

But let's assume "import" or one of the other verbs does include conversion from other formats. Applying this to HTML:

(1) Should converting the exact same document to Atom multiple times with the same converter give the same atom:ids? That seems like a practical requirement to add; an implementation could just compute a hash of the whole document and append a sequence number for each entry.

(2) Should converting the exact same document to Atom multiple times with different converters give the same atom:ids? You could handle this like (1) if the spec defined an exact algorithm for generating Atom IDs, but that would conflict with (3) below.

(3) Should converting an edited version of the same document to Atom multiple times with the same converter give the same atom:ids? It seems like this *might* be doable if the converter tracks what documents it has converted before, and has some way to identify that the edited copy is the same. It seems like "edited but the same" is kind of fuzzy, though. What if you rename the file and make extensive edits? How could any system tell it is "the same" without altering the HTML original? But it seems like if you just rename and change nothing else, that should count as "the same", so it can't be based solely on the filename.
(4) Should converting an edited version of the same document to Atom multiple times with different converters give the same atom:ids? It seems that this is fundamentally impossible unless the document already has an embedded unique ID per entry.

So, if we take the strongest possible interpretation, that the Atom spec requires all of (1)-(4), then any conversion along the lines of what is in the HTML spec fundamentally conflicts with the Atom spec; converting from HTML to Atom would be automatically nonconforming to Atom unless the HTML has embedded globally unique IDs. If that's the case, and we care about Atom conformance

If we take a looser interpretation and, say, only (1), or only (1) and (2), are needed, then we could add requirements to HTML5 that would enforce Atom's requirements.

If we take an even more lenient interpretation and say that conversion from a foreign format is not covered by the Atom requirement to preserve IDs, we could leave the spec as-is without conflicting with Atom.

Do any of the Atom experts here have an opinion on which of the interpretations is correct?

Regards,
Maciej
Received on Wednesday, 14 April 2010 23:21:05 UTC