Re: looking for the use case for HTML->Atom conversion

Maciej wrote:
> Would using hAtom be a viable option for you, as the second tool apparently
> does already?

hAtom is great--I'm a big fan, and I use it already. In fact, the
widespread use of hAtom in blog templates is one of the sources of
inspiration for <article> & <time pubdate> in the first place. That
said, the converting-hAtom-to-Atom story is actually worse than the
converting-HTML-to-Atom story.

The hAtom spec[1] doesn't actually define what to use for <atom:id>.[2]
It *does* define something called the Entry Permalink like so:

* an Entry Permalink element is identified by rel-bookmark
* an Entry should have an Entry Permalink
* an Entry Permalink element represents the concept of an Atom link in
  an entry
* if the Entry Permalink is missing, use the URI of the page; if the
  Entry has an "id" attribute, add that as a fragment to the page URI to
  distinguish individual entries

So the Entry Permalink is the equivalent of <atom:link>, not <atom:id>.
Also, note the use of RFC2119 SHOULD and the fallback, for every entry,
to the document URL.
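
To make the fallback concrete, here is a minimal sketch of the permalink
rules quoted above. The function name and parameters are hypothetical, not
taken from the hAtom spec or from hAtom2Atom. Resolving a relative
rel-bookmark href against the page URI, and dropping any existing fragment
from the page URI before appending the entry's id, are my assumptions.

```python
from typing import Optional
from urllib.parse import urljoin, urldefrag


def entry_permalink(page_uri: str,
                    bookmark_href: Optional[str] = None,
                    entry_id: Optional[str] = None) -> str:
    """Resolve an hAtom Entry Permalink using the fallback rules above."""
    if bookmark_href:
        # rel-bookmark is present: use it (assumption: resolve it
        # against the page URI if it is relative).
        return urljoin(page_uri, bookmark_href)
    # No permalink: fall back to the URI of the page.
    base, _fragment = urldefrag(page_uri)
    if entry_id:
        # The entry has an "id" attribute: add it as a fragment.
        return base + "#" + entry_id
    # Neither a permalink nor an id: every such entry gets the page URI.
    return base
```

Calling entry_permalink("http://example.org/blog/") for each of several
entries that lack both rel-bookmark and id="" returns the same string
every time, which is the collision discussed below.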

The non-normative hAtom parsing document[3] says to use the Entry
Permalink for <atom:id> as well as for an <atom:link>, and this is
almost what hAtom2Atom implements. If you ran a (valid hAtom) page with
several entries, all of which fail to provide a permalink (and lack
id="") through hAtom2Atom, you'd expect the resultant <atom:entry>s to
all have the page's URL for their <atom:id>s, as hAtom specifies. That would be
bad enough, but hAtom2Atom doesn't implement the fallback to the
document URL--it generates empty <atom:id/>s instead, and so produces
invalid Atom. Here's a test case:

Run through hAtom2Atom:

So converting hAtom to Atom with hAtom2Atom suffers from worse Atom
conformance issues than the HTML5 spec's HTML to Atom algorithm. Empty
<atom:id/>s are worse than unstable <atom:id>s in my book. Software that
implemented the Entry Permalink fallback correctly would suffer from a
worse <atom:id> story too, because in the above scenario all of the
distinct <atom:entry>s in the feed would share the same <atom:id>.
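
Both failure modes (empty <atom:id/>s and several entries sharing one
<atom:id>) are easy to detect in a converter's output. The following is
only a rough sketch using Python's standard library; the function name
id_problems is hypothetical and this is not how hAtom2Atom or any feed
validator actually works.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of Atom 1.0 elements, in ElementTree's {uri}name notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def id_problems(feed_xml: str):
    """Return (number of entries with no usable id, sorted duplicate ids)."""
    root = ET.fromstring(feed_xml)
    # A missing <id> yields None and an empty <id/> yields "";
    # treat both as an empty id.
    ids = [(entry.findtext(ATOM + "id") or "").strip()
           for entry in root.iter(ATOM + "entry")]
    empty = sum(1 for value in ids if not value)
    counts = Counter(value for value in ids if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return empty, duplicates
```

Fed the output of a conversion, a non-zero first value corresponds to the
empty-id case, and a non-empty second value corresponds to the
shared-page-URL case that a correct implementation of the fallback would
produce.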


2. There are several open hAtom issues related to feed and entry IDs:

Received on Thursday, 15 April 2010 18:01:16 UTC