- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Fri, 11 Jun 2010 16:48:40 -0700
On http://www.???.com/blog/2010/06/html5-atom-gone-wrong, a comparison is made between an example Atom feed (presumably constructed from blog metadata) and one constructed by the HTML algorithm reading over the example blog page. Not all of these differences are valid, but some are, and should be fixed in the HTML algo.

1. The HTML algo puts the URL for atom:link elements in the content of the <link>. It should be in the @href of the <link>. (Issue 1 in the blog post)

2. The <title> of atom entries is constrained to contain text only, but this "text" can include properly-escaped markup in practice. The HTML algo strips that markup out and just uses the textContent of the appropriate heading. Some practices, such as using a "sarcastic <del>" in a heading, are adversely impacted by this - the meanings of "I <del>don't</del> like HTML" and "I don't like HTML" are completely opposite. The HTML algo should use the escaped innerHTML of the appropriate heading instead. (Issue 3 in the blog post)

3. The HTML algo sets the @type attribute on atom:content to "xml" in some circumstances. It should be "xhtml". (Issue 4 in the blog post)

4. The HTML algo should include an xml:base attribute in the produced feed so that relative links work correctly. Alternatively, it should make all links absolute. (Issue 8 in the blog post)

5. I'm not 100% certain on this one, but I think that, in the current step 15.8 of the HTML algo, it should produce a <div> element in the XHTML namespace. The algo currently doesn't specify a namespace for the element. (Issue 5 in the blog post)

Issues 2, 6, and 7 in the blog post appear to be a result of the post author either reading the spec incorrectly or writing a bad page to begin with. There are potential problems around Issue 2, but this blog post did not run into them.

Issue 9 in the blog post is true, but can't be simply fixed. In most circumstances this won't matter - most blogs are written by a single author.

The issues listed in the blog post:

1. The URLs for <link> elements should be stored in the @href attribute and not in the link content.
2. The values used for <id> should be both stable and unique. Using a copy of the permalink meets neither requirement.
3. Stripping the markup from <title> elements has resulted in one title changing its meaning entirely.
4. The @type attribute on <content> elements should be "xhtml" for XHTML content, and not "xml".
5. The XHTML <div> element that is an immediate child of <content> is not correctly namespaced.
6. The dates in the <published> elements are incorrectly formatted and in the wrong timezone.
7. The <updated> elements are merely duplicates of the <published> elements, failing to detect the correct update times.
8. Without an @base attribute, relative URLs inside the <content> elements will not be correctly resolved.
9. The <author> elements are missing altogether since the algorithm is only capable of recognising feed-level authors, at best.

~TJ
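
To make the five proposed fixes concrete, here is a rough sketch of what a corrected entry could look like, built with Python's xml.etree.ElementTree. This is not the HTML algo itself: the function name build_entry, its parameters, and the example.org URLs are all hypothetical. Setting type="html" on the title is also an assumption on my part, following how Atom text constructs signal escaped markup.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
XML = "http://www.w3.org/XML/1998/namespace"


def build_entry(permalink, heading_inner_html, body_xhtml, base_url):
    """Hypothetical helper: build one atom:entry with fixes 1-5 applied.

    body_xhtml must be a single well-formed element that declares the
    XHTML namespace itself.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")

    # Fix 1: the URL goes in @href, not in the element content.
    ET.SubElement(entry, f"{{{ATOM}}}link",
                  {"rel": "alternate", "href": permalink})

    # Fix 2: keep the heading's markup. The serializer escapes it, so the
    # feed carries "&lt;del&gt;..." rather than the bare textContent.
    # type="html" is an assumption, not something the message specifies.
    title = ET.SubElement(entry, f"{{{ATOM}}}title", {"type": "html"})
    title.text = heading_inner_html

    # Fix 3: type is "xhtml", not "xml".
    # Fix 4: xml:base lets relative links in the content resolve.
    content = ET.SubElement(entry, f"{{{ATOM}}}content",
                            {"type": "xhtml", f"{{{XML}}}base": base_url})

    # Fix 5: the wrapper <div> lives in the XHTML namespace.
    div = ET.SubElement(content, f"{{{XHTML}}}div")
    div.append(ET.fromstring(body_xhtml))
    return entry


entry = build_entry(
    "http://example.org/blog/post",
    "I <del>don't</del> like HTML",
    '<p xmlns="http://www.w3.org/1999/xhtml">'
    'See <a href="more.html">more</a>.</p>',
    "http://example.org/blog/",
)
print(ET.tostring(entry, encoding="unicode"))
```

Making every link absolute instead of emitting xml:base (the alternative in item 4) would mean rewriting each href and src in the body before appending it, which this sketch does not attempt.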
Received on Friday, 11 June 2010 16:48:40 UTC