Re: Producing Atom from Maciej Stachowiak on 2009-10-05 (public-html@w3.org from October 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 04 Oct 2009 18:23:12 -0700
To: Sam Ruby <rubys@intertwingly.net>
Cc: Ian Hickson <ian@hixie.ch>, public-html@w3.org
Message-id: <7D78E2FD-4688-440B-A84D-F608FFD26DFA@apple.com>
On Oct 4, 2009, at 5:22 PM, Sam Ruby wrote:

> Maciej Stachowiak wrote:
>> On Oct 4, 2009, at 4:11 PM, Sam Ruby wrote:
>> Here's the algorithm: <http://dev.w3.org/html5/spec/Overview.html#atom>.
>> It looks to me like it will never add an <atom:author> element,
>> so its output is always invalid Atom.
>
> Section 2.2.2 may provide an out: "extension specification can be
> written that overrides the requirements in this specification".

Yes, an extension specification could define a different algorithm (or  
a patch to the algorithm) which supersedes HTML5. Then you'd have an  
interoperability problem between implementations that follow it and  
implementations that do not.

>
>>> If it is possible for the conversion algorithm to produce non- 
>>> conforming Atom, then I believe that an informative statement to  
>>> that effect is in order, and ideally in that informative statement  
>>> some guidance should be provided.
>> As written, it's not only possible but necessary.
>
> If you draw that conclusion, and the intent was that implementations  
> may augment this in any way, then this should be clarified.   
> Something as simple as an informative statement would address the  
> issue.

My conclusion is that following the HTML5 algorithm for HTML-to-Atom  
conversion will always produce nonconforming Atom. At least  
superficially, the statement "a user agent must run the following  
algorithm to extract an Atom feed" seems to forbid user agents from  
using any other algorithm to extract an Atom feed.  Perhaps it is the  
intent that other operations that take HTML as input and produce Atom  
as output do not count as "extract[ing] an Atom feed".
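For reference, the Atom rule at issue is in RFC 4287, section 4.1.1: an atom:feed element must contain at least one atom:author element unless every one of its atom:entry children contains its own. Below is a minimal Python sketch that checks only that one rule. The function name `feed_has_required_author` and both sample feeds are hypothetical illustrations, not output copied from the HTML5 algorithm. The sketch also assumes a conservative reading for a feed with no entries and no feed-level author, treating it as failing the rule.

```python
import xml.etree.ElementTree as ET

# Atom namespace as ElementTree spells qualified tag names.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_has_required_author(feed_xml: str) -> bool:
    """Check the RFC 4287 sec. 4.1.1 author rule on an atom:feed document.

    Hypothetical helper: it checks only the atom:author requirement,
    not full Atom conformance.
    """
    feed = ET.fromstring(feed_xml)
    # A feed-level atom:author satisfies the rule by itself.
    if feed.find(ATOM + "author") is not None:
        return True
    entries = feed.findall(ATOM + "entry")
    # Otherwise every entry must carry its own atom:author. An entry-less,
    # author-less feed is treated as failing (conservative assumption).
    return bool(entries) and all(
        entry.find(ATOM + "author") is not None for entry in entries
    )


# Hypothetical feed with no atom:author anywhere.
SAMPLE_WITHOUT_AUTHOR = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.com/</id>
  <title>Example</title>
  <updated>2009-10-04T00:00:00Z</updated>
  <entry>
    <id>http://example.com/#post1</id>
    <title>Post 1</title>
    <updated>2009-10-04T00:00:00Z</updated>
  </entry>
</feed>"""

# The same hypothetical feed with a feed-level atom:author added.
SAMPLE_WITH_AUTHOR = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.com/</id>
  <title>Example</title>
  <updated>2009-10-04T00:00:00Z</updated>
  <author><name>Example Author</name></author>
  <entry>
    <id>http://example.com/#post1</id>
    <title>Post 1</title>
    <updated>2009-10-04T00:00:00Z</updated>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(feed_has_required_author(SAMPLE_WITHOUT_AUTHOR))
    print(feed_has_required_author(SAMPLE_WITH_AUTHOR))
```

Any conversion that never emits atom:author, at either the feed or the entry level, fails this check no matter what page it is given. That is the sense in which the output is always nonconforming.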

>
>>> I have no problem with that statement discouraging vendor-specific  
>>> proprietary extensions, and encouraging vendor neutral extensions  
>>> to this specification where appropriate.
>> Here are some possible options:
>> 1) Leave the HTML-to-Atom algorithm in HTML5, generating Atom that  
>> is always nonconforming.
>> 2) Remove the HTML-to-Atom algorithm from the HTML5 spec (perhaps  
>> it can be in a separate specification).
>> 3) Define the conversion algorithm in HTML5, but have it require  
>> the inclusion of <atom:author> in some way that HTML5 itself does  
>> not specify. Other specifications may fill in the gap, but HTML5  
>> won't reference them.
>> 4) Leave the HTML-to-Atom algorithm in HTML5, generating Atom that  
>> is always nonconforming, but allowing arbitrary additional  
>> information to be added to the Atom output as part of the  
>> conversion. Other specifications may fill in the gap, but HTML5  
>> won't reference them.
>> 5) HTML5 references the vCard vocabulary and specifies how to  
>> include <atom:author> info, thus rendering its default Atom output  
>> conforming.
>> 6) Option 3, but HTML5 does reference the spec that describes how  
>> to include author info.
>> 7) Option 4, but HTML5 does reference the spec that describes how  
>> to include author info.
>> I think #1 is unacceptable, because I believe generating
>> nonconforming Atom does not satisfy the use cases for HTML-to-Atom
>> conversion. I will leave it to others to determine which of the
>> other options, if any, might be acceptable. In my opinion, all of
>> the other options are effectively equivalent to either #2 or #5.
>>> Would it be helpful if I opened a bug in bugzilla?
>> It would be useful to be clear about what the bug is, and possible  
>> ways to resolve it.
>
> My point was that #1 is unacceptable.
>
> Other options include moving the Atom production or even all of  
> Microdata out of the HTML5 spec.  (Perhaps those specs could  
> reference vCard, perhaps not; but either way, those are separate
> questions).

The HTML-to-Atom conversion, as currently written, has no dependency  
on Microdata, so it would be unaffected by any changes to Microdata.  
It's based purely on built-in elements and attributes of HTML. Moving  
HTML-to-Atom conversion out of the HTML5 spec was my option #2.

> As to which of the remaining options are preferable, I have a  
> preference for options that leave open the possibility that the  
> production of Atom feeds could include information found in the page  
> annotated by either the hAtom microformat or RDFa.

Interestingly, it appears no one has written a specification for how  
to convert hAtom to Atom. And it does not appear that anyone has  
designed an RDFa vocabulary for feeds or a way to do the conversion.  
But it does not make sense for HTML5 to forbid the use of such  
algorithms, either alone or in combination with the current algorithm,  
if they are ever specified.

I believe any of the options listed could leave that possibility open,
by changing the conformance requirements to define a particular named
algorithm without forbidding the use of other algorithms that take
HTML as input and produce Atom as output. I don't think a requirement
forbidding such other conversions makes sense. After all, UAs are
allowed to convert HTML to a PDF or a PNG in any way they want, and
the world hasn't ended.

Regards,
Maciej
Received on Monday, 5 October 2009 01:23:46 UTC