- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 06 Apr 2010 23:12:51 +0200
- To: Maciej Stachowiak <mjs@apple.com>
- CC: "public-html@w3.org WG" <public-html@w3.org>
Hi,
below is a change proposal for this issue.
Note that an obvious alternative to fixing the algorithm would be to 
remove the section completely.
Best regards,
Julian
-- snip --
SUMMARY
The HTML5 spec contains an algorithm for producing an Atom (RFC4287) 
feed document from an HTML page.
The definition both relaxes a MUST-level requirement from RFC4287 and 
adds a needless restriction.
Also, it's not clear *at all* whether this is a feature that people 
really want, and if they do, whether it needs to be part of HTML5. Given 
the fact that it's non-trivial to generate a valid Atom feed from HTML, 
but the reverse *is* trivial, we should also consider removing this 
feature altogether (I'd be happy to write a 2nd change proposal if 
people want to see that as well).
RATIONALE
Instructions to derive a secondary format from HTML documents shouldn't 
be misleading, and also should make clear which conditions need to be 
met to produce valid documents.
DETAILS
There are two problems, both with the following step (4.15.1, step 15.9 
as of April 6):
"Otherwise
     Let id be a user-agent-defined undereferenceable yet globally 
unique valid absolute URL. The same absolute URL should be generated for 
each run of this algorithm when given the same input. Let has-alternate 
be false."
Problem #1: RFC 4287 does not require the ID to be undereferenceable. 
This was a conscious decision of the IETF AtomPub WG. There's absolutely 
no point in adding this requirement, except for the spec author's 
distaste for URIs that are both dereferenceable *and* act as a globally 
unique and stable identifier.
Note from 
<http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6.p.2>:
"...Though the IRI might use a dereferencable scheme, Atom Processors 
MUST NOT assume it can be dereferenced."
Problem #2: RFC 4287 makes it a MUST-level requirement to generate the 
same ID every time the feed is regenerated:
From <http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6.p.3>:
"When an Atom Document is relocated, migrated, syndicated, republished, 
exported, or imported, the content of its atom:id element MUST NOT 
change. Put another way, an atom:id element pertains to all 
instantiations of a particular Atom entry or feed; revisions retain the 
same content in their atom:id elements. It is suggested that the atom:id 
element be stored along with the associated resource."
HTML5 relaxes this to a should-level requirement.
I do agree that generating valid Atom feeds from HTML *is* hard, but 
violating a MUST-level requirement from the Atom spec is not acceptable.
Proposed changes:
For issue #1:
Leave out "undereferenceable", changing the sentence to:
"Let id be a user-agent-defined yet globally unique valid absolute URL."
For issue #2:
Change
"The same absolute URL should be generated for each run of this 
algorithm when given the same input."
to
"The same absolute URL must be generated for each run of this algorithm 
when given the same input. If this requirement cannot be fulfilled, 
then generating a valid Atom feed is not possible and this algorithm 
should be aborted."
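To illustrate what the proposed wording for issue #2 would ask of an 
implementation, here is a minimal sketch. It is not taken from the HTML5 
spec or from RFC 4287. The names derive_atom_id and AtomIdUnavailable 
are hypothetical, and so is the assumption that the user agent has some 
persistent key for the entry (for instance an identifier stored along 
with the resource, as RFC 4287 suggests). Using a name-based urn:uuid as 
the identifier is just one possible choice.

```python
import uuid
from typing import Optional


class AtomIdUnavailable(Exception):
    """Hypothetical error: no stable input exists to derive an atom:id from."""


def derive_atom_id(stable_key: Optional[str]) -> str:
    """Return the same identifier every time for the same stable_key.

    stable_key is an assumption of this sketch: a value the user agent
    knows will not change when the document is relocated or republished.
    """
    if not stable_key:
        # Mirrors the proposed text: if the same id cannot be produced on
        # every run, abort instead of emitting an invalid Atom feed.
        raise AtomIdUnavailable("no stable input; abort feed generation")
    # Name-based (version 5) UUIDs are deterministic: the same namespace
    # and name always yield the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, stable_key).urn
```

Note that a key derived only from the document's current address would 
still change on relocation, so whether this satisfies RFC 4287 depends 
entirely on where stable_key comes from.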
IMPACT
1. Positive Effects
Consistency between the applicable specs. Also, authors are correctly 
informed about what it takes to generate proper Atom feeds.
2. Negative Effects
None.
3. Conformance Classes Changes
Atom feed generators are actually required to generate valid Atom 
documents (with respect to atom:id).
4. Risks
None.
REFERENCES
Inline.
Received on Tuesday, 6 April 2010 21:13:27 UTC