Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals

On 07.04.2010 02:46, Maciej Stachowiak wrote:
>
> Thank you for your submission!
>
> Recorded here: http://dev.w3.org/html5/status/issue-status.html#ISSUE-086
>
> Regards,
> Maciej
> ...

Below is a slightly updated version, based on feedback from the 
atom-syntax mailing list.

-- snip --
Revision 2; taking feedback from the Atom-Syntax mailing list into account.

SUMMARY

The HTML5 spec contains an algorithm for producing an Atom (RFC4287) 
feed document from an HTML page.

The definition both relaxes a MUST-level requirement from RFC4287 and 
adds a needless restriction.

Also, it's not clear *at all* whether this is a feature that people 
really want, and if they do, whether it needs to be part of HTML5. Given 
the fact that it's non-trivial to generate a valid Atom feed from HTML, 
but the reverse *is* trivial, we should also consider removing this 
feature altogether (I'd be happy to write a 2nd change proposal if 
people want to see that as well). (See [2])

RATIONALE

Instructions to derive a secondary format from HTML documents shouldn't 
be misleading, and also should make clear which conditions need to be 
met to produce valid documents.

DETAILS

There are two problems, both with the following step (4.15.1, step 15.9 
as of April 6):

"Otherwise

     Let id be a user-agent-defined undereferenceable yet globally 
unique valid absolute URL. The same absolute URL should be generated for 
each run of this algorithm when given the same input. Let has-alternate 
be false."

Problem #1: RFC 4287 does not require the ID to be undereferenceable. 
This was a conscious decision of the IETF WG. There's absolutely no 
point in adding this requirement, except for the spec author's distaste 
for URIs that are both dereferenceable *and* act as a globally unique 
and stable identifier.

Furthermore, there's no way to ensure that a URL is "undereferenceable", 
or remains so in the future. As soon as a dereferencing service has been 
written, it's not "undereferenceable" anymore. (See [1]).

Note from 
<http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6.p.2>:

"...Though the IRI might use a dereferencable scheme, Atom Processors 
MUST NOT assume it can be dereferenced."

Problem #2: RFC 4287 makes it a MUST-level requirement to generate the 
same ID every time the feed is regenerated:

From <http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6.p.3>:

"When an Atom Document is relocated, migrated, syndicated, republished, 
exported, or imported, the content of its atom:id element MUST NOT 
change. Put another way, an atom:id element pertains to all 
instantiations of a particular Atom entry or feed; revisions retain the 
same content in their atom:id elements. It is suggested that the atom:id 
element be stored along with the associated resource."

HTML5 relaxes this to a should-level requirement.

I do agree that generating valid Atom feeds from HTML *is* hard, but 
violating a MUST-level requirement from the Atom spec is not acceptable.
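
To make the requirement concrete, here is a minimal Python sketch that 
compares the feed-level atom:id across regenerated feeds. The sample 
documents, the urn:example identifiers and the function name feed_id are 
made up for illustration; only the Atom namespace comes from RFC 4287.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # namespace defined by RFC 4287


def feed_id(atom_xml):
    """Return the text of the feed-level atom:id element."""
    root = ET.fromstring(atom_xml)
    id_element = root.find(f"{{{ATOM_NS}}}id")
    if id_element is None or not (id_element.text or "").strip():
        raise ValueError("feed has no atom:id")
    return id_element.text.strip()


# Hypothetical output of three runs of a generator over the same page.
FIRST_RUN = (
    f'<feed xmlns="{ATOM_NS}">'
    "<id>urn:example:feed-1</id><title>News</title></feed>"
)
SECOND_RUN = (
    f'<feed xmlns="{ATOM_NS}">'
    "<id>urn:example:feed-1</id><title>News, revised</title></feed>"
)
DRIFTED_RUN = (
    f'<feed xmlns="{ATOM_NS}">'
    "<id>urn:example:feed-2</id><title>News</title></feed>"
)

# A revision that keeps its id is fine; a run that changes the id is not.
print("second run keeps id:", feed_id(FIRST_RUN) == feed_id(SECOND_RUN))
print("drifted run keeps id:", feed_id(FIRST_RUN) == feed_id(DRIFTED_RUN))
```

Under the quoted RFC 4287 text, the third document would be the 
problematic one: the atom:id changed although it is the same feed.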

Proposed changes:

For problem #1:

Leave out "undereferenceable", changing the sentence to:

"Let id be a user-agent-defined yet globally unique valid absolute URL."

For problem #2:

Change

"The same absolute URL should be generated for each run of this 
algorithm when given the same input."

to

"The same absolute URL must be generated for each run of this algorithm 
when given the same input. If this requirement cannot be fulfilled, 
then generating a valid Atom feed is not possible and this algorithm 
should be aborted."
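
A minimal Python sketch of how the amended step could behave, assuming a 
generator that has nothing but the input bytes to work with. The names 
fallback_id and AtomGenerationAborted, and the choice of a name-based 
urn:uuid identifier, are illustrative assumptions; neither the HTML5 
draft nor RFC 4287 prescribes them.

```python
import hashlib
import uuid


class AtomGenerationAborted(Exception):
    """Raised when no stable id can be produced (hypothetical name)."""


def fallback_id(input_bytes):
    """Return the same id for the same input, or abort the algorithm."""
    if not input_bytes:
        # Assumption: with no input there is nothing to derive an id from.
        raise AtomGenerationAborted("no input to derive a stable id from")
    digest = hashlib.sha256(input_bytes).hexdigest()
    # uuid5 is name-based: the same namespace and name give the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, digest).urn


page = b"<!DOCTYPE html><title>News</title><article>...</article>"
print(fallback_id(page) == fallback_id(page))
```

This only covers "same input, same id". Once the page is revised, the 
bytes change and so does the derived id, which is the case the RFC 4287 
text quoted above rules out; a generator that cannot keep the id 
alongside the resource would then have to take the abort path.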


IMPACT

1. Positive Effects

Consistency between the applicable specs. Also, authors are correctly 
informed about what it takes to generate proper Atom feeds.

2. Negative Effects

None.

3. Conformance Classes Changes

Atom feed generators are actually required to generate valid Atom 
documents (with respect to atom:id).

4. Risks

None.

REFERENCES

[1] <http://www.imc.org/atom-syntax/mail-archive/msg21400.html>
[2] <http://www.imc.org/atom-syntax/mail-archive/msg21396.html>

Received on Thursday, 8 April 2010 12:09:39 UTC