Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 15 Apr 2010 11:26:00 +0200
Message-ID: <4BC6DBA8.40409@gmx.de>
To: Maciej Stachowiak <mjs@apple.com>
CC: Sam Ruby <rubys@intertwingly.net>, Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
On 15.04.2010 11:10, Maciej Stachowiak wrote:
>
> On Apr 15, 2010, at 2:03 AM, Julian Reschke wrote:
>
>>
>>> If you require import from external formats to give fixed IDs, it gives
>>> potentially nonsensical results. For example, creating an Atom post from
>>> plain text that does not contain an ID would be nonconforming, since you
>>> can't guarantee another Atom system would give the same ID if you
>>> imported it there. Or if there was a globally defined algorithm for
>>
>> That would be a problem if you considered the imported piece of text
>> to be the "same" once it's imported into different feeds.
>>
>> But when you do things like that, you usually put that piece of
>> information into a context. You can't simply assume that anybody else
>> that takes the same text is producing the "same" atom element from it.
>> For instance, given plain text as input, how do you produce titles and
>> timestamps?
>>
>> So in general, I would expect somebody importing plain text to assign
>> a unique ID upon import, and keep it.
>
> If separate exercises of importing the same plain text are allowed to
> have different IDs, why not separate exercises of importing the same
> HTML? I'm asking here purely from the spec technicality point of view,
> not the pragmatic point of view. What in the Atom spec distinguishes
> importing a piece of plain text (possibly even containing multiple
> entries) from importing a piece of HTML?

I think it would be acceptable if different HTML->Atom converters 
produce different IDs for the same news entries. But it's not easy to 
tell without fully understanding what this feature is good for.

The bigger problem is when the *same* converter produces different IDs 
for the same input on each run, which Ian's text currently more or less 
allows under certain circumstances, because the requirement is only a 
"SHOULD" (in the BCP 14 sense). That would lead to the kind of problems 
Sam already mentioned.
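To make the distinction concrete, here is a minimal sketch (Python, not 
taken from either spec) of a converter that is stable across its own 
runs. It assumes the input offers something stable to key on, here a 
page URL plus a per-entry key such as an article's id attribute; the 
function names are hypothetical. A name-based UUID (uuid5) turns the 
same key into the same atom:id every time, while minting a random UUID 
per run shows the failure mode described above.

```python
import uuid


def atom_id_for(page_url: str, article_key: str) -> str:
    """Derive an atom:id from stable inputs only (no clock, no randomness).

    page_url and article_key are hypothetical inputs, assumed to identify
    the entry stably, e.g. the document URL and an article's id attribute.
    """
    name = page_url + "#" + article_key
    # uuid5 is name-based: the same name in the same namespace always
    # yields the same UUID, so rerunning the conversion keeps the ID.
    return uuid.uuid5(uuid.NAMESPACE_URL, name).urn  # "urn:uuid:..."


def atom_id_random() -> str:
    # The problematic behaviour: a fresh ID on every run.
    return uuid.uuid4().urn


if __name__ == "__main__":
    a = atom_id_for("http://example.com/news", "story-1")
    b = atom_id_for("http://example.com/news", "story-1")
    print(a == b)  # True
    # False, barring a practically impossible random collision:
    print(atom_id_random() == atom_id_random())
```

Note that two *different* converters could still pick different keys or 
namespaces and so disagree on the ID, which is the case considered 
acceptable above.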

Let me cite Ian again from 
<http://www.w3.org/Bugs/Public/show_bug.cgi?id=7806#c23>:

> It's a must in the Atom WG because (I presume) they assumed that it wouldn't be
> difficult to come up with unique IDs, and thus all software could do it. I do
> not share that assumption. Not following the SHOULD in the HTML5 spec means you
> are violating Atom, which isn't ok, but that's a problem for the Atom spec. I
> do not think that being unable to generate a stable ID should be grounds for
> saying you aren't conforming to HTML, even if you aren't conforming to Atom.

So the intention seems to be that more HTML input documents can be 
converted in an HTML5-conforming way, but some of the resulting feeds 
may not be conforming Atom, and, quoting Ian again, "that's a problem 
for the Atom spec".

Enough said.

>>> (2) Should converting the exact same document to Atom multiple times
>>> with different converters give the same atom:ids? You could handle this
>>> like (1) if the spec defined an exact algorithm for generating Atom IDs,
>>> but that would conflict with (3) below.
>>
>> I don't see that as a requirement.
>
> What in the Atom spec distinguishes importing multiple times with the
> same tool, from importing multiple times with different tools? Why would
> the latter be exempted from the persistent ID requirement, but not the
> former?

As I said before: pulling any data into an Atom feed puts it into a 
certain context, and requires deriving certain metadata. Requiring 
*every* converter to produce the same atom:id essentially means that they 
need to produce the *same* atom entry (for some value of sameness). In 
general, that will only be possible if the data you start with already 
has all the metadata required by Atom, which would include the atom:id.
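For the plain-text case, the "assign a unique ID upon import, and keep 
it" approach can be sketched as follows (Python; the class name, the 
JSON file, and the local keys are all hypothetical). The ID is minted 
randomly, but only once; afterwards it is stored alongside the data, so 
every later export of the same item reuses it.

```python
import json
import uuid
from pathlib import Path


class AtomIdRegistry:
    """Hypothetical helper remembering the atom:id minted for each item.

    IDs are created on first import and persisted to a JSON file, so
    re-exporting the same item yields the same atom:id across runs.
    """

    def __init__(self, path: Path):
        self.path = path
        # Load previously assigned IDs, if any were saved before.
        self.ids = json.loads(path.read_text()) if path.exists() else {}

    def get_or_assign(self, local_key: str) -> str:
        if local_key not in self.ids:
            # First import: mint a new ID and save it immediately.
            self.ids[local_key] = uuid.uuid4().urn
            self.path.write_text(json.dumps(self.ids))
        return self.ids[local_key]


if __name__ == "__main__":
    store = Path("atom-ids.json")
    first = AtomIdRegistry(store).get_or_assign("post-42")
    again = AtomIdRegistry(store).get_or_assign("post-42")  # fresh process
    print(first == again)  # True: the stored ID is reused
```

In this sketch the stable ID effectively becomes part of the data's own 
metadata, which is the point made above.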

Best regards, Julian
Received on Thursday, 15 April 2010 09:26:40 GMT