Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Maciej Stachowiak on 2010-04-15 (public-html@w3.org from April 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Thu, 15 Apr 2010 02:41:41 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Sam Ruby <rubys@intertwingly.net>, Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
Message-id: <0A0B2623-2964-419D-AF51-F11706875E40@apple.com>

On Apr 15, 2010, at 2:26 AM, Julian Reschke wrote:

> On 15.04.2010 11:10, Maciej Stachowiak wrote:
>>
>> On Apr 15, 2010, at 2:03 AM, Julian Reschke wrote:
>>
>>>
>>>> If you require import from external formats to give fixed IDs, it gives
>>>> potentially nonsensical results. For example, creating an Atom post from
>>>> plain text that does not contain an ID would be nonconforming, since you
>>>> can't guarantee another Atom system would give the same ID if you
>>>> imported it there. Or if there was a globally defined algorithm for
>>>
>>> That would be a problem if you considered the imported piece of text
>>> to be the "same" once it's imported into different feeds.
>>>
>>> But when you do things like that, you usually put that piece of
>>> information into a context. You can't simply assume that anybody else
>>> that takes the same text is producing the "same" atom element from it.
>>> For instance, given plain text as input, how do you produce titles and
>>> timestamps?
>>>
>>> So in general, I would expect somebody importing plain text to assign
>>> a unique ID upon import, and keep it.
>>
>> If separate exercises of importing the same plain text are allowed to
>> have different IDs, why not separate exercises of importing the same
>> HTML? I'm asking here purely from the spec technicality point of view,
>> not the pragmatic point of view. What in the Atom spec distinguishes
>> importing a piece of plain text (possibly even containing multiple
>> entries) from importing a piece of HTML?
>
> I think it would be acceptable if different HTML->Atom converters
> produce different IDs for the same news entries. But it's not easy
> to tell without fully understanding what this feature is good for.
>
> The bigger problem is when the *same* converter produces different
> IDs for the same input on each run, which Ian's text currently sort-of
> allows ("SHOULD") under certain circumstances (ref BCP14). That
> would lead to the kind of problems Sam mentioned already.

I'm not talking about different converters in this case though. I'm  
talking about the same converter (or importer if you prefer) run  
multiple times. Based on what you said, it sounds like it would be  
conforming to Atom to make a tool that lets you import the same plain  
text repeatedly but gives it a different atom:id every time. Yet on  
the other hand, you seem to be saying that a tool like this that takes  
HTML as input instead of plain text would be nonconforming to Atom.  
What part of the id stability requirement makes this distinction? It  
seems to me that treating plaintext input and HTML input differently  
is arbitrary and not at all justified by the Atom spec.

>
>>>> (2) Should converting the exact same document to Atom multiple times
>>>> with different converters give the same atom:ids? You could handle this
>>>> like (1) if the spec defined an exact algorithm for generating Atom IDs,
>>>> but that would conflict with (3) below.
>>>
>>> I don't see that as a requirement.
>>
>> What in the Atom spec distinguishes importing multiple times with the
>> same tool, from importing multiple times with different tools? Why would
>> the latter be exempted from the persistent ID requirement, but not the
>> former?
>
> As I said before: pulling any data into an Atom feed puts it into a
> certain context, and requires deriving certain metadata. Requiring
> *every* converter to produce the same atom:id essentially means that
> they need to produce the *same* atom entry (for some value of
> sameness). In general, that will only be possible if the data you
> start with already has all the metadata required by Atom, which
> would include the atom:id.

It looks to me like you didn't answer my question. Let me try again.

Different conversion tools are (apparently) allowed by the Atom spec  
to produce different atom:ids if the input doesn't already contain an  
atom:id. Then why aren't multiple invocations of the same tool allowed  
to do so? Why is it "the same entry" when created without full  
metadata by multiple runs of the same tool, but it isn't "the same  
entry" if you use two different tools? How would you even define "the  
same conversion tool" - is it the same tool if I upgrade from 1.0 to  
1.1? Is it the same if I run it once on my laptop, and then later on  
my desktop?
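
To make the question concrete, here is a minimal sketch of the two
behaviors being contrasted. It is an illustration only: the function
names and the namespace value are hypothetical, and neither the Atom
spec nor the HTML draft defines any such algorithm. One importer mints
a fresh ID on every run; the other derives the ID from the input,
optionally mixed with a tool version.

```python
import uuid

# Hypothetical namespace for this sketch; not defined by Atom or HTML5.
IMPORTER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/importer")


def random_id() -> str:
    """Mint a fresh ID on every call, so repeated imports never agree."""
    return f"urn:uuid:{uuid.uuid4()}"


def derived_id(source_text: str, tool_version: str = "") -> str:
    """Derive the ID from the input text, optionally mixed with a tool version."""
    name = f"{tool_version}\n{source_text}"
    return f"urn:uuid:{uuid.uuid5(IMPORTER_NAMESPACE, name)}"


text = "An entry that carries no ID of its own."

# Same tool, same input, two runs: the IDs differ.
print(random_id() == random_id())

# Same tool, same input, two runs: the IDs match.
print(derived_id(text) == derived_id(text))

# Same input, but the version string changed between runs: the IDs differ.
print(derived_id(text, "1.0") == derived_id(text, "1.1"))
```

Even the derived variant is only stable for as long as everything fed
into the derivation stays the same, which is where the questions about
upgrades and different machines come in.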

(If we can't get down to clear answers here, then I'll abandon this
thread, since we seem to have a critical mass of agreement for
removing the feature from the W3C spec anyway. But I am suspicious of
the flexible way in which the Atom conformance criteria are being
interpreted.)

Regards,
Maciej
Received on Thursday, 15 April 2010 09:42:19 UTC