Re: PROV-ISSUE-183 (prov-dm-identifiers): identifiers in prov-dm [prov-dm] from Luc Moreau on 2011-12-06 (public-prov-wg@w3.org from December 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Tue, 06 Dec 2011 22:33:50 +0000
To: "Myers, Jim" <MYERSJ4@rpi.edu>
CC: Paul Groth <p.t.groth@vu.nl>, Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <EMEW3|ffe072db3ba0804cff276eafb6f7e7f2nB5MXv08L.Moreau|ecs.soton.ac.uk|4EDE984E>
Hi Jim,

On 06/12/11 19:27, Myers, Jim wrote:
> Luc,
> I'm confused by the example.
>    

... oops, sorry!
> I would have looked at the document example as a thing in the world (with a DOI, or globally ID'd by the URL where it is posted) that is characterized by two different entities in two different accounts, one which is specific about its font/formatting perhaps and one by the intellectual content. Considering both as one entity seems problematic (what characterization of the document would be different enough to be a different entity?)
>
>    

OK, so we have a document, i.e. a thing in the world, with its DOI/URL.

My example was mentioning two perspectives about it:
- the techie one (latex/fonts/etc),
- the intellectual one, talking about its content.

So, yes, we can say that each perspective is an identifiable entity. The 
first is identifiable by the technical details,
and the second by the intellectual content.  While they are identifiable 
by these characteristics, they don't have an identifier
identifying them. The only identifier we have is the DOI/URL.

For each entity, entity records can be created. They can be created by 
the same asserter or different asserters.
Both scenarios are possible.
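
To make the scenario concrete, here is a minimal sketch (not PROV-DM syntax) of two accounts that each hold an entity record under the same identifier, the document's DOI. The class names, the asserter names "alice" and "bob", and the DOI value are all hypothetical; the only point illustrated is that a record is found by the pair (account name, entity id), so the same id can carry different statements in different accounts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityRecord:
    """What one asserter says about an entity (hypothetical structure)."""
    entity_id: str      # same identifier as the entity itself, e.g. a DOI/URL
    asserter: str
    attributes: tuple   # sorted (name, value) pairs


@dataclass
class Account:
    """A named container of entity records; record ids are scoped to it."""
    name: str
    records: dict = field(default_factory=dict)

    def assert_entity(self, entity_id, asserter, **attributes):
        record = EntityRecord(entity_id, asserter,
                              tuple(sorted(attributes.items())))
        self.records[entity_id] = record
        return record


DOC = "doi:10.0000/example"  # hypothetical identifier, for illustration only

technical = Account("account1")
intellectual = Account("account2")

# Two perspectives on the same document, recorded under the same id.
technical.assert_entity(DOC, "alice", generator="latex", font="Times")
intellectual.assert_entity(DOC, "bob", topic="survey of a field")

accounts = {a.name: a for a in (technical, intellectual)}


def lookup(accounts, account_name, entity_id):
    """Resolve an entity record by (account name, entity id)."""
    return accounts[account_name].records[entity_id]
```

With this layout, the same asserter could just as well have written both records; nothing in the lookup depends on who made the assertion.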

> The distinction I initially thought you wanted with entity records was that there might be two people who both characterized the doc as a file created by latex, but one of them says it has a font attribute and the other says there are several more formatting attributes. That looks like a case where there's one entity but two entity records that differ in how complete they are. Not having a separate id for these two records seems reasonable - the provenance of both entity records should be the same (modulo the level of detail provided in the accounts, which is very different than the case for two entities that are truly different characterizations of something).
>    

... you mention the provenance of 'entity records'??? This is confusing 
me, now!


> Is this consistent with current definitions? Am I misinterpreting your example?
>
>    
I had not intended to develop a full example by email. For me the 
reference is the prov-dm document.
But, both your scenarios are compatible with the current definition.
But your scenarios do not say how ids are created/scoped/etc.

> An additional comment:
>
> My guess is that a common problem for integrating accounts will be that people consider a thing in the world to be an entity (well enough characterized that they can both create entity records like entity(DOI#, [attributes]) and be OK describing its provenance from their perspectives) and then find out, as in your example, that they really had two entities that are different characterizations of the thing with that DOI, and should have done something like entity(e0, [characterizationOf=DOI#]) and entity(e1, [characterizationOf=DOI#]), which are complements.
>
> I could also see cases where two people thought they had the same entity but their records give a clue: entity(e0, [mass=X]) and entity(e0, [weight=Y]) could be the same characterization (both entity(e0, [mass=X, weight=Y])), which works as long as e0 doesn't, for example, go into outer space. If it does, we may find that the two accounts of e0 diverge and the two provenance recorders really had different entities rather than two entity records of one.
>
> Neither of these argues against an entity having an ID and an entity record sharing that ID. (It might suggest that we want a standard attribute like 'characterizationOf'.)
>    
But who creates the ids e0 and e1?
We can mandate the minting of globally unique e0/e1 ids, but I thought 
there was no support for that.

I am also unclear what the difference is between characterizationOf and 
wasComplementOf.
If we have the former, do we still need the latter?
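
As a rough illustration of what minting e0/e1 would involve, the sketch below generates two fresh identifiers and attaches a characterizationOf attribute to each record. Everything here is an assumption made for illustration: the use of UUID URNs, the DOI value, the attribute values, and the helper names. Grouping records by the thing they characterize is only one possible reading; whether such a grouping could stand in for wasComplementOf is the open question above, not something this sketch settles.

```python
import uuid
from collections import defaultdict


def mint_id(prefix="urn:uuid:"):
    """Mint a fresh, globally unique identifier (hypothetical scheme)."""
    return prefix + str(uuid.uuid4())


DOC = "doi:10.0000/example"  # hypothetical identifier of the thing in the world

# Someone has to create these ids; here the asserter's tooling does it.
e0 = mint_id()
e1 = mint_id()

records = [
    (e0, {"characterizationOf": DOC, "generator": "latex"}),
    (e1, {"characterizationOf": DOC, "topic": "survey of a field"}),
]


def group_by_characterized(records):
    """Group entity ids by the value of their characterizationOf attribute."""
    groups = defaultdict(list)
    for entity_id, attrs in records:
        target = attrs.get("characterizationOf")
        if target is not None:
            groups[target].append(entity_id)
    return dict(groups)


groups = group_by_characterized(records)
```

The cost being debated is visible in the first few lines: every asserter needs some agreed way to produce e0 and e1 before anything can be recorded.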

Luc

> Hope that's helpful and not rehashing old/odd ground.
>
>   Jim
>
>
>    
>> -----Original Message-----
>> From: Luc Moreau [mailto:L.Moreau@ecs.soton.ac.uk]
>> Sent: Tuesday, December 06, 2011 12:05 PM
>> To: Paul Groth
>> Cc: Provenance Working Group WG
>> Subject: Re: PROV-ISSUE-183 (prov-dm-identifiers): identifiers in prov-dm
>> [prov-dm]
>>
>> ... the conclusion issue ;-)
>>
>> No, we have no formal decision on this.
>>
>> We wrote this in the prov-dm document a long time ago (before fpwd), and
>> we have been refining it over time.
>>
>> I think it's an inevitable consequence of two key decisions:
>> - distinguishing entities (in the world) from entity records (in the
>> provenance)
>> - not mandating the minting of new URIs for entity records
>>       (no formal decision on this, but I think we have support for it, since
>>        we want to minimize the effort to generate provenance)
>>
>> Luc
>>
>>
>> On 12/06/2011 04:56 PM, Paul Groth wrote:
>>      
>>> Hi Luc,
>>>
>>> Do you have a pointer to where we reached the consensus about the dual
>>> role of identifiers?
>>>
>>> Thanks,
>>> Paul
>>>
>>> Provenance Working Group Issue Tracker wrote:
>>>        
>>>> PROV-ISSUE-183 (prov-dm-identifiers): identifiers in prov-dm
>>>> [prov-dm]
>>>>
>>>> http://www.w3.org/2011/prov/track/issues/183
>>>>
>>>> Raised by: Luc Moreau On product: prov-dm
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I think that it is now time to have a proper debate about
>>>> identifiers in prov-dm since comments are regularly expressed about
>>>> them. I have raised this issue about this topic so that we can track
>>>> the conversation properly. Our hope is to reach consensus on this
>>>> topic by the time of the third working draft.
>>>>
>>>> First, in the fpwd, there was a mention of "qualified identifier"
>>>> (appearing in a note, see [1]).  We have removed this term from the
>>>> second working draft.
>>>>
>>>> Second, the complementarity record now explicitly allows for linking
>>>> entity records across accounts. Its syntax allows for two accounts to
>>>> be named.
>>>>
>>>> Third, identifiers for entities in prov-dm have a dual role [3]. An
>>>> entity has got an id (typically given by an application). An entity
>>>> record --- i.e. what we say about an entity in a provenance record
>>>> --- also has an id. There is a consensus that we shouldn't mint
>>>> identifiers for provenance records. Hence, the identifier of the
>>>> entity record is defined to be the same as the identifier of the
>>>> entity.
>>>>
>>>> The consequence of this is that two entity records in different
>>>> accounts may have the same identifier: they may say different things
>>>> about the same entity.  For example, the document ex:doc was
>>>> generated by latex in account1, while in account2, ex:doc is
>>>> described to be the result of a survey of a field by different
>>>> authors.
>>>>
>>>> This explains why we needed the complementarity record to name the
>>>> accounts as well. This assumes that accounts need to be named
>>>> uniquely (see [4]).
>>>>
>>>> So, entity record identifiers are scoped to accounts.  Note, I said
>>>> entity *records*, not entities. Hence, we are not breaking the
>>>> semantic web approach: an entity is a resource and is denoted by a
>>>> URI, and this remains true in all accounts. (I guess that from a
>>>> semantic web perspective we are not looking at a provenance record as
>>>> resource, since we don't have a global URI to name it.) Finally, we
>>>> allow for accounts to be nested hierarchically; this fits nicely with
>>>> abstraction in provenance records. Again, see [4].
>>>>
>>>> Can you express your views about this approach, as currently defined
>>>> in the second draft of prov-dm?
>>>>
>>>> Thanks, Luc
>>>>
>>>> [1] http://www.w3.org/TR/2011/WD-prov-dm-20111018/#expression-identifier
>>>> [2] http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html#record-complement-of
>>>> [3] http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html#record-Entity
>>>> [4] http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html#record-Account
>>>>
>> --
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>      
>
Received on Tuesday, 6 December 2011 22:40:17 UTC