Re: PROV-ISSUE-183 (prov-dm-identifiers): identifiers in prov-dm [prov-dm]

Hi Jim,

On 12/07/2011 02:30 AM, Myers, Jim wrote:
> Luc,
>
> I'm perhaps mixing things up, but my thought was that the DOI belonged to the thing (the doc) and not the entities. If the proposal is to allow multiple entities to reuse the identity of an uncharacterized thing, I think that's a problem. However, if we want to allow assertions that a thing in the world with an ID is already well enough characterized to serve as an entity, I think it would be OK, with the risk that different accounts then characterize that thing differently without realizing it, or in ways that make it hard to integrate the accounts.
>    
First, I thought that there was not an absolute notion of a thing. It 
was all entities! Turtles all the way down.
It's OK to talk about the doc and its DOI, but in itself it's a 
perspective.

Second, entities are perspectives over things in the world, as observed
by an observer. While they are identifiable (at a minimum in the
observer's eyes), do they have an identifier identifying them? I thought
we were moving away from this.

> Regarding entities versus entity records - I may again be confused about the distinction being made. I thought the issue was differences in attributes being asserted for a common entity. If that's not it, I'm unclear what the difference is (or perhaps why the entity-record/entity distinction is different from the entity/thing distinction).
>
>
>    

The 'entity record' is the record in the database. An entity is part of 
our conceptualization of the world.
Hence, the entity record is the representation, for the purpose of 
provenance, in a computer system, of an entity out there in the world.
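
To make the distinction concrete, here is a minimal Python sketch. It is
not anything defined in prov-dm: EntityRecord and merge_records are
hypothetical names, and the attribute values are invented for
illustration. It treats two records that share an identifier as partial
descriptions of one entity, and refuses to combine them when they
disagree on an attribute.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """A record held in a provenance store (hypothetical type).

    It represents an entity; it is not the entity itself.
    """
    identifier: str
    attributes: dict = field(default_factory=dict)


def merge_records(a, b):
    """Combine two records asserted for the same identifier.

    Raises ValueError if the identifiers differ or if the records give
    different values for the same attribute (an assumed policy).
    """
    if a.identifier != b.identifier:
        raise ValueError("records carry different identifiers")
    merged = dict(a.attributes)
    for key, value in b.attributes.items():
        if key in merged and merged[key] != value:
            raise ValueError(f"conflicting values for {key!r}")
        merged[key] = value
    return EntityRecord(a.identifier, merged)


# Two observers record the same entity with different levels of detail.
r1 = EntityRecord("e0", {"type": "file", "createdBy": "latex", "font": "cmr10"})
r2 = EntityRecord("e0", {"type": "file", "createdBy": "latex", "margin": "1in"})
merged = merge_records(r1, r2)
```

Whether a conflict means "two different entities" or simply "one wrong
record" is exactly the kind of question the sketch leaves open.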

I don't want to start a debate on this again. If there are questions, 
they should be about the document.

>> The distinction I initially thought you wanted with entity records was that there might be two people who both characterized the doc as a file created by latex, but one of them says it has a font attribute and the other says there are several more formatting attributes. That looks like a case where there's one entity but two entity records that differ in how complete they are. Not having a separate id for these two records seems reasonable - the provenance of both entity records should be the same (modulo the level of detail provided in the accounts, which is very different from the case of two entities that are truly different characterizations of something).
>>
>>      
> ... you mention the provenance of 'entity records'??? This is confusing
> me, now!
>
> Perhaps I should say the provenance of the one entity documented in the two records? If two records really represent the same entity, the provenance will be the same/consistent (reading ahead, Harvard as established in the 17th century and in one specific year are the same entity...).
>
>    
>> Is this consistent with current definitions? Am I misinterpreting your example?
>>
>>
>>      
> I had not intended to develop a full example by email. For me the
> reference is the prov-dm document.
> But, both your scenarios are compatible with the current definition.
> But your scenarios do not say how ids are created/scoped/etc.
>
> I did not intend to address how IDs are created, but I think they need to be globally usable (directly global or combinable with account scope to create a global ID.)
>
>    
>> An additional comment:
>>
>> My guess is that a common problem for integrating accounts will be that people considered a thing in the world to be an entity (well enough characterized that they can both create entity records like entity(DOI#, [attributes]) and be OK describing its provenance from their perspectives) and then found out, as in your example, that they really had two entities that are different characterizations of the thing with that DOI, and should have done something like entity(e0, [characterizationOf=DOI#]) and entity(e1, [characterizationOf=DOI#]) that are complements.
>>
>> I could also see cases where two people thought they had the same entity but their records give a clue: entity(e0, [mass=X]) and entity(e0, [weight=Y]) could be the same characterization (both entity(e0, [mass=X, weight=Y])), which works as long as e0 doesn't, for example, go into outer space. If it does, we may find that the two accounts of e0 diverge and the two provenance recorders really had different entities rather than two entity records of one.
>>
>> Neither of these argues against an entity having an ID and an entity record sharing that ID. It might suggest that we want a standard attribute like 'characterizationOf'...
>>
>>      
> But who creates the ids e0 and e1?
> We can mandate the minting of globally unique e0/e1 ids, but I thought
> there was no support for that.
>
> I am also unclear what the difference is between characterizationOf and
> wasComplementOf.
> If we have the former, do we still need the latter?
>
> I think it is a scope/range issue: complement relates two entities right now, whereas characterizationOf would reference the ID of a thing/something external to our model. I suppose it could be avoided by asserting an entity with the thing's ID that both e0 and e1 would be complements of. Such an entity would not get used for anything else (e.g. the whole point is that the thing can be interpreted as being file-like or intellectual and hence we need e0 and e1 before we can talk about processes that create it, etc.)
>
>
> Again, trying to be helpful and hoping I haven't missed things along the way...
>    

So, what is your concrete proposal regarding prov-dm?

Luc

>   Cheers,
>   Jim
>    

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Wednesday, 7 December 2011 08:34:50 UTC