RE: PROV-ISSUE-183 (prov-dm-identifiers): identifiers in prov-dm [prov-dm]

Luc,

I'm perhaps mixing things up, but my thought was that the DOI belonged to the thing (the doc) and not to the entities. If the proposal is to allow multiple entities to reuse the identity of an uncharacterized thing, I think that's a problem. However, if we want to allow assertions that a thing in the world with an ID is already well enough characterized to serve as an entity, I think it would be OK, with the potential that different accounts then actually characterize that thing differently without realizing it, or in ways that make it hard to integrate the accounts.

Regarding entities versus entity records - I may again be confused about the distinction being made. I thought the issue was differences in attributes being asserted for a common entity. If that's not it, I'm unclear what the difference is (or perhaps why the entity-record/entity distinction is different from the entity/thing distinction).


> The distinction I initially thought you wanted with entity records was that there might be two people who both characterized the doc as a file created by latex, but one of them says it has a font attribute and the other says there are several more formatting attributes. That looks like a case where there's one entity but two entity records that differ in how complete they are. Not having a separate id for these two records seems reasonable - the provenance of both entity records should be the same (modulo the level of detail provided in the accounts, which is very different than the case for two entities that are truly different characterizations of something).
>

... you mention the provenance of 'entity records'??? This is confusing
me, now!

Perhaps I should say the provenance of the one entity documented in the two records? If two records really represent the same entity, the provenance will be the same/consistent (reading ahead, Harvard as established in the 17th century and in one specific year are the same entity...).
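To make the "one entity, two records" case concrete, here is a rough sketch (not anything from prov-dm; the attribute names and the merge rule are assumptions made up for illustration). Two records for the same id are treated as compatible when they never disagree on a shared attribute, and a disagreement is taken as a hint that they may describe different entities.

```python
def merge_records(rec_a, rec_b):
    """Merge two attribute dicts asserted for the same entity id.

    Hypothetical rule: return the combined attributes, or None if the
    records disagree on the value of any attribute they share.
    """
    for key in rec_a.keys() & rec_b.keys():
        if rec_a[key] != rec_b[key]:
            return None  # conflicting values: possibly two different entities
    return {**rec_a, **rec_b}


# Two records that differ only in how complete they are.
sparse = {"type": "file", "createdBy": "latex", "font": "Times"}
detailed = {"type": "file", "createdBy": "latex", "font": "Times",
            "margin": "1in", "columns": 2}
merged = merge_records(sparse, detailed)

# A record that contradicts the first one on a shared attribute.
conflict = merge_records(sparse, {"type": "file", "createdBy": "word"})
```

Here `merged` carries the union of the attributes, while `conflict` is `None` because the two records disagree on `createdBy`.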

> Is this consistent with current definitions? Am I misinterpreting your example?
>
>
I had not intended to develop a full example by email. For me the
reference is the prov-dm document.
But, both your scenarios are compatible with the current definition.
But your scenarios do not say how ids are created/scoped/etc.

I did not intend to address how IDs are created, but I think they need to be globally usable (either directly global or combinable with an account scope to create a global ID).
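As a sketch of what "globally usable" could mean in practice (purely an assumption on my part: the prefix list, the `#` separator, and the example identifiers below are placeholders, not anything prov-dm defines), an id is either already global or is qualified with the id of the account it appears in:

```python
# Hypothetical prefixes that mark an identifier as already global.
GLOBAL_PREFIXES = ("http://", "https://", "doi:", "urn:")


def global_id(identifier, account=None):
    """Return a globally usable id for an entity identifier.

    Assumed convention: identifiers with a known global prefix are used
    as-is; anything else is scoped by the account id.
    """
    if identifier.startswith(GLOBAL_PREFIXES):
        return identifier
    if account is None:
        raise ValueError("a local id needs an account scope to become global")
    return f"{account}#{identifier}"


direct = global_id("doi:10.1000/example")  # placeholder DOI
scoped = global_id("e0", account="http://example.org/account1")
```

With this convention nobody has to mint globally unique e0/e1 ids up front; uniqueness comes from the account scope.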

> An additional comment:
>
> My guess is that a common problem for integrating accounts will be that people considered a thing in the world to be an entity (well enough characterized that they can both create entity records like entity(DOI#, [attributes]) and be OK describing its provenance from their perspectives) and then finding out, as in your example, that they really had two entities that are different characterizations of the thing with that DOI and should have done something like entity(e0, [characterizationOf=DOI#]) and entity(e1, [characterizationOf=DOI#]) that are complements.
>
> I could also see cases where two people thought they had the same entity but their records give a clue - entity(e0, [mass=X]) and entity(e0, [weight=Y]) could be the same characterization (both are entity(e0, [mass=X, weight=Y])), which works as long as e0 doesn't, for example, go into outer space. If it does, we may find that the two accounts of e0 diverge and the two provenance recorders really had different entities rather than two entity records of one.
>
> Neither of these argues against an entity having an ID but an entity record sharing that ID. It might suggest that we want a standard attribute like 'characterizationOf'...
>
But who creates the ids e0 and e1?
We can mandate the minting of globally unique e0/e1 ids, but I thought
there was no support for that.

I am also unclear what the difference is between characterizationOf and
wasComplementOf.
If we have the former, do we still need the latter?

I think it is a scope/range issue - complement relates two entities right now, characterizationOf would reference the ID of a thing/something external to our model. I suppose it could be avoided by asserting an entity with the thing's ID that both e0 and e1 would be complements of. Such an entity would not get used for anything else (e.g. the whole point is that the thing can be interpreted as being file-like or intellectual and hence we need e0 and e1 before we can talk about processes that create it, etc.)
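A small sketch of the two options, under loud assumptions: the dict layout, the (source, target) pairs standing in for complement assertions, and the placeholder DOI are all made up for illustration and are not prov-dm syntax.

```python
THING = "doi:10.1000/example"  # placeholder ID of the thing in the world

# Option A: an attribute that references something external to the model.
option_a = {
    "entities": {
        "e0": {"characterizationOf": THING},
        "e1": {"characterizationOf": THING},
    },
    "complements": [],
}

# Option B: a placeholder entity carries the thing's ID, and e0/e1 are
# related to it by complement-style assertions between entities only.
option_b = {
    "entities": {THING: {}, "e0": {}, "e1": {}},
    "complements": [("e0", THING), ("e1", THING)],
}


def characterizations(account, thing_id):
    """List the entity ids that characterize thing_id under either option."""
    via_attribute = {
        eid for eid, attrs in account["entities"].items()
        if attrs.get("characterizationOf") == thing_id
    }
    via_relation = {src for src, dst in account["complements"] if dst == thing_id}
    return sorted(via_attribute | via_relation)


from_a = characterizations(option_a, THING)
from_b = characterizations(option_b, THING)
```

Both options let e0 and e1 be found from the thing's ID; the difference is only whether that ID appears as an attribute value outside the model (A) or as an otherwise unused entity inside it (B).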


Again, trying to be helpful and hoping I haven't missed things along the way...

 Cheers,
 Jim

Received on Wednesday, 7 December 2011 02:30:48 UTC