Re: resolving IDs

Tracker, this is PROV-ISSUE-482

On 20/08/2012 20:14, Jim McCusker wrote:
> If a bundle uses a URI for the ID, then if they use the same URI they are
> talking about the same thing.

While I tend to agree, I think that, as far as the current specifications go,
this is an assumption rather than a stated fact.  To resolve this, I think we
need to consider how this plays out in representation as RDF (via PROV-O).

Currently, the PROV-O rendering of a bundle in RDF is described using TriG
syntax, which is not directly representable in the current RDF specification.
While we anticipate that it will be expressible using the new RDF specifications
currently being worked on, it's not yet clear how the semantics of those
specifications will play out.

I'd suggest that the strongest recommendation we can make at this time is 
something like this:
[[
When provenance information is spread across a number of bundles, the same URI 
SHOULD NOT be used in different bundles to denote different entities, agents or 
activities.  Applications that consume multiple bundles MAY assume that the same 
URI used in different bundles denotes the same entity, agent or activity.
]]

I think this should be stated explicitly somewhere (PROV-DM and PROV-O).

(Note this does not say anything about URIs that do not denote entities, agents 
or activities.  In particular, I'd anticipate that the interpretation of RDF 
properties might differ across bundles, but that's just a hunch at this stage.)
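
For illustration only, here is a minimal Python sketch of what a consuming
application might do if it takes up the MAY clause in the suggested text above
and treats the same URI in different bundles as denoting the same thing.  The
in-memory shape (bundle name -> URI -> attributes), the function name
merge_by_uri and the example.org URIs are all assumptions made for the example;
none of this comes from PROV-DM or PROV-O.

```python
def merge_by_uri(bundles):
    """Pool descriptions from several bundles, keyed by URI.

    Assumption: `bundles` maps a bundle name to a dict of
    {uri: {attribute: value}} describing entities, agents or activities.
    """
    merged = {}
    for bundle_name, descriptions in bundles.items():
        for uri, attrs in descriptions.items():
            # Same URI in another bundle is assumed to denote the same thing.
            entry = merged.setdefault(uri, {"seen_in": [], "attrs": {}})
            entry["seen_in"].append(bundle_name)
            for key, value in attrs.items():
                # Keep every value seen, so disagreements stay visible.
                entry["attrs"].setdefault(key, set()).add(value)
    return merged


# Hypothetical data: two bundles describing the same URI.
bundles = {
    "bundle1": {"http://example.org/report": {"ex:version": "1"}},
    "bundle2": {"http://example.org/report": {"ex:author": "alice"}},
}
merged = merge_by_uri(bundles)
```

Values are collected into sets rather than overwritten, so if two bundles
disagree about an attribute the consumer can still see that, which matters if a
producer has ignored the SHOULD NOT.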

I think we also need to (re)engage with the RDF working group with regard to
the semantics of RDF Datasets.  We want to be confident that provenance that is
expressed in multiple bundles across a dataset is capable of being
interpreted in the way we expect without violating the RDF semantics (but not
necessarily expecting RDF to provide all the required semantics, as long as
there is some way to introduce new provenance-specific semantics).

Notes:
http://www.w3.org/TR/prov-dm/#term-identifier
http://www.w3.org/TR/prov-o/#Bundle
http://www4.wiwiss.fu-berlin.de/bizer/TriG/
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-dataset


>... If they are using something else, make a
> namespace prefix for the bundle (I prefer to do it based on a content
> digest of the document the bundle is in) and use that prefix to qualify the
> IDs.

That's a possible technique, but not necessarily something we should recommend.
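
To make the quoted technique concrete (as a sketch, not a recommendation), the
following Python derives a namespace prefix from a content digest of the
document a bundle is in and uses it to qualify local IDs.  The base URI
http://example.org/bundle/, the choice of SHA-256 and the function names are
assumptions made for the example.

```python
import hashlib

# Hypothetical base URI; any namespace the implementer controls would do.
BASE = "http://example.org/bundle/"


def digest_prefix(document_bytes):
    """Build a namespace prefix from a SHA-256 digest of the document."""
    return BASE + hashlib.sha256(document_bytes).hexdigest() + "#"


def qualify(local_id, document_bytes):
    """Turn a bundle-local ID into a URI scoped to that document."""
    return digest_prefix(document_bytes) + local_id


doc_a = b"bundle document A"
doc_b = b"bundle document B"
id_a = qualify("e1", doc_a)
id_b = qualify("e1", doc_b)
```

With this scheme the same local ID in two different documents yields two
different URIs, so it avoids clashes on insertion, but it also means the IDs are
never treated as denoting the same thing across bundles.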

#g
--

> On Mon, Aug 20, 2012 at 3:06 PM, Satrajit Ghosh<satra@mit.edu>  wrote:
>
>> hi all,
>>
>> if one were implementing a database storing prov bundles, would we have to
>> ensure that IDs don't clash in the database insertion code? or is the
>> understanding that IDs are only meant to be unique within a given bundle
>> context?
>>
>> cheers,
>>
>> satra
>>
>>
>
>

Received on Tuesday, 21 August 2012 11:11:39 UTC