Re: ISSUE-385: hasProvenanceIn: finding a solution

On Jun 3, 2012, at 12:40 PM, Paul Groth wrote:

> Hi Graham,
> 
> I would argue that being able to refer to a bundle in which the
> provenance of an entity is contained is an important piece of
> functionality to allow people to easily organize their provenance
> information.

In light of Graham's F2F2 reminder about brevity due to the timeline, perhaps providing a single unqualified / qualified construct with minimal guidance but broad description would get us what we need.
I am not in favor of a qualified form (in the same way that we do not have one for alternateOf), but if it gets us to consensus, then I won't object to its inclusion.

Regarding Graham's suggestion that we "leave it to dcterms:isReferencedBy", I'd suggest that we at least provide a property that narrows the range to prov:Bundle:

prov:isReferencedBy rdfs:range prov:Bundle .
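
Spelled out, a minimal sketch of such a property might look like the Turtle below. It assumes something not stated above: that prov:isReferencedBy would be declared as a sub-property of dcterms:isReferencedBy. The identifiers ex:entity1 and ex:bundle1 are hypothetical.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix ex:      <http://example.org/> .

# Hypothetical property (not part of PROV-O): dcterms:isReferencedBy
# with its range narrowed to prov:Bundle.
prov:isReferencedBy
    rdfs:subPropertyOf dcterms:isReferencedBy ;
    rdfs:range         prov:Bundle .

# Hypothetical usage: provenance of ex:entity1 can be found in ex:bundle1.
ex:entity1 prov:isReferencedBy ex:bundle1 .
```

Under RDFS semantics, the range triple does not reject a non-bundle object. Instead, it licenses the inference that ex:bundle1 is a prov:Bundle. The sub-property triple lets a consumer that only knows Dublin Core still read ex:entity1 dcterms:isReferencedBy ex:bundle1.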

Then call it a Rec. :-)

> 
> I can see the point about trying to reuse the relation between the PAQ
> and the dm.

Unfortunately, I'm behind on the PAQ. But perhaps it's become required reading for the hasProvenanceIn decision…

-Tim

> 
> cheers
> Paul
> 
> 
> On Sun, Jun 3, 2012 at 9:48 AM, Graham Klyne <GK@ninebynine.org> wrote:
>> (I'm replying arbitrarily to Jun's email to maintain the thread, but my comment
>> is to the issue in general.  As it happens, my point about semantics is
>> underscored by Jun's comment about the time constraint - I think it's a non-issue
>> here, but not obviously so.)
>> 
>> I think the problem we're running into is that we agreed at the last F2F to
>> remove all the additional semantics associated with account.  Thus, to
>> paraphrase Simon's excellent summary, a bundle is just a named set of provenance
>> statements without any further semantics.  But it appears that Luc's example
>> needs more semantics than just a named set of provenance statements - and that's
>> where I think we are running into problems, because we are not clear about
>> exactly what those additional semantics should be.
>> 
>> Therefore I suggest that, according to prior WG agreement, Luc's example is out
>> of scope for us to fully resolve.  Paul's suggestion to provide the attributes
>> as an extensibility hook is one possible approach.
>> 
>> Another possible and more radical approach, prompted by Tim's earlier suggestion
>> to take a local name from DC, is to drop hasProvenanceIn entirely from the prov
>> specification, and (in the usage guidelines) document the use of the DC term for this
>> purpose.  This will leave the field clear for subsequent work to define a
>> suitable cross-bundle primitive when we have a clearer common understanding of
>> the actual requirements.
>> 
>> In summary, options that work for me would be (in order of preference):
>> (1) drop hasProvenanceIn entirely and move on.  Use existing terms from other
>> vocabularies to express this idea. (**)
>> (2) adopt Paul's suggestion of an extensible 2-place relation (*)
>> 
>> (*) noting the importance of monotonicity here: extension attributes must not be
>> able to change the semantics of the underlying property.  If the underlying property
>> has no (formal) semantics, this is easy.  If the underlying property does have
>> built-in semantics, then the utility of the extension may be limited (or worse,
>> careless extensions may break the underlying semantic model associated with the
>> core provenance model).
>> 
>> (**) the slight inconsistency here would be that PROV-AQ still requires a
>> prov:hasProvenance relation.  I'm OK with this because PROV-AQ is intended to
>> address operational concerns, whereas the model is not.  But this does create a
>> reasonably compelling argument for having a corresponding relation in the model
>> - if the semantics are minimal then the same relation can work at both levels.
>> 
>> #g
>> --
>> 
>> 
>> On 02/06/2012 22:36, Jun Zhao wrote:
>>> Paul,
>>> 
>>> At first sight, I loved your proposal. But after reading into it, I got less sure.
>>> 
>>> This property is meant to allow locating the bundle in which the provenance of an entity is described. To qualify it, would that mean, e.g., that there is a time period during which you can find the provenance of that entity in the bundle, and after which you can't?
>>> 
>>> Although the pattern you propose makes sense, I can't see when people would need to qualify this relation. If you have a more concrete example in mind, I am ready to be convinced!
>>> 
>>> Cheers,
>>> 
>>> Jun
>>> 
>>> Sent from my iPad (sorry for the brevity)
>>> 
>>> On 1 Jun 2012, at 17:03, Paul Groth<p.t.groth@vu.nl>  wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> It seems that one approach would be to define an extensible version
>>>> of hasProvenanceIn and leave it at that.
>>>> 
>>>> hasProvenanceIn(id, entity, bundle, attrs).
>>>> 
>>>> Like all our extensible relations, we would also have the straight
>>>> binary version
>>>> 
>>>> hasProvenanceIn(entity,bundle)
>>>> 
>>>> This would allow for the extensibility to cater for Luc's use case but
>>>> also for other use cases where extension is nice. For example, I can
>>>> imagine a system wanting to put a time constraint on the applicability
>>>> of provenance in a bundle to an entity.
>>>> 
>>>> This would leave it up to people to define specialization, alternate
>>>> and derivation relations between entities as they want.
>>>> 
>>>> Would this be acceptable to the group?
>>>> 
>>>> Thanks
>>>> Paul
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau<L.Moreau@ecs.soton.ac.uk>  wrote:
>>>>> Hi Simon,
>>>>> 
>>>>> Thanks for your message. I feel you don't directly respond to the points
>>>>> that I raised, and therefore all my comments stand.
>>>>> 
>>>>> I respond to your points below.
>>>>> 
>>>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>>>> Hi Luc,
>>>>>> 
>>>>>> I will try to articulate the points which I think back up the binary relations proposal.
>>>>>> 
>>>>>> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> In prior art, the notion closest to a bundle is opm:Account, and there is
>>>>> plenty of evidence that merging accounts may lead to contradictions.
>>>>> PROV, rightly so, does not define a union operator over bundles, and is
>>>>> silent about whether bundles may be merged.
>>>>> 
>>>>> Therefore, there is nothing in PROV that backs this statement: "which
>>>>> bundle a description is in is irrelevant and the bundling can be ignored".
>>>>> 
>>>>> You are suggesting that an extension of PROV may add semantics to
>>>>> bundles: that's exactly what you have done, by implying they are mergeable.
>>>>> 
>>>>>> Taking the statements from the three bundles below, a querier would end up with:
>>>>>> 
>>>>>>   activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>>>   wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>>>   activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>>>   wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>>>   agent(tool:Bob1, [perf:rating="good"])
>>>>>>   agent(tool:Bob2, [perf:rating="bad"])
>>>>>> 
>>>>>> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>>>>>> 
>>>>>> 
>>>>> 
>>>>> PROV does not specify whether they mean something different or not.
>>>>> 
>>>>>> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So hasProvenanceIn cannot imply, or be a sub-type of, either specializationOf or alternateOf; the appropriate one must be asserted separately.
>>>>>> 
>>>>> 
>>>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>>>> important: that's why I am in favour of making hasProvenanceIn an n-ary
>>>>> relation that includes attributes, so that prov:type can be used for
>>>>> what you suggest.
>>>>>> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>>>>>> 
>>>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>>>>>> 
>>>>>> Separating concerns, I'd argue it is preferable to say:
>>>>>>   hasProvenanceIn(tool:Bob1, ex:run1)
>>>>>>   specializationOf(tool:Bob1, ex:Bob)
>>>>>>   specializationOf(ex:Bob, ex:GeneralBob)
>>>>>> 
>>>>> But this latter statement would belong to the ex:run1 bundle, I assume.
>>>>> It will not be known to be relevant to me until I have been able to
>>>>> link tool:Bob1 correctly to ex:Bob in ex:run1.
>>>>> 
>>>>> 
>>>>>> and let the que
>>> 
>>> 
> 
> 

Received on Monday, 4 June 2012 02:17:01 UTC