Re: ISSUE-385: hasProvenanceIn: finding a solution

From: Graham Klyne <GK@ninebynine.org>
Date: Sun, 03 Jun 2012 07:48:53 +0100
Message-ID: <4FCB08D5.7010303@ninebynine.org>
To: Jun Zhao <jun.zhao@zoo.ox.ac.uk>
CC: Paul Groth <p.t.groth@vu.nl>, Luc Moreau <L.Moreau@ecs.soton.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
(I'm replying arbitrarily to Jun's email to maintain the thread, but my comment 
is to the issue in general.  As it happens, my point about semantics is 
underscored by Jun's comment about the time constraint - I think it's a non-issue 
here, but not obviously so.)

I think the problem we're running into is that we agreed at the last F2F to 
remove all the additional semantics associated with account.  Thus, to 
paraphrase Simon's excellent summary, a bundle is just a named set of provenance 
statements without any further semantics.  But it appears that Luc's example 
needs more semantics than just a named set of provenance statements - and that's 
where I think we are running into problems, because we are not clear about 
exactly what those additional semantics should be.

Therefore I suggest that, according to prior WG agreement, Luc's example is out 
of scope for us to fully resolve.  Paul's suggestion to provide the attributes 
as an extensibility hook is one possible approach.

Another possible and more radical approach, prompted by Tim's earlier suggestion 
to take a local name from DC, is to drop hasProvenanceIn entirely from the prov 
specification, and (in the usage guidelines) document use of the DC term for this 
purpose.  This will leave the field clear for subsequent work to define a 
suitable cross-bundle primitive when we have a clearer common understanding of 
the actual requirements.

In summary, options that work for me would be (in order of preference):
(1) drop hasProvenanceIn entirely and move on.  Use existing terms from other 
vocabularies to express this idea. (**)
(2) adopt Paul's suggestion of an extensible 2-place relation (*)

(*) noting the importance of monotonicity here: extension attributes must not be 
able to change semantics of the underlying property.  If the underlying property 
has no (formal) semantics, this is easy.  If the underlying property does have 
built-in semantics, then the utility of the extension may be limited (or worse, 
careless extensions may break the underlying semantic model associated with the 
core provenance model).
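
To illustrate the monotonicity point, here is a minimal sketch in Python. It is not part of any PROV specification: the `HasProvenanceIn` type, the `core` projection, the identifier `ex:hp1` and the attribute `ex:validUntil` are all hypothetical names made up for this illustration. The idea is only that the binary fact recovered from a statement is the same whether or not extension attributes are present.

```python
from typing import NamedTuple, Optional, Tuple


class HasProvenanceIn(NamedTuple):
    """Hypothetical model of hasProvenanceIn(id, entity, bundle, attrs)."""
    entity: str
    bundle: str
    id: Optional[str] = None                  # optional statement identifier
    attrs: Tuple[Tuple[str, str], ...] = ()   # extension attributes (name, value)


def core(stmt: HasProvenanceIn) -> Tuple[str, str]:
    """Project a statement onto the plain binary relation (entity, bundle)."""
    return (stmt.entity, stmt.bundle)


def core_facts(statements):
    """Set of binary facts entailed by a collection of statements."""
    return {core(s) for s in statements}


# Binary form, as in hasProvenanceIn(entity, bundle)
plain = HasProvenanceIn("tool:Bob1", "ex:run1")

# Extended form with a made-up identifier and a made-up time-constraint attribute
extended = HasProvenanceIn(
    "tool:Bob1", "ex:run1",
    id="ex:hp1",
    attrs=(("ex:validUntil", "2012-12-31"),),
)

# The extension adds information but does not change the underlying fact.
assert core(extended) == core(plain)

# Adding the extended statement never removes a previously held binary fact.
assert core_facts([plain]) <= core_facts([plain, extended])
```

An extension whose attributes were allowed to cancel or alter the result of `core` would be exactly the non-monotonic kind I am warning against.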

(**) the slight inconsistency here would be that PROV-AQ still requires a 
prov:hasProvenance relation.  I'm OK with this because PROV-AQ is intended to 
address operational concerns where the model is not.  But this does create a 
reasonably compelling argument for having a corresponding relation in the model 
- if the semantics are minimal then the same relation can work at both levels.


On 02/06/2012 22:36, Jun Zhao wrote:
> Paul,
> At first sight, I loved your proposal. But after reading into it, I got less sure.
> This property is to allow locating the bundle in which the provenance of an entity is described. To qualify this, would it mean that, e.g., there is a time period during which you can find provenance of that entity in the bundle and after that you can't?
> Although the pattern you propose makes sense, I can't see when people need to qualify this relation. If you have a more concrete example in mind, I am ready to be convinced!
> Cheers,
> Jun
> Sent from my iPad (sorry for the brevity)
> On 1 Jun 2012, at 17:03, Paul Groth <p.t.groth@vu.nl> wrote:
>> Hi All,
>> It seems that one approach would be to define an extensible version
>> of hasProvenanceIn and leave it at that.
>> hasProvenanceIn(id, entity, bundle, attrs).
>> Like all our extensible relations, we would also have the straight
>> binary version
>> hasProvenanceIn(entity,bundle)
>> This would allow for the extensibility to cater for Luc's use case but
>> also for other use cases where extension is nice. For example, I can
>> imagine a system wanting to put a time constraint on the applicability
>> of provenance in a bundle to an entity.
>> This would leave it up to people to define specialization, alternate
>> and derivation relations between entities as they want.
>> Would this be acceptable to the group?
>> Thanks
>> Paul
>> On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>> Hi Simon,
>>> Thanks for your message. I feel you don't directly respond to the points
>>> that I raised,
>>> and therefore all my comments stand.
>>> I respond to your points below.
>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>> Hi Luc,
>>>> I will try to articulate the points which I think back up the binary relations proposal.
>>>> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>>> A close notion to bundle in prior provenance art is opm:Account, and
>>> there is plenty of evidence
>>> that merging accounts may lead to contradictions.  PROV, rightly so,
>>> does not define a union operator
>>> over bundles, and is silent on whether or not bundles may be merged.
>>> Therefore,  there is nothing in PROV that backs this statement "which
>>> bundle a description is in is
>>> irrelevant and the bundling can be ignored".
>>> You are suggesting that an extension of PROV may add semantics to
>>> bundles: that's exactly what you
>>> have done, by implying they are mergeable.
>>>> Taking the statements from the three bundles below, a querier would end up with:
>>>>   activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>   wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>   activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>   wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>   agent(tool:Bob1, [perf:rating="good"])
>>>>   agent(tool:Bob2, [perf:rating="bad"])
>>>> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>>> PROV does not specify whether they mean something different or not.
>>>> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf, the appropriate one must be asserted separately.
>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>> important: that's why I am
>>> in favour of having hasProvenanceIn be an n-ary relation that includes
>>> attributes so that prov:type can be
>>> used for what you suggest.
>>>> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>>>> Separating concerns, I'd argue it is preferable to say:
>>>>   hasProvenanceIn(tool:Bob1, ex:run1)
>>>>   specializationOf(tool:Bob1, ex:Bob)
>>>>   specializationOf(ex:Bob, ex:GeneralBob)
>>> But this latter statement would belong to the ex:run1 bundle I assume.
>>> It is not going to be known to be relevant to me until I have correctly
>>> been able to link tool:Bob1 to ex:Bob in run1.
>>>> and let the que
