Re: ISSUE-385: hasProvenanceIn: finding a solution from Paul Groth on 2012-06-03 (public-prov-wg@w3.org from June 2012)

From: Paul Groth <p.t.groth@vu.nl>
Date: Sun, 3 Jun 2012 19:43:45 +0300
To: Jun Zhao <jun.zhao@zoo.ox.ac.uk>
Cc: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <CAJCyKRpayfaXJGfYjtKrnoKGt36zW8AfFNcXpgnhP2DCVhsEwg@mail.gmail.com>

Jun,

My idea was that the provenance recorded in the bundle for that
entity would cover a specific period of time. For example, I might use
a series of bundles to capture progressively larger chunks as my
program executes. So I could find provenance for the entity
from time t1 -> t2 in bundle1 and from t2 -> t3 in bundle2.

It's not a completely fleshed out idea but I can imagine wanting to
qualify locatedin for a number of reasons.
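
To make the time-scoped idea above a bit more concrete, here is a rough
sketch in Python. It is only an illustration under stated assumptions, not
anything defined by PROV: the attribute names ex:validFrom and
ex:validUntil, the record ids, and the helper bundles_covering are all
hypothetical, and treating each interval as half-open (so t2 itself falls
in bundle2) is my own choice.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class HasProvenanceIn:
    # Mirrors the proposed n-ary form hasProvenanceIn(id, entity, bundle, attrs).
    id: str
    entity: str
    bundle: str
    attrs: dict = field(default_factory=dict)


def bundles_covering(records, entity, when):
    """Return the bundles that hold provenance for `entity` at time `when`."""
    found = []
    for r in records:
        if r.entity != entity:
            continue
        start = r.attrs.get("ex:validFrom")   # hypothetical attribute
        end = r.attrs.get("ex:validUntil")    # hypothetical attribute
        # A missing bound is treated as open-ended, so an unqualified
        # record behaves like the plain binary relation.
        if (start is None or start <= when) and (end is None or when < end):
            found.append(r.bundle)
    return found


t1 = datetime(2012, 6, 1, 10, 0)
t2 = datetime(2012, 6, 1, 11, 0)
t3 = datetime(2012, 6, 1, 12, 0)

records = [
    HasProvenanceIn("ex:hp1", "ex:e1", "ex:bundle1",
                    {"ex:validFrom": t1, "ex:validUntil": t2}),
    HasProvenanceIn("ex:hp2", "ex:e1", "ex:bundle2",
                    {"ex:validFrom": t2, "ex:validUntil": t3}),
]

print(bundles_covering(records, "ex:e1", datetime(2012, 6, 1, 10, 30)))
print(bundles_covering(records, "ex:e1", t2))
```

The first lookup lands in ex:bundle1 and the second in ex:bundle2. A record
with no time attributes matches at any time, which is how the binary
hasProvenanceIn(entity,bundle) would behave in this sketch.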

cheers
Paul

On Sun, Jun 3, 2012 at 12:36 AM, Jun Zhao <jun.zhao@zoo.ox.ac.uk> wrote:
> Paul,
>
> At first sight, I loved your proposal. But after reading into it, I got less sure.
>
> This property is to allow locating the bundle in which the provenance of an entity is described. To qualify this, would it mean that, e.g., there is a time period during which you can find provenance of that entity in the bundle and after that you can't?
>
> Although the pattern you propose makes sense, I can't see when people need to qualify this relation. If you have a more concrete example in mind, I am ready to be convinced!
>
> Cheers,
>
> Jun
>
> Sent from my iPad (sorry for the brevity)
>
> On 1 Jun 2012, at 17:03, Paul Groth <p.t.groth@vu.nl> wrote:
>
>> Hi All,
>>
>> It seems that one approach would be to define an extensible version
>> of hasProvenanceIn and leave it at that.
>>
>> hasProvenanceIn(id, entity, bundle, attrs).
>>
>> Like all our extensible relations, we would also have the straight
>> binary version
>>
>> hasProvenanceIn(entity,bundle)
>>
>> This would allow for the extensibility to cater for Luc's use case but
>> also for other use cases where extension is nice. For example, I can
>> imagine a system wanting to put a time constraint on the applicability
>> of provenance in a bundle to an entity.
>>
>> This would leave it up to people to define specialization, alternate
>> and derivation relations between entities as they want.
>>
>> Would this be acceptable to the group?
>>
>> Thanks
>> Paul
>>
>>
>>
>> On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>> Hi Simon,
>>>
>>> Thanks for your message. I feel you don't directly respond to the points
>>> that I raised,
>>> and therefore all my comments stand.
>>>
>>> I respond to your points below.
>>>
>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>> Hi Luc,
>>>>
>>>> I will try to articulate the points which I think back up the binary relations proposal.
>>>>
>>>> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>>>>
>>>>
>>>
>>> A close notion to bundle in prior provenance art is opm:Account, and
>>> there is plenty of evidence
>>> that merging accounts may lead to contradictions.  PROV, rightly so,
>>> does not define a union operator
>>> over bundles, and is silent on whether bundles can be merged.
>>>
>>> Therefore,  there is nothing in PROV that backs this statement "which
>>> bundle a description is in is
>>> irrelevant and the bundling can be ignored".
>>>
>>> You are suggesting that an extension of PROV may add semantics to
>>> bundles: that's exactly what you
>>> have done, by implying they are mergeable.
>>>
>>>> Taking the statements from the three bundles below, a querier would end up with:
>>>>
>>>>  activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>  wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>  activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>  wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>  agent(tool:Bob1, [perf:rating="good"])
>>>>  agent(tool:Bob2, [perf:rating="bad"])
>>>>
>>>> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>>>>
>>>>
>>>
>>> PROV does not specify whether they mean something different or not.
>>>
>>>> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf, the appropriate one must be asserted separately.
>>>>
>>>
>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>> important: that's why I am
>>> in favour of having hasProvenanceIn an n-ary relation that includes
>>> attributes so that prov:type can be
>>> used for what you suggest.
>>>> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>>>>
>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>>>>
>>>> Separating concerns, I'd argue it is preferable to say:
>>>>  hasProvenanceIn(tool:Bob1, ex:run1)
>>>>  specializationOf(tool:Bob1, ex:Bob)
>>>>  specializationOf(ex:Bob, ex:GeneralBob)
>>>>
>>> But this latter statement would belong to the ex:run1 bundle I assume.
>>> It is not going to be known to be relevant to me until I have correctly
>>> been able to link tool:Bob1 to ex:Bob in run1.
>>>
>>>
>>>> and let the que



-- 
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth/
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam
Received on Sunday, 3 June 2012 16:44:16 UTC