Fwd: ISSUE-385: hasProvenanceIn: finding a solution

Professor Luc Moreau
Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ
United Kingdom

Begin forwarded message:

From: Graham Klyne <gklyne@googlemail.com>
Date: 1 June 2012 21:11:32 GMT+01:00
To: Paul Groth <p.t.groth@vu.nl>, Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Cc: "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Subject: Re: ISSUE-385: hasProvenanceIn: finding a solution

Works for me, I think.

#g.

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Paul Groth <p.t.groth@vu.nl> wrote:

Hi All,

It seems that one approach would be to define an extensible version
of hasProvenanceIn and leave it at that.

hasProvenanceIn(id, entity, bundle, attrs).

Like all our extensible relations, we would also have the straight
binary version

hasProvenanceIn(entity,bundle)

This would allow for the extensibility to cater for Luc's use case but
also for other use cases where extension is nice. For example, I can
imagine a system wanting to put a time constraint on the applicability
of provenance in a bundle to an entity.

This would leave it up to people to define specialization, alternate
and derivation relations between entities as they want.

Would this be acceptable to the group?

Thanks
Paul
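
The two forms proposed above can be sketched in Python as a single record whose identifier and attributes are optional. This is only an illustration, not part of any PROV specification: the class name, the `to_text` helper, the `-` placeholder for a missing identifier, and the `ex:validUntil` attribute are all assumptions made for the example. The argument order follows the message above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HasProvenanceIn:
    """Hypothetical model of the proposed relation (names are illustrative)."""
    entity: str
    bundle: str
    id: Optional[str] = None                              # optional identifier
    attrs: Dict[str, str] = field(default_factory=dict)   # optional attributes

    def is_binary(self) -> bool:
        # The plain binary form carries neither an identifier nor attributes.
        return self.id is None and not self.attrs

    def to_text(self) -> str:
        if self.is_binary():
            return f"hasProvenanceIn({self.entity}, {self.bundle})"
        # Assumption: "-" stands in for an omitted identifier.
        ident = self.id if self.id is not None else "-"
        pairs = ", ".join(f'{k}="{v}"' for k, v in self.attrs.items())
        return f"hasProvenanceIn({ident}, {self.entity}, {self.bundle}, [{pairs}])"


# Binary form.
plain = HasProvenanceIn("tool:Bob1", "ex:run1")

# Extended form with a made-up time constraint attribute.
timed = HasProvenanceIn(
    "tool:Bob1",
    "ex:run1",
    id="ex:link1",
    attrs={"ex:validUntil": "2012-12-31T00:00:00"},
)

print(plain.to_text())
print(timed.to_text())
```

Under this reading, a subtype of the relation would simply be another attribute (for example a prov:type value), which is the kind of extension point the proposal leaves open.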



On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau
<L.Moreau@ecs.soton.ac.uk> wrote:
> Hi Simon,
>
> Thanks for your message. I feel you don't directly respond to the points
> that I raised,
> and therefore all my comments stand.
>
> I respond to your points below.
>
> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>> Hi Luc,
>>
>> I will try to articulate the points which I think back up the binary relations proposal.
>>
>> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>>
>>
>
> A notion close to bundle in prior provenance art is opm:Account, and
> there is plenty of evidence that merging accounts may lead to
> contradictions. PROV, rightly so, does not define a union operator over
> bundles, and is silent on whether bundles may be merged.
>
> Therefore, there is nothing in PROV that backs the statement "which
> bundle a description is in is irrelevant and the bundling can be ignored".
>
> You are suggesting that an extension of PROV may add semantics to
> bundles: that's exactly what you
> have done, by implying they are mergeable.
>
>> Taking the statements from the three bundles below, a querier would end up with:
>>
>>    activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>    wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>    activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>    wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>    agent(tool:Bob1, [perf:rating="good"])
>>    agent(tool:Bob2, [perf:rating="bad"])
>>
>> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>>
>>
>
> PROV does not specify whether they mean something different or not.
>
>> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf, the appropriate one must be asserted separately.
>>
>
> I agree that being able to assert subtypes for hasProvenanceIn is
> important: that is why I am
> in favour of having hasProvenanceIn as an n-ary relation that includes
> attributes so that prov:type can be
> used for what you suggest.
>
>> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>>
>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>>
>> Separating concerns, I'd argue it is preferable to say:
>>    hasProvenanceIn(tool:Bob1, ex:run1)
>>    specializationOf(tool:Bob1, ex:Bob)
>>    specializationOf(ex:Bob, ex:GeneralBob)
>>
> But this latter statement would belong to the ex:run1 bundle I assume.
> It is not going to be known to be relevant to me until I have correctly
> been able to link tool:Bob1 to ex:Bob in run1.
>
>
>> and let the querier search ex:run1 for all identifiers relevant to the entity. It seems irrelevant that the identifier tool:Bob1 is itself absent from bundle ex:run1, as it is only one of many identifiers for the entity/thing anyway.
>>
>> Paraphrasing Paul from the telecon, hasProvenanceIn(tool:Bob1, ex:run1) can just mean "look in ex:run1 for more stuff relevant to tool:Bob1". If you know that tool:Bob1 is a specialisation of ex:Bob, then you should also look for ex:Bob.
>>
>
> I prefer Tim's interpretation, "tool:Bob1 is a topic in ex:run1", but I am
> saying that tool:Bob1 is not a topic in ex:run1; ex:Bob is.
> There is an aliasing issue happening here.
>
> 1. If, when generating ex:run1 and ex:run2, I had known about the
> profiling tool, I could have generated instances ex:bob1 and ex:bob2,
>     so that they can be individually assessed. But that's not the way
> things work. We reuse identifiers.
>
> 2. If I had assessed only one instance of ex:Bob in my tool bundle, then
> I could have reused the same identifier ex:Bob, and
> hasProvenanceIn(ex:Bob, ex:run1)
> would have been sufficient.
>
> It is only because I want to talk about two different specializations of
> ex:Bob in the tool bundle
> that I am forced to change the identifiers. It is an aliasing issue.
>
> My objection to a binary hasProvenanceIn(subject,bundle) is that it is
> not extensible in PROV.
> I cannot subtype it, and I cannot have a way (standardized or not) of
> handling the aliasing.
>
>
>
>
> Luc
>> Thanks,
>> Simon
>>
>> Dr Simon Miles
>> Senior Lecturer, Department of Informatics
>> King's College London, WC2R 2LS, UK
>> +44 (0)20 7848 1166
>>
>> accounting for the reasons behind contractual violations:
>> http://eprints.dcs.kcl.ac.uk/1283/
>>
>> ________________________________
>> From: Luc Moreau [L.Moreau@ecs.soton.ac.uk]
>> Sent: 31 May 2012 22:54
>> To: Provenance Working Group WG
>> Subject: ISSUE-385: hasProvenanceIn: finding a solution
>>
>> All,
>>
>> To try and converge towards a solution, I am
>> circulating an example using a ternary hasProvenanceIn.
>> I would like to understand if and how we can make it work with
>> a simpler relation.
>>
>>
>> Two bundles ex:run1 and ex:run2 describe Bob's role as a controller
>> of two activities.  Same Bob, two different bundles.
>>
>>       bundle ex:run1
>>        activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00) // duration: 1 hour
>>        wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>       endBundle
>>
>>       bundle ex:run2
>>        activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00) // duration: 7 hours
>>        wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>       endBundle
>>
>>
>> A performance analysis tool rates the performance of agents (this could
>> be used to dispatch further work to performant agents, or congratulate
>> them, etc).
>>
>>
>>       bundle tool:analysis01
>>
>>         agent(tool:Bob1, [perf:rating="good"])
>>         hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)  // Bob's performance in ex:run1 is good
>>
>>         agent(tool:Bob2, [perf:rating="bad"])
>>         hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)  // Bob's performance in ex:run2 is bad
>>
>>       endBundle
>>
>> The performance analysis tool has to rate two involvements of ex:Bob in
>> two separate activities.
>> Two specialized versions of ex:Bob are defined: tool:Bob1 and tool:Bob2,
>> with ratings good and
>> bad respectively.
>>
>> tool:Bob1 is linked to ex:Bob in run1, and tool:Bob2 is linked to ex:Bob
>> in run2, with the following
>>
>>         hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)
>>         hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)
>>
>> Nothing is expressed about ex:Bob in bundle tool:analysis01 (except that
>> this is an alias
>> for tool:Bob1 and tool:Bob2).
>>
>> It is suggested that the ternary relation could be replaced by
>> isTopicIn(tool:Bob1, ex:run1)
>> and
>> specialization(tool:Bob1, ex:Bob).
>>
>> I don't understand the point of
>>     isTopicIn(tool:Bob1, ex:run1)
>> since tool:Bob1 is not a topic in ex:run1.
>>
>> Also, we now seem to have made ex:Bob a topic of tool:analysis01, because of
>> the following expression:
>> specialization(tool:Bob1, ex:Bob).
>>
>>   From tool:analysis01, where do I find provenance about ex:Bob?
>> It looks like this has become a dead end in this graph.
>>
>> Do I need to introduce:
>>     isTopicIn(ex:Bob, ex:run1)
>>     isTopicIn(ex:Bob, ex:run2)?
>>
>>
>> So now we would have:
>> isTopicIn(tool:Bob1, ex:run1)
>> specialization(tool:Bob1, ex:Bob)
>> isTopicIn(tool:Bob2, ex:run2)
>> specialization(tool:Bob2, ex:Bob)
>> isTopicIn(ex:Bob, ex:run1)
>> isTopicIn(ex:Bob, ex:run2)
>>
>> Which means that:
>>
>> specialization(tool:Bob1, ex:Bob)
>> isTopicIn(ex:Bob, ex:run2)
>>
>> ... would lead us to believe that the good rating is due to slow performance.
>>
>> Can the proposer of the separate binary relations explain how this
>> example can work?
>>
>> Thanks,
>> Luc
>>
>
> --
> Professor Luc Moreau
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          fax:   +44 23 8059 2865
> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>
>
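
The aliasing concern in Luc's original example can be made concrete with a small Python sketch. It assumes one particular, naive querier strategy (follow specialization links, then collect every bundle in which any resulting identifier is a topic); the thread does not prescribe this strategy, and the function and variable names are invented for illustration. The facts are the ones listed in the quoted message.

```python
# Binary encoding, as listed in the quoted example.
is_topic_in = {
    ("tool:Bob1", "ex:run1"),
    ("tool:Bob2", "ex:run2"),
    ("ex:Bob", "ex:run1"),
    ("ex:Bob", "ex:run2"),
}
specialization = {
    ("tool:Bob1", "ex:Bob"),
    ("tool:Bob2", "ex:Bob"),
}

# Ternary encoding: (subject, bundle, identifier used inside that bundle).
has_provenance_in = {
    ("tool:Bob1", "ex:run1", "ex:Bob"),
    ("tool:Bob2", "ex:run2", "ex:Bob"),
}


def bundles_via_binary(entity):
    """Assumed naive strategy: follow specializations, then gather bundles."""
    names = {entity}
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for specific, general in specialization:
            if specific == current and general not in names:
                names.add(general)
                frontier.append(general)
    return {bundle for name, bundle in is_topic_in if name in names}


def lookups_via_ternary(entity):
    """Return (bundle, alias) pairs stated directly for the entity."""
    return {(b, alias) for subj, b, alias in has_provenance_in if subj == entity}


print(sorted(bundles_via_binary("tool:Bob1")))   # both runs are reached
print(sorted(lookups_via_ternary("tool:Bob1")))  # only ex:run1, under alias ex:Bob
```

With the binary facts, this querier reaches ex:run2 from tool:Bob1 even though the "good" rating concerns ex:run1 only; with the ternary facts the bundle and the alias are stated together. A different querier strategy over the binary facts could of course avoid this, which is the point under discussion.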



--
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth/
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam

Received on Saturday, 2 June 2012 08:42:50 UTC