Subject: Re: ISSUE-385: hasProvenanceIn: finding a solution

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Fri, 01 Jun 2012 16:33:41 +0100
To: public-prov-wg@w3.org
Message-ID: <EMEW3|2dc1c4b6b185b296cfc666a7fb4a0a29o50GXi08L.Moreau|ecs.soton.ac.uk|4FC8E0D5>

Hi Simon,

Thanks for your message. I feel you don't directly respond to the points
that I raised, and therefore all my comments stand.

I respond to your points below.

On 06/01/2012 03:39 PM, Miles, Simon wrote:
> Hi Luc,
>
> I will try to articulate the points which I think back up the binary relations proposal.
>
> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>
>    

A notion close to bundle in prior provenance work is opm:Account, and
there is plenty of evidence that merging accounts may lead to
contradictions. PROV, rightly so, does not define a union operator over
bundles, and is silent about whether bundles may be merged.

Therefore, there is nothing in PROV that backs the statement "which
bundle a description is in is irrelevant and the bundling can be ignored".

You are suggesting that an extension of PROV may add semantics to
bundles: that's exactly what you have done, by implying they are mergeable.

> Taking the statements from the three bundles below, a querier would end up with:
>
>    activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>    wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>    activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>    wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>    agent(tool:Bob1, [perf:rating="good"])
>    agent(tool:Bob2, [perf:rating="bad"])
>
> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>
>    

PROV does not specify whether they mean something different or not.
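
To make this concrete, here is a minimal Python sketch. It assumes a
hypothetical encoding (not the API of any PROV toolkit) in which each
statement is a tuple and each bundle is a set of tuples, and it includes
the binary hasProvenanceIn links from the tool bundle. Once the bundles
are merged, no statement connects tool:Bob1 to an activity, and ex:Bob is
associated with both activities; only the boundary of ex:run1 tells us
which involvement was rated.

```python
# Hypothetical encoding for illustration only: a PROV-N statement is a tuple
# (relation, arg1, arg2, ...) and a bundle is a set of such tuples.
BUNDLES = {
    "ex:run1": {
        ("activity", "ex:a1"),
        ("wasAssociatedWith", "ex:a1", "ex:Bob", "controller"),
    },
    "ex:run2": {
        ("activity", "ex:a2"),
        ("wasAssociatedWith", "ex:a2", "ex:Bob", "controller"),
    },
    "tool:analysis01": {
        ("agent", "tool:Bob1", "good"),
        ("agent", "tool:Bob2", "bad"),
        ("hasProvenanceIn", "tool:Bob1", "ex:run1"),  # binary form
        ("hasProvenanceIn", "tool:Bob2", "ex:run2"),
    },
}


def activities_of(agent, statements):
    """Activities that `agent` is directly associated with in `statements`."""
    return {s[1] for s in statements
            if s[0] == "wasAssociatedWith" and s[2] == agent}


# Ignoring the bundling: union all statements into one flat set.
merged = set().union(*BUNDLES.values())

print(sorted(activities_of("tool:Bob1", merged)))           # []
print(sorted(activities_of("ex:Bob", merged)))              # ['ex:a1', 'ex:a2']
# Keeping the bundling: ex:Bob inside ex:run1 alone is unambiguous.
print(sorted(activities_of("ex:Bob", BUNDLES["ex:run1"])))  # ['ex:a1']
```

In the flat set, the string "ex:run1" survives as an argument of
hasProvenanceIn, but nothing says any more which statements it contained.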

> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf; the appropriate one must be asserted separately.
>    

I agree that being able to assert subtypes of hasProvenanceIn is
important: that's why I am in favour of making hasProvenanceIn an n-ary
relation that includes attributes, so that prov:type can be used for
what you suggest.

> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>
> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>
> Separating concerns, I'd argue it is preferable to say:
>    hasProvenanceIn(tool:Bob1, ex:run1)
>    specializationOf(tool:Bob1, ex:Bob)
>    specializationOf(ex:Bob, ex:GeneralBob)
>    

But this latter statement would belong to the ex:run1 bundle, I assume.
It is not going to be known to be relevant to me until I have correctly
been able to link tool:Bob1 to ex:Bob in run1.


> and let the querier search ex:run1 for all identifiers relevant to the entity. It seems irrelevant that the identifier tool:Bob1 is itself absent from bundle ex:run1, as it is only one of many identifiers for the entity/thing anyway.
>
> Paraphrasing Paul from the telecon, hasProvenanceIn(tool:Bob1, ex:run1) can just mean "look in ex:run1 for more stuff relevant to tool:Bob1". If you know that tool:Bob1 is a specialisation of ex:Bob, then you should also look for ex:Bob.
>    

I prefer Tim's interpretation (tool:Bob1 is a topic in ex:run1), but my
point is that tool:Bob1 is not a topic in ex:run1: ex:Bob is.
There is an aliasing issue here.

1. If, when generating ex:run1 and ex:run2, I had known about the
   profiling tool, I could have generated instances ex:bob1 and ex:bob2,
   so that they could be assessed individually. But that's not the way
   things work: we reuse identifiers.

2. Had I assessed only one instance of ex:Bob in my tool bundle, I could
   have reused the same identifier ex:Bob, and
   hasProvenanceIn(ex:Bob, ex:run1) would have been sufficient.

It is only because I want to talk about two different specializations of
ex:Bob in the tool bundle that I am forced to change the identifiers. It
is an aliasing issue.

My objection to a binary hasProvenanceIn(subject, bundle) is that it is
not extensible in PROV: I cannot subtype it, and I have no way
(standardized or not) of handling the aliasing.
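
A sketch of the ternary reading, again assuming a hypothetical tuple
encoding rather than any PROV library. The `alias` argument tells the
querier which identifier to look for inside the target bundle; the
`attributes` field stands for the n-ary extension point where prov:type
could carry a subtype (the type value used below is made up). Without an
alias, the querier can only look for tool:Bob1 itself, which does not
occur in ex:run1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Statement = Tuple[str, ...]

# Hypothetical bundle contents, mirroring ex:run1 and ex:run2 in the example.
BUNDLES: Dict[str, Set[Statement]] = {
    "ex:run1": {("wasAssociatedWith", "ex:a1", "ex:Bob", "controller")},
    "ex:run2": {("wasAssociatedWith", "ex:a2", "ex:Bob", "controller")},
}


@dataclass
class HasProvenanceIn:
    """Hypothetical n-ary hasProvenanceIn(subject, bundle, alias, [attrs])."""
    subject: str
    bundle: str
    alias: Optional[str] = None  # identifier used for the subject inside `bundle`
    attributes: Dict[str, str] = field(default_factory=dict)


def resolve(link: HasProvenanceIn, bundles: Dict[str, Set[Statement]]) -> Set[str]:
    """Activities associated with the subject, looked up inside link.bundle.

    With an alias we search for the alias; otherwise only for the subject."""
    target = link.alias if link.alias is not None else link.subject
    return {s[1] for s in bundles[link.bundle]
            if s[0] == "wasAssociatedWith" and s[2] == target}


ternary1 = HasProvenanceIn("tool:Bob1", "ex:run1", "ex:Bob",
                           {"prov:type": "ex:RatedInvolvement"})  # made-up type
ternary2 = HasProvenanceIn("tool:Bob2", "ex:run2", "ex:Bob")
binary1 = HasProvenanceIn("tool:Bob1", "ex:run1")

print(resolve(ternary1, BUNDLES))  # {'ex:a1'}
print(resolve(ternary2, BUNDLES))  # {'ex:a2'}
print(resolve(binary1, BUNDLES))   # set()
```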




Luc
> Thanks,
> Simon
>
> Dr Simon Miles
> Senior Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
>
> accounting for the reasons behind contractual violations:
> http://eprints.dcs.kcl.ac.uk/1283/
> ________________________________________
> From: Luc Moreau [L.Moreau@ecs.soton.ac.uk]
> Sent: 31 May 2012 22:54
> To: Provenance Working Group WG
> Subject: ISSUE-385: hasProvenanceIn: finding a solution
>
> All,
>
> To try and converge towards a solution, I am
> circulating an example using a ternary hasProvenanceIn.
> I would like to understand if and how we can make it work with
> a simpler relation.
>
>
> Two bundles, ex:run1 and ex:run2, describe Bob's role as a controller
> of two activities: same Bob, two different bundles.
>
>       bundle ex:run1
>        activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)  // duration: 1 hour
>        wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>       endBundle
>
>       bundle ex:run2
>        activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)  // duration: 7 hours
>        wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>       endBundle
>
>
> A performance analysis tool rates the performance of agents (this could
> be used to dispatch further work to performant agents, or to congratulate
> them, etc.).
>
>
>       bundle tool:analysis01
>
>         agent(tool:Bob1, [perf:rating="good"])
>         hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)  // Bob's performance in ex:run1 is good
>
>         agent(tool:Bob2, [perf:rating="bad"])
>         hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)  // Bob's performance in ex:run2 is bad
>
>       endBundle
>
> The performance analysis tool has to rate two involvements of ex:Bob in
> two separate activities. Two specialized versions of ex:Bob are defined,
> tool:Bob1 and tool:Bob2, with ratings good and bad respectively.
>
> tool:Bob1 is linked to ex:Bob in run1, and tool:Bob2 is linked to ex:Bob
> in run2, with the following:
>
>         hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)
>         hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)
>
> Nothing is expressed about ex:Bob in bundle tool:analysis01 (except that
> it is an alias for tool:Bob1 and tool:Bob2).
>
> It is suggested that the ternary relation could be replaced by
> isTopicIn(tool:Bob1, ex:run1)
> and
> specialization(tool:Bob1, ex:Bob).
>
> I don't understand the point of
>     isTopicIn(tool:Bob1, ex:run1)
> since tool:Bob1 is not a topic in ex:run1.
>
> Also, we now seem to have made ex:Bob a topic of tool:analysis01, because
> of the following expression:
> specialization(tool:Bob1, ex:Bob)
>
> From tool:analysis01, where do I find provenance about ex:Bob?
> It looks like this has become a dead end in this graph.
>
> Do I need to introduce:
>     isTopicIn(ex:Bob, ex:run1)
>     isTopicIn(ex:Bob, ex:run2)?
>
>
> So now we would have:
> isTopicIn(tool:Bob1, ex:run1)
> specialization(tool:Bob1, ex:Bob)
> isTopicIn(tool:Bob2, ex:run2)
> specialization(tool:Bob2, ex:Bob)
> isTopicIn(ex:Bob, ex:run1)
> isTopicIn(ex:Bob, ex:run2)
>
> Which means that:
>
> specialization(tool:Bob1, ex:Bob)
> isTopicIn(ex:Bob, ex:run2)
>
> ... would lead us to believe that the good rating is due to the slow
> performance in ex:run2 (7 hours).
>
> Can the proposer of the separate binary relations explain how this
> example can work?
>
> Thanks,
> Luc
>    

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Friday, 1 June 2012 15:34:20 UTC