
Re: ISSUE-385: hasProvenanceIn: finding a solution

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Fri, 01 Jun 2012 16:33:41 +0100
Message-ID: <EMEW3|2dc1c4b6b185b296cfc666a7fb4a0a29o50GXi08L.Moreau|ecs.soton.ac.uk|4FC8E0D5.7050908@ecs.soton.ac.uk>
To: public-prov-wg@w3.org
Hi Simon,

Thanks for your message. I feel you don't directly respond to the points 
that I raised,
and therefore all my comments stand.

I respond to your points below.

On 06/01/2012 03:39 PM, Miles, Simon wrote:
> Hi Luc,
> I will try to articulate the points which I think back up the binary relations proposal.
> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.

The closest notion to a bundle in prior provenance work is opm:Account, and
there is plenty of evidence
that merging accounts may lead to contradictions.  PROV, rightly,
does not define a union operator
over bundles, and is silent on whether bundles may be merged.

Therefore, there is nothing in PROV that backs the statement "which
bundle a description is in is
irrelevant and the bundling can be ignored".

You are suggesting that an extension of PROV may add semantics to 
bundles: that's exactly what you
have done, by implying they are mergeable.

> Taking the statements from the three bundles below, a querier would end up with:
>    activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>    wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>    activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>    wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>    agent(tool:Bob1, [perf:rating="good"])
>    agent(tool:Bob2, [perf:rating="bad"])
> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?

PROV does not specify whether they mean something different or not.
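
To make the point about merging concrete, here is a minimal Python sketch. It is not PROV tooling: the dictionary layout, the `merge` function and the `bundles_mentioning` helper are all assumptions made for illustration, and `merge` is precisely the union operation that PROV does not define. Activity times are omitted for brevity.

```python
# Hypothetical in-memory model: bundle name -> PROV-N-like statement strings.
# Activity start/end times are left out to keep the sketch short.
bundles = {
    "ex:run1": [
        "activity(ex:a1)",
        'wasAssociatedWith(ex:a1, ex:Bob, [prov:role="controller"])',
    ],
    "ex:run2": [
        "activity(ex:a2)",
        'wasAssociatedWith(ex:a2, ex:Bob, [prov:role="controller"])',
    ],
    "tool:analysis01": [
        'agent(tool:Bob1, [perf:rating="good"])',
        'agent(tool:Bob2, [perf:rating="bad"])',
    ],
}


def merge(bundles):
    """Flatten every bundle into one set of statements.

    This is an assumed operation for illustration; PROV defines no union
    operator over bundles.
    """
    return {stmt for stmts in bundles.values() for stmt in stmts}


def bundles_mentioning(bundles, identifier):
    """Sorted names of bundles with a statement that mentions `identifier`."""
    return sorted(
        name
        for name, stmts in bundles.items()
        if any(identifier in stmt for stmt in stmts)
    )


merged = merge(bundles)
```

Before merging, `bundles_mentioning(bundles, "ex:Bob")` can still tell ex:run1 apart from ex:run2. The flat `merged` set holds the same six statements but no longer records which bundle each one came from, so whether the two readings are equivalent is a decision the querier makes, not one PROV makes.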

> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf, the appropriate one must be asserted separately.

I agree that being able to assert subtypes for hasProvenanceIn is
important: that is why I am
in favour of making hasProvenanceIn an n-ary relation that includes
attributes, so that prov:type can be
used for what you suggest.
> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
> Separating concerns, I'd argue it is preferable to say:
>    hasProvenanceIn(tool:Bob1, ex:run1)
>    specializationOf(tool:Bob1, ex:Bob)
>    specializationOf(tool:Bob, ex:GeneralBob)
But this latter statement would belong to the ex:run1 bundle, I assume.
It will not be known to be relevant to me until I have been able
to link tool:Bob1 to ex:Bob in ex:run1 correctly.

> and let the querier search ex:run1 for all identifiers relevant to the entity. It seems irrelevant that the identifier tool:Bob1 is itself absent from bundle ex:run1, as it is only one of many identifiers for the entity/thing anyway.
> Paraphrasing Paul from the telecon, hasProvenanceIn(tool:Bob1, ex:run1) can just mean "look in ex:run1 for more stuff relevant to tool:Bob1". If you know that tool:Bob1 is a specialisation of ex:Bob, then you should also look for ex:Bob.

I prefer Tim's interpretation, that tool:Bob1 is a topic in ex:run1, but I am
saying that it is not a topic in ex:run1; ex:Bob is.
There is an aliasing issue happening here.

1. If, when generating ex:run1 and ex:run2, I had known about the
profiling tool, I could have generated the instances ex:bob1 and ex:bob2,
     so that they could be assessed individually. But that's not the way
things work. We reuse identifiers.

2. If I had assessed only one instance of ex:Bob in my tool bundle, then
I could have reused the same identifier ex:Bob, and
hasProvenanceIn(ex:Bob, ex:run1)
would have been sufficient.

It is only because I want to talk about two different specializations of 
ex:Bob in the tool bundle
that I am forced to change the identifiers. It is an aliasing issue.

My objection to a binary hasProvenanceIn(subject,bundle) is that it is
not extensible in PROV.
I cannot subtype it, and I have no way, standardized or not, of
handling the aliasing.
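
The aliasing problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encodings and the two lookup functions are hypothetical, and `lookups_binary` models a naive querier that follows specialization and then isTopicIn, which is one possible reading of the binary proposal rather than anything the specification prescribes.

```python
# Ternary form from the example: (subject, bundle, identifier used in that bundle).
ternary = [
    ("tool:Bob1", "ex:run1", "ex:Bob"),
    ("tool:Bob2", "ex:run2", "ex:Bob"),
]

# Binary decomposition, including the extra isTopicIn(ex:Bob, ...) statements.
is_topic_in = [
    ("tool:Bob1", "ex:run1"),
    ("tool:Bob2", "ex:run2"),
    ("ex:Bob", "ex:run1"),
    ("ex:Bob", "ex:run2"),
]
specialization = [
    ("tool:Bob1", "ex:Bob"),
    ("tool:Bob2", "ex:Bob"),
]


def lookups_ternary(subject):
    """(bundle, identifier) pairs to search, read directly off the ternary relation."""
    return sorted((bundle, target) for s, bundle, target in ternary if s == subject)


def lookups_binary(subject):
    """(bundle, identifier) pairs reached via specialization, then isTopicIn.

    Models a naive querier (an assumption of this sketch).
    """
    generals = [general for s, general in specialization if s == subject]
    return sorted(
        (bundle, general)
        for general in generals
        for entity, bundle in is_topic_in
        if entity == general
    )
```

With the ternary relation, `lookups_ternary("tool:Bob1")` yields only `("ex:run1", "ex:Bob")`. With the binary decomposition, `lookups_binary("tool:Bob1")` also yields `("ex:run2", "ex:Bob")`, which is how the good rating ends up associated with the slow run. A querier could filter the result against isTopicIn(tool:Bob1, ...), but that rule would be a convention outside the relations themselves.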

> Thanks,
> Simon
> Dr Simon Miles
> Senior Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
> accounting for the reasons behind contractual violations:
> http://eprints.dcs.kcl.ac.uk/1283/
> ________________________________________
> From: Luc Moreau [L.Moreau@ecs.soton.ac.uk]
> Sent: 31 May 2012 22:54
> To: Provenance Working Group WG
> Subject: ISSUE-385: hasProvenanceIn: finding a solution
> All,
> To try and converge towards a solution, I am
> circulating an example using a ternary hasProvenanceIn.
> I would like to understand if and how we can make it work with
> a simpler relation.
> Two bundles ex:run1 and ex:run2 describe bob's role as a controller
> of two activities.  Same bob, two different bundles.
>       bundle ex:run1
>        activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)  // duration: 1 hour
>        wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>       endBundle
>       bundle ex:run2
>        activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)  // duration: 7 hours
>        wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>       endBundle
> A performance analysis tool rates the performance of agents (this could be used
> to dispatch further work to performant agents, or congratulate them, etc).
>       bundle tool:analysis01
>         agent(tool:Bob1, [perf:rating="good"])
>         hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)  // Bob's performance in ex:run1 is good
>         agent(tool:Bob2, [perf:rating="bad"])
>         hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)  // Bob's performance in ex:run2 is bad
>       endBundle
> The performance analysis tool has to rate two involvements of ex:Bob in
> two separate activities.
> Two specialized versions of ex:Bob are defined: tool:Bob1 and tool:Bob2,
> with ratings good and
> bad respectively.
> tool:Bob1 is linked to ex:Bob in run1, and tool:Bob2 is linked to ex:Bob
> in run2, with the following
>         hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)
>         hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)
> Nothing is expressed about ex:Bob in bundle tool:analysis01 (except that
> this is an alias
> for tool:Bob1 and tool:Bob2).
> It is suggested that the ternary relation could be replaced by
> isTopicIn(tool:Bob1, ex:run1)
> and
> specialization(tool:Bob1, ex:Bob).
> I don't understand the point of
>     isTopicIn(tool:Bob1, ex:run1)
> since tool:Bob1 is not a topic in ex:run1.
> Also, we now seem to have made ex:Bob a topic of tool:analysis01, because
> of the following expression:
> specialization(tool:Bob1, ex:Bob).
> From tool:analysis01, where do I find provenance about ex:Bob?
> It looks like this has become a dead end in this graph.
> Do I need to introduce:
>     isTopicIn(ex:Bob, ex:run1)
>     isTopicIn(ex:Bob, ex:run2)?
> So now we would  have:
> isTopicIn(tool:Bob1, ex:run1)
> specialization(tool:Bob1, ex:Bob)
> isTopicIn(tool:Bob2, ex:run2)
> specialization(tool:Bob2, ex:Bob)
> isTopicIn(ex:Bob, ex:run1)
> isTopicIn(ex:Bob, ex:run2)
> Which means that:
> specialization(tool:Bob1, ex:Bob)
> isTopicIn(ex:Bob, ex:run2)
> ... would lead us to believe that the good rating is due to slow performance.
> Can the proposer of the separate binary relations explain how this
> example can work?
> Thanks,
> Luc

Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Friday, 1 June 2012 15:34:20 UTC