Re: prov-dm derivation: three proposals to vote on (deadline Wednesday midnight GMT) from Stian Soiland-Reyes on 2011-11-13 (public-prov-wg@w3.org from November 2011)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Sun, 13 Nov 2011 13:29:01 +0000
To: Simon Miles <simon.miles@kcl.ac.uk>
Cc: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAPRnXtmgJRkErFRV0F4gPe0fkvfTAp0ZsqZBB-rRFvWkpQdphA@mail.gmail.com>
I'm with Simon on this. We need:

1) A way to say that A was derived from B, in the sense that B
influenced A somehow
1a) That this occurred by a single process which used B and later generated A
1b) That B was used an arbitrary number of PE steps before A was
generated (higher granularity does not break the derivation)
2) A way to say that B is part of A's lineage (an arbitrary number of
steps before) but did not necessarily influence it

I don't think it matters whether we know which process execution was
involved. We know that there must have been a process execution to
generate A. If A is derived from (#1) or based on (#2) B, then there must
exist a use/control-process-execution-generation path between A and B.
The asserter might not know all these details.

I suggest:

wasBasedOn(A,B) (#2)
wasDerivedFrom(A,B) (#1b)

and that's it.

wasBasedOn(A,B) states that there exists a PE which generated A, and a
PE which B participated in (i.e. was used by or controlled) - and
these PEs are either the same, or it is possible to establish a chain
of participation and generation of arbitrary entities and arbitrary
PEs.

wasBasedOn(A,B) is transitive and can be inferred iff:
wasGeneratedBy(A, pe0)
dependedOn(pe0, B)
-or-
wasGeneratedBy(A, pe0)
dependedOn(pe0, x)
wasBasedOn(x, B)
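
As a rough illustration of the two rules above, here is a minimal Python
sketch. It assumes the assertions are held as plain sets of pairs; the
function name infer_was_based_on and the example identifiers (pe1, pe2)
are hypothetical and not part of prov-dm.

```python
def infer_was_based_on(was_generated_by, depended_on):
    """Infer wasBasedOn pairs from the two rules in the message.

    was_generated_by: set of (entity, pe) pairs, i.e. wasGeneratedBy(entity, pe)
    depended_on:      set of (pe, entity) pairs, i.e. dependedOn(pe, entity)
    Returns a set of (a, b) pairs meaning wasBasedOn(a, b).
    """
    # Rule 1: wasGeneratedBy(A, pe0) and dependedOn(pe0, B)
    direct = {
        (a, b)
        for (a, pe) in was_generated_by
        for (pe2, b) in depended_on
        if pe == pe2
    }
    based = set(direct)
    # Rule 2: wasGeneratedBy(A, pe0), dependedOn(pe0, x), wasBasedOn(x, B).
    # The first two conditions are exactly a pair in `direct`, so we keep
    # joining `direct` with what is known so far until nothing new appears.
    while True:
        new = {
            (a, b)
            for (a, x) in direct
            for (x2, b) in based
            if x == x2
        } - based
        if not new:
            return based
        based |= new


# Hypothetical chain: pe1 used B and generated x; pe2 used x and generated A.
generated = {("x", "pe1"), ("A", "pe2")}
depended = {("pe1", "B"), ("pe2", "x")}
print(sorted(infer_was_based_on(generated, depended)))
```

In this sketch wasBasedOn(A, B) is inferred even though no single PE both
used B and generated A.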


There is no requirement that it is a single PE that used B and
generated A - it might be a series of such inter-connected process
executions.

Unlike wasBasedOn(), wasDerivedFrom(A,B) is a stronger, explicit
statement that adds information - it says that B influenced A. Without
asserting wasDerivedFrom, a PE might have used B and generated A
without such influence.

As such it is not transitive: if wasDerivedFrom(A,x) and
wasDerivedFrom(x,B), it does not follow that wasDerivedFrom(A,B).
However wasDerivedFrom(A,x) always implies wasBasedOn(A,x) which *is*
transitive, so you can at least conclude wasBasedOn(A,B).
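
To make the distinction concrete, a small Python sketch, assuming
assertions are stored as sets of (A, B) pairs. The helper name
based_on_from_derivations is made up for illustration and is not
prov-dm vocabulary.

```python
def based_on_from_derivations(was_derived_from):
    """Return the wasBasedOn pairs implied by asserted wasDerivedFrom pairs.

    Each wasDerivedFrom(a, b) implies wasBasedOn(a, b), and wasBasedOn is
    transitive, so the result is the transitive closure of the input.
    The input set itself is left untouched: wasDerivedFrom is NOT closed.
    """
    based = set(was_derived_from)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loop.
        for (a, x) in list(based):
            for (x2, b) in list(based):
                if x == x2 and (a, b) not in based:
                    based.add((a, b))
                    changed = True
    return based


# Hypothetical assertions: wasDerivedFrom(A, x) and wasDerivedFrom(x, B).
derived = {("A", "x"), ("x", "B")}
based = based_on_from_derivations(derived)

print(("A", "B") in based)    # wasBasedOn(A, B) can be concluded
print(("A", "B") in derived)  # wasDerivedFrom(A, B) was never asserted
```

The asserted derivations stay exactly as given; only the weaker
wasBasedOn relation is extended along the chain.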

If the asserter wants to record that A was influenced by both B and
x, then two wasDerivedFrom() assertions can be made.

I think by keeping wasDerivedFrom() as such a subproperty of
wasBasedOn() you allow the PE chain in both cases - and don't have to
force a certain granularity of process executions.


I can see the reason for qualifying which PE was involved in
wasDerivedFrom(), because there could have been multiple PEs which
used B - but not all of them contributed to the derivation that
generated A. However I think it is too restrictive to lock this into a
single PE.


On Thu, Nov 10, 2011 at 14:58, Simon Miles <simon.miles@kcl.ac.uk> wrote:
> Hi Luc,
>
> Yes, I think the analogy is correct. I claim that C2 is not
> necessarily derived from e, because of the nature of the collection. I
> cannot claim that it is never derived from e.
>
> Being specific about the collection structure, maybe C0 is a tree, a
> subtree e is added to make tree C1, then removed to make tree C2 (and
> other changes might take place also). I claim that C2 was eventually
> derived from C0 and from C1 but not e, as it wouldn't have made a
> difference to C2 what subtree e contained or whether it had existed at
> all. However, some might say it is odd to exclude e from C2's history
> entirely as we don't know what would have occurred if e had not
> existed, so we allow that C2 'depended on' e.
>
> I'm sure there are structures where adding then removing an element
> has an effect on the eventual collection (the eventual collection
> would not be as it is had it not been for the element). If so, then
> the eventual collection would be derived from the element.
>
> Thanks,
> Simon
>
> On 10 November 2011 13:50, Luc Moreau <l.moreau@ecs.soton.ac.uk> wrote:
>> Hi Simon,
>>
>> Still trying to understand what you wrote.
>> Paraphrasing your example,
>>
>>  Someone asserts that a collection C2 is derived from collection C1 by
>> removing e from C1
>>  You claim that C2 is not necessarily derived from e,
>>   or
>>  do you claim that C2 is never derived from e,
>>
>> Is it a correct analogy? Which claim are you making?
>>
>> Thanks,
>> Luc
>>
>> On 10/11/2011 10:46, Simon Miles wrote:
>>> In my example, the designer may assert that the first draft page was
>>> derived from the banner image ("DRAFT") that it contains, while the
>>> publisher may assert that the published page (excluding the banner)
>>> was derived from the first draft. But the published page is not
>>> derived from the banner image, because it would not make any
>>> difference should the banner have been different, or even not been
>>> present at all, e.g. the first draft could still have existed even if
>>> the banner had been deleted earlier. To allow a transitive
>>> derivation-like relation to exist, it must have semantics so weak as
>>> to allow the published page to be linked to the banner. I understood
>>> this weakened relation to be dependedOn. This relation does not remove
>>> the need for an actual derivation relation to be expressed. I don't
>>> have a strong opinion on whether a transitive relation needs to exist.
>>>
>>
>>
>
>
>
> --
> Dr Simon Miles
> Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Sunday, 13 November 2011 13:30:01 UTC