Re: PROV-ISSUE-50 (Ordering of Process): Definition for Ordering of Process [Conceptual Model] from Luc Moreau on 2011-10-03 (public-prov-wg@w3.org from October 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Mon, 03 Oct 2011 08:05:41 +0100
To: Satya Sahoo <satya.sahoo@case.edu>
CC: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <EMEW3|805f09a3d511c273ab7cb5a7cb19e4c5n9287408L.Moreau|ecs.soton.ac.uk|4E895EC5>
Hi Satya,

Responses interleaved.

On 03/10/11 01:54, Satya Sahoo wrote:
> Hi Luc,
> My comments are inline:
> >First, you will note that wasInformedBy is *not* a temporal relation
> >between process executions.
>
> The PROV-DM currently defines the following constraint for wasInformedBy:
> Given two process execution expressions denoted by pe1 and pe2, the 
> expression wasInformedBy(pe2,pe1) holds, if and only if there is an 
> entity expression denoted by e and qualifiers q1 and q2, such that 
> wasGeneratedBy(e,pe1,q1) and used(pe2,e,q2) hold.
>
> If we consider the two expressions wasGeneratedBy(e, pe1, q1) and
> used(pe2, e, q2) - these two expressions together enforce that pe2
> cannot have a start time that is "before" the start time of pe1. This
> is a temporal relation/ordering between pe1 and pe2. Hence, if both
> these expressions have to "hold" for wasInformedBy(pe2, pe1) to "hold",
> I am not sure how it is not a temporal ordering.

I agree that some temporal constraints have to be satisfied for
wasInformedBy(pe2, pe1), but that is a necessary condition, not a
sufficient one.  Information (represented as entity e above) is
required to flow between process executions.

Also, it's not a temporal order, but it's a temporal relation!  It is 
not transitive!

For these reasons (information flow and non-transitivity), I feel that
wasInformedBy does not fall under your temporal ordering classification.
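
To illustrate the two points above, here is a minimal Python sketch (not
PROV-DM's normative semantics). It derives wasInformedBy pairs from
wasGeneratedBy and used assertions, with qualifiers omitted, and shows that
the derived relation is not transitive. The identifiers pe1, pe2, pe3, ea,
eb and the function name are illustrative assumptions.

```python
# Hypothetical assertions; all identifiers are illustrative only.
was_generated_by = {("ea", "pe1"), ("eb", "pe2")}  # (entity, generating PE)
used = {("pe2", "ea"), ("pe3", "eb")}              # (using PE, entity)


def was_informed_by(generated, used_pairs):
    """Return pairs (pe2, pe1) such that some entity was generated by pe1
    and used by pe2. Qualifiers are omitted in this sketch."""
    return {
        (user, producer)
        for (entity, producer) in generated
        for (user, used_entity) in used_pairs
        if entity == used_entity
    }


informed = was_informed_by(was_generated_by, used)

# pe2 was informed by pe1, and pe3 by pe2, but no entity flows from pe1
# to pe3, so (pe3, pe1) is not derived: the relation is not transitive.
is_transitive_here = ("pe3", "pe1") in informed
```

Here is_transitive_here is False even though pe1 necessarily produced its
entity before pe3 consumed the one from pe2: a temporal constraint holds
along the chain, but the information-flow relation itself does not.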

>
>
> >Second, it would be nice for PROV to have a temporal ordering
> >relation. However, we have to be careful. The relations
> >used/generatedBy/derivedFrom/dependedOn/... all have a notion of
> >causality/influence: the source of the edge being influenced by the
> >edge destination.
> >We know that causal order implies temporal order, but not the
> >converse.  I am therefore reluctant to introduce a relation that
> >arbitrarily captures temporal order.  What would it give us? After
> >all, we can associate time with PEs, and given such time information,
> >we can already decide if pe1 start precedes pe2 start, or if pe1 end
> >precedes pe2 start. What would a temporal relation give us over time?
> There are many non-causal properties that are part of provenance 
> assertions.
>
> For example, to reconstruct the history of activities of an accused
> person X on Oct 2 before X reached the crime scene, the police
> make the following assertions:
> 1. X bought a car at 2:00pm US ET - buying the car is PE pe1
> 2. X bought flowers at 4:00pm US ET - buying flowers is PE pe2
> 3. X hailed a taxi and travelled to the crime scene at 6:00pm US ET -
> travelling in the taxi is PE pe3

This is a nice example where wasScheduledAfter can be used!

>
> In the above scenario, the police need to have temporal ordering of 
> PEs to establish that person X was in the city on the day of the crime 
> but there is no causal relation between pe1, pe2, and pe3.

There is some underpinning ordering, since there is X at 2pm, X at 4pm, 
and X at 6pm.
This is exactly the definition of wasScheduledAfter.

>
> As you stated, temporal ordering may or may not represent a causal
> relation between PEs. Since non-causal ordering of PEs occurs in
> many provenance applications, we need to define a property for temporal
> ordering of PEs, with causality-based temporal ordering as a
> specialization of that property.
>
>
> >The relation wasScheduledAfter attempts to capture some temporal
> >ordering, with underpinning causal influence.  You are incorrect to
> >state that to assert wasScheduledAfter you need to know of an agent.
> >It's exactly the contrary. By asserting wasScheduledAfter, you also
> >assert the existence of such an agent, but don't have to specify
> >which it is.
>
> The PROV-DM currently defines the following constraint 
> for wasScheduledAfter:
> Given two process execution expressions denoted by pe1 and pe2, the 
> expression wasScheduledAfter(pe2,pe1) holds, if and only if there are 
> two entity expressions denoted by e1 and e2, such that 
> wasControlledBy(pe1,e1,qualifier(role="end")) and 
> wasControlledBy(pe2,e2,qualifier(role="start")) and 
> wasDerivedFrom(e2,e1).
> and
> This definition assumes that the activities represented by process 
> execution expressions identified by pe1 and pe2 are controlled by some 
> agents, represented by expressions identified by e1 and e2, where the 
> first agent terminates (control qualifier qualifier(role="end")) the 
> first activity, and the second initiates (control qualifier 
> qualifier(role="start")) the second. The second agent being "derived" 
> from the first enforces temporal ordering. If we don't know which are 
> the Agents associated with pe1 and pe2 then how can we state that they 
> are entities with identifiers e1 and e2?
>
> In other words, if there are two PEs (from Taverna workflows) - 
> retrieveGeneSequence and runBLASTService and John (the research robot) 
> ended retrieveGeneSequence and Tom (the research robot - derived from 
> John) started runBLASTService - then we can assert that 
> runBLASTService wasScheduledAfter retrieveGeneSequence.
>
> But, if we don't know which Agents are associated with
> retrieveGeneSequence and runBLASTService PEs then how can we assert
> wasScheduledAfter property between the two PEs?

You will note that the constraint you copied contains "if and only if",
so it is defining the expression wasScheduledAfter(pe2,pe1).
It is therefore fine to assert it. The existential quantifier states the
existence of agents, but when asserting wasScheduledAfter
you don't need to know their identity. Conversely, if you know them and
all other constraints are satisfied, then you can infer
a wasScheduledAfter expression.
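
The two directions of the "if and only if" can be sketched in Python as
follows. This is an illustration under simplifying assumptions, not the
PROV-DM definition itself: qualifiers are reduced to a plain role string,
the facts are modelled as tuples, and the names pe_a, pe_b, otherService
and the function name are hypothetical. The robots come from the example
quoted above.

```python
# Hypothetical facts: (process execution, controlling entity, role).
was_controlled_by = {
    ("retrieveGeneSequence", "John", "end"),
    ("runBLASTService", "Tom", "start"),
    ("otherService", "Albert", "start"),  # Albert is not derived from John
}
was_derived_from = {("Tom", "John")}  # (derived entity, source entity)


def infer_was_scheduled_after(controlled, derived):
    """Return pairs (pe2, pe1) for which there exist e1 ending pe1 and
    e2 starting pe2 with wasDerivedFrom(e2, e1)."""
    return {
        (pe2, pe1)
        for (pe1, e1, role1) in controlled if role1 == "end"
        for (pe2, e2, role2) in controlled if role2 == "start"
        if (e2, e1) in derived
    }


# Direction 1: known agents and derivation let us infer the expression.
inferred = infer_was_scheduled_after(was_controlled_by, was_derived_from)

# Direction 2: the expression can also be asserted directly. Doing so
# claims that suitable agents exist, without naming them.
asserted = {("pe_b", "pe_a")}

all_scheduled_after = inferred | asserted
```

Note that the asserted pair carries no agent identifiers at all; under the
existential reading it is still a well-formed wasScheduledAfter expression.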

>
> There may be a third robot Albert that is not related to either Tom
> or John by wasDerivedFrom property. But, a provenance application has
> to know which of the three robots (agents) are associated with the two
> PEs (and then verify that there is a wasDerivedFrom property linking
> the two robots).
>
> The constraint defined for wasScheduledAfter is a rule and for the 
> rule to "fire" its conditions have to evaluate to "true".
>
> Just knowing that there exists some Agent associated
> with retrieveGeneSequence and runBLASTService PEs will not make the
> constraint evaluate to "true" - the provenance application has to
> specify which Agents (John and Tom) were associated with the two PEs.
>
> Hence, according to the current PROV-DM text, my understanding is that 
> a provenance application will need to know about the specific agents 
> associated with PEs before they can use the wasScheduledAfter 
> property. This information may or may not be available to a provenance 
> application.
>
> Therefore I am raising the need for a generic ordering property for 
> PEs that can be simply asserted by provenance applications. Similar to 
> other provenance assertions the ordering of PEs can be verified later 
> using either timestamps or causal relations constraints.

You have not answered my point. What does this give you that you can't 
infer from time information?
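
To make the question concrete, here is a minimal Python sketch of the
ordering checks that time information alone already supports, using the
three PEs from the police example. The end times, the year, the class and
the function names are assumptions made for illustration; the example
above only gives one time per PE.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProcessExecution:
    name: str
    start: datetime
    end: datetime


def start_precedes_start(a, b):
    """True if a starts strictly before b starts."""
    return a.start < b.start


def end_precedes_start(a, b):
    """True if a ends strictly before b starts."""
    return a.end < b.start


# Naive datetimes, all assumed to be in the same time zone (US ET).
pe1 = ProcessExecution("pe1", datetime(2011, 10, 2, 14, 0), datetime(2011, 10, 2, 14, 30))
pe2 = ProcessExecution("pe2", datetime(2011, 10, 2, 16, 0), datetime(2011, 10, 2, 16, 10))
pe3 = ProcessExecution("pe3", datetime(2011, 10, 2, 18, 0), datetime(2011, 10, 2, 18, 40))

# A purely temporal ordering falls out of sorting by start time.
timeline = [pe.name for pe in sorted([pe3, pe1, pe2], key=lambda pe: pe.start)]
```

With these values, end_precedes_start(pe1, pe2) holds and timeline lists
pe1, pe2, pe3 in that order, without any dedicated ordering relation.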
>
> >Final point, your reference [1] had not been agreed, it is the
> >proposal you made back then.
> Hence, I had raised this issue (Issue-50) to discuss the property. To
> clarify, have there been discussions or agreement on the two properties
> wasInformedBy and wasScheduledAfter (I may have missed the particular
> mails in the mailing list)?

To my knowledge, this thread is the only one discussing these issues.  
As Paul indicated a while back, the proposal
is aligned with the rest of the document.

I would like to see you put forward a proper definition of the concept
you have in mind.  I would argue
that your original text in [1] is not a definition but a requirement to
be satisfied. Can you define this notion of temporal order in
terms of the other "building blocks" of PROV (e.g. process start/end etc.)?

Ultimately, we could introduce Allen's relations 
(http://en.wikipedia.org/wiki/Allen's_Interval_Algebra)
but I am not sure it would be helpful in this context.
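
For reference, a minimal Python sketch of what classifying two process
executions by Allen's interval relations could look like. Intervals are
modelled as (start, end) pairs of comparable values with start < end; the
function name and the relation labels are choices made for this sketch,
not part of any PROV proposal.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.
    Each interval is a (start, end) pair with start < end."""
    (s1, e1), (s2, e2) = a, b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("each interval needs start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only proper overlaps remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"


# Times as hours of the day, purely illustrative.
relation = allen_relation((14.0, 14.5), (16.0, 16.25))
```

All thirteen relations are computed from start and end times alone, which
is consistent with the point that time information already determines them.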

Cheers,
Luc

>
> Thanks.
>
> Best,
> Satya
>
> On Sun, Oct 2, 2011 at 9:58 AM, Luc Moreau <L.Moreau@ecs.soton.ac.uk>
> wrote:
>
>     Hi Satya,
>
>     First, you will note that wasInformedBy is *not* a temporal
>     relation between process executions.
>     It is *not* transitive.  It requires information to flow between
>     two PEs.  For wasInformedBy(pe1,pe2),
>     a minimum constraint is that the end of pe2 does *not* precede the
>     start of pe1.
>     The data journalism example had an illustration of such relation.
>     It has been established to be useful
>     both theoretically and practically.
>
>     Second, it would be nice for PROV to have a temporal ordering
>     relation. However, we have to be
>     careful. The relations used/generatedBy/derivedFrom/dependedOn/...
>     all have a notion of causality/influence:
>     the source of the edge being influenced by the edge destination.
>
>     We know that causal order implies temporal order, but not the
>     converse.  I am therefore reluctant
>     to introduce a relation that arbitrarily captures temporal order.
>     What would it give us? After all,
>     we can associate time with
>     PEs, and given such time information, we can already decide if pe1
>     start precedes pe2 start, or if pe1 end
>     precedes pe2 start. What would a temporal relation give us over time?
>
>     The relation wasScheduledAfter attempts to capture some temporal
>     ordering, with underpinning
>     causal influence.  You are incorrect to state that to assert
>     wasScheduledAfter you need to know of an agent.
>     It's exactly the contrary. By asserting wasScheduledAfter, you
>     also assert the existence of such an
>     agent, but don't have to specify which it is.
>
>     Final point, your reference [1] had not been agreed, it is the
>     proposal you made back then.
>
>     So, in conclusion:
>     1. I would argue that wasInformedBy is useful, and should be kept
>     as such, ... and definitely cannot
>        be subsumed by some temporal ordering.
>
>     2. Temporal ordering *with* some form of underpinning causal
>     influence, is also useful. I agree that
>        wasScheduledAfter is a first attempt. Maybe somebody can put
>     forward alternative definitions.
>
>     Cheers,
>     Luc
>
>
>     On 02/10/11 02:03, Satya Sahoo wrote:
>>     Hi Luc,
>>     I would like to re-raise this issue since the two properties
>>     defined in PROV-DM, "wasInformedBy" and "wasScheduledAfter" do
>>     not represent the original property for ordering process
>>     executions that was agreed to by the provenance incubator group
>>     and also during the first F2F [1].
>>
>>     I believe there are primarily two dimensions/constraints for
>>     ordering process executions:
>>     a) Two PEs are scheduled (by agent/user) to execute in particular
>>     order at specific time instants, which we can represent as
>>     *time-based ordering of PEs*. Of course, additional information
>>     about which agent/user started or stopped the PEs can be
>>     specified, but the time values primarily define the ordering of
>>     the PEs.
>>
>>     b) A PE pe1 is designed to initiate/start a second PE pe2 (due to
>>     some condition being satisfied for example a specific state was
>>     reached or some entity became available), which we can represent
>>     as a *control-based ordering of PEs*. This ordering of process
>>     cannot be effectively captured by time-based ordering, since pe1
>>     may still be executing while pe2 starts.
>>
>>     Both these cases are captured by the property "wasPrecededBy"
>>     (the corresponding property in opposite direction can be
>>     "wasSucceededBy") where the PEs were ordered according to their
>>     time of start/stop or explicit start/stop by another PE.
>>
>>     Some specific comments on the current PROV-DM document
>>     Section 5.3.6 Ordering of Process Executions
>>     =====
>>     1. An information flow ordering expression is a representation
>>     that a characterized thing was generated by an activity,
>>     represented by a process execution expression, before it was used
>>     by another activity, also represented by a process execution
>>     expression.
>>
>>     Issue: This is a particular case of "time-based ordering", there
>>     can be multiple others. For example,
>>
>>     a) We can have the provenance assertions about two PEs Pe1 and
>>     Pe2: Pe1 was stopped at time instant t1 and Pe2 started at time
>>     instant t2 and t2 > t1. Hence Pe2 wasPrecededBy Pe1
>>
>>     b) Similarly, we have provenance assertions about two PEs Pe1,
>>     Pe2 and an Entity e1: Pe1 used e1 at time t1 and PE2 used e1 at
>>     time t2 and t2 > t1, hence (start of) Pe2 wasPrecededBy (start
>>     of) Pe1.
>>
>>     My suggestion is to just create a single generic property for
>>     ordering of PEs (Khalid had suggested using PEs instead of
>>     Process) and allow specific provenance application to create more
>>     specialized PE ordering properties according to their requirements.
>>
>>     2. According to the current definition of "wasScheduledAfter" we
>>     cannot assert that one PE was scheduled after another PE if we
>>     don't have information about the agent associated with the PEs.
>>     Further, the name of the property seems to refer to the intended
>>     ordering of PEs rather than actual execution of PEs - a workflow
>>     specification may have "scheduled" Pe1 to execute "after" Pe2,
>>     but during the workflow run, Pe2 may have executed before Pe1?
>>
>>     Overall, I am not sure why we need two very special cases of PE
>>     ordering property instead of using a generic "wasPrecededBy" (or
>>     "wasSucceededBy") property that can be specialized as needed by
>>     different provenance applications.
>>
>>     Thanks.
>>
>>     Best,
>>     Satya
>>
>>     [1]
>>     http://www.w3.org/2011/prov/wiki/ConsolidatedConcepts#Ordering_of_process_execution
>>
>>     On Fri, Sep 23, 2011 at 8:04 AM, Luc Moreau
>>     <l.moreau@ecs.soton.ac.uk> wrote:
>>
>>
>>         Hi Satya,
>>
>>         Issue has been closed pending review, with the latest
>>         document version.
>>         Feel free to reopen if not appropriate.
>>
>>         Luc
>>
>>
>>         On 27/07/2011 02:51, Provenance Working Group Issue Tracker
>>         wrote:
>>
>>             PROV-ISSUE-50 (Ordering of Process): Definition for
>>             Ordering of Process [Conceptual Model]
>>
>>             http://www.w3.org/2011/prov/track/issues/50
>>
>>             Raised by: Satya Sahoo
>>             On product: Conceptual Model
>>
>>             I am not sure where we got the currently listed
>>             definition of "Ordering of Process" - it is neither
>>             listed in the original provenance concept page [1] nor in
>>             the consolidated concepts page [2].
>>
>>             I had proposed the following definition:
>>             "Ordering of processes execution (in provenance) needs to
>>             be modeled as a property linking process entities in
>>             specific order along a particular dimension (temporal or
>>             control flow)"
>>
>>             [1]http://www.w3.org/2011/prov/wiki/ConceptOrderingOfProcesses
>>             [2]
>>             http://www.w3.org/2011/prov/wiki/ConsolidatedConcepts#Ordering_of_process_execution
>>
>
Received on Monday, 3 October 2011 07:07:52 UTC