Re: PROV-ISSUE-50 (Ordering of Process): Definition for Ordering of Process [Conceptual Model] from Luc Moreau on 2011-11-30 (public-prov-wg@w3.org from November 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Wed, 30 Nov 2011 08:45:46 +0000
To: public-prov-wg@w3.org
Message-ID: <EMEW3|58d2038a8cddd3003b4e3a3c3ecdf6fdnAY8jr08L.Moreau|ecs.soton.ac.uk|4ED5ED3A>
Hi Satya,
The discussion on this thread has not progressed since early October.

The latest WD contains a new relation wasStartedBy between activities, 
which is
simpler than wasScheduledAfter.

For the second time, I am proposing to formally close this issue.

Best regards,
Luc

On 10/03/2011 08:05 AM, Luc Moreau wrote:
> Hi Satya,
>
> Responses interleaved.
>
> On 03/10/11 01:54, Satya Sahoo wrote:
>> Hi Luc,
>> My comments are inline:
>> >First, you will note that wasInformedBy is *not* a temporal relation 
>> between process executions.
>>
>> The PROV-DM currently defines the following constraint for wasInformedBy:
>> Given two process execution expressions denoted by pe1 and pe2, the 
>> expression wasInformedBy(pe2,pe1) holds, if and only if there is an 
>> entity expression denoted by e and qualifiers q1 and q2, such that 
>> wasGeneratedBy(e,pe1,q1) and used(pe2,e,q2) hold.
>>
>> If we consider the two expressions wasGeneratedBy(e, pe1, q1) and 
>> used(pe2, e, q2) - these two expressions together enforce that pe2 
>> cannot have start time that is "before" start time of pe1. This is 
>> temporal relation/ordering between pe1 and pe2. Hence, if both these 
>> expressions have to "hold" for wasInformedBy(pe2, pe1) to "hold" I am 
>> not sure how it is not a temporal ordering?
>
> I agree that some temporal constraints have to be satisfied for 
> wasInformedBy(pe2, pe1), but it's a necessary condition,
> it's not a sufficient condition.  Information (represented as entity e 
> above) is required to flow between process executions.
>
> Also, it's not a temporal order, but it's a temporal relation!  It is 
> not transitive!
>
> For these reasons (information flow and non-transitivity), I feel that 
> wasInformedBy does not fall under
> your temporal ordering classification.
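The definition quoted above, and the non-transitivity point, can be illustrated with a minimal sketch. It assumes a toy representation in which assertions are plain tuples and qualifiers are omitted; the names was_generated_by, used and was_informed_by are hypothetical and not part of PROV-DM.

```python
from itertools import product

# Hypothetical toy assertions; qualifiers q1/q2 are left out for brevity.
was_generated_by = {("e1", "pe1"), ("e2", "pe2")}  # wasGeneratedBy(e, pe)
used = {("pe2", "e1"), ("pe3", "e2")}              # used(pe, e)


def was_informed_by(generated, usages):
    """Return pairs (pe2, pe1) for which some entity e satisfies
    wasGeneratedBy(e, pe1) and used(pe2, e)."""
    return {
        (user, producer)
        for (entity, producer), (user, used_entity) in product(generated, usages)
        if entity == used_entity
    }


informed = was_informed_by(was_generated_by, used)

# pe2 was informed by pe1, and pe3 by pe2, but no single entity flows
# from pe1 to pe3, so the relation does not extend transitively.
print(("pe3", "pe1") in informed)
```

In this sketch the pair (pe3, pe1) is absent because no entity was both generated by pe1 and used by pe3, which is the information-flow requirement rather than a purely temporal one.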
>
>>
>>
>> >Second, it would be nice for PROV to have a temporal ordering
>> >relation. However, we have to be careful. The relations
>> >used/generatedBy/derivedFrom/dependedOn/... all have a notion of
>> >causality/influence: the source of the edge being influenced by the
>> >edge destination.
>> >We know that causal order implies temporal order, but not the
>> >converse.  I am therefore reluctant to introduce a relation that
>> >arbitrarily captures temporal order.  What would it give us? After
>> >all, we can associate time with PEs, and given such time information,
>> >we can already decide if pe1 start precedes pe2 start, or if pe1 end
>> >precedes pe2 start. What would a temporal relation give us over
>> >time?
>>
>> There are many non-causal properties that are part of provenance 
>> assertions.
>>
>> For example, to reconstruct the history of activities of an accused 
>> person X on Oct 2 before X reached the crime scene, the police 
>> make the following assertions:
>> 1. X bought a car at 2:00pm US ET - buying the car is PE pe1
>> 2. X bought flowers at 4:00pm US ET- buying flowers is PE pe2
>> 3. X hailed a taxi and travelled to crime scene at 6:00pm US ET - 
>> travelling in taxi is PE pe3
>
> This is a nice example where wasScheduledAfter can be used!
>
>>
>> In the above scenario, the police need to have temporal ordering of 
>> PEs to establish that person X was in the city on the day of the 
>> crime but there is no causal relation between pe1, pe2, and pe3.
>
> There is some underpinning ordering, since there is X at 2pm, X at 
> 4pm, and X at 6pm.
> This is exactly the definition of wasScheduledAfter.
>
>>
>> As you stated, temporal ordering may or may not represent causal 
>> relation between PEs and since non-causal ordering of PEs occur in 
>> many provenance applications we need to define a property for 
>> temporal ordering of PEs and causality-based temporal ordering is a 
>> specialization of that property.
>>
>>
>> >The relation wasScheduledAfter attempts to capture some temporal
>> >ordering, with underpinning causal influence.  You are incorrect to
>> >state that to assert wasScheduledAfter you need to know of an agent.
>> >It's exactly the contrary. By asserting wasScheduledAfter, you also
>> >assert the existence of such an agent, but don't have to specify
>> >which it is.
>>
>> The PROV-DM currently defines the following constraint 
>> for wasScheduledAfter:
>> Given two process execution expressions denoted by pe1 and pe2, the 
>> expression wasScheduledAfter(pe2,pe1) holds, if and only if there are 
>> two entity expressions denoted by e1 and e2, such that 
>> wasControlledBy(pe1,e1,qualifier(role="end")) and 
>> wasControlledBy(pe2,e2,qualifier(role="start")) and 
>> wasDerivedFrom(e2,e1).
>> and
>> This definition assumes that the activities represented by process 
>> execution expressions identified by pe1 and pe2 are controlled by 
>> some agents, represented by expressions identified by e1 and e2, 
>> where the first agent terminates (control qualifier 
>> qualifier(role="end")) the first activity, and the second initiates 
>> (control qualifier qualifier(role="start")) the second. The second 
>> agent being "derived" from the first enforces temporal ordering. If 
>> we don't know which are the Agents associated with pe1 and pe2 then 
>> how can we state that they are entities with identifiers e1 and e2?
>>
>> In other words, if there are two PEs (from Taverna workflows) - 
>> retrieveGeneSequence and runBLASTService and John (the research 
>> robot) ended retrieveGeneSequence and Tom (the research robot - 
>> derived from John) started runBLASTService - then we can assert that 
>> runBLASTService wasScheduledAfter retrieveGeneSequence.
>>
>> But, if we don't know which Agents are associated with 
>> retrieveGeneSequence and runBLASTService PEs then how can we assert 
>> wasScheduledAfter property between the two PEs?
>
> You will note that the constraint you copied contains "if and only 
> if", so it is defining the expression wasScheduledAfter(pe2,pe1).
> It is therefore fine to assert it. The existential quantifier states 
> the existence of agents, but when asserting wasScheduledAfter
> you don't need to know their identity. Vice versa, if you know them 
> and all other constraints are satisfied, then you can infer
> a wasScheduledAfter expression.
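The inference direction described here (agents known, constraints satisfied, hence wasScheduledAfter) might be sketched as follows. This is only an illustration under assumptions: assertions are modelled as tuples, the role qualifier is reduced to a plain string, and the function name infer_was_scheduled_after is hypothetical. The robot names come from the example earlier in the thread.

```python
# Hypothetical toy assertions built from the robot example in the thread.
# wasControlledBy(pe, e, qualifier(role=...)) is modelled as (pe, e, role).
was_controlled_by = {
    ("retrieveGeneSequence", "John", "end"),
    ("runBLASTService", "Tom", "start"),
    ("runBLASTService", "Albert", "end"),
}
was_derived_from = {("Tom", "John")}  # wasDerivedFrom(e2, e1)


def infer_was_scheduled_after(controlled, derived):
    """Return pairs (pe2, pe1) such that some e1 ended pe1, some e2
    started pe2, and wasDerivedFrom(e2, e1) holds."""
    enders = {(pe, e) for pe, e, role in controlled if role == "end"}
    starters = {(pe, e) for pe, e, role in controlled if role == "start"}
    return {
        (pe2, pe1)
        for pe1, e1 in enders
        for pe2, e2 in starters
        if (e2, e1) in derived
    }


inferred = infer_was_scheduled_after(was_controlled_by, was_derived_from)
print(inferred)
```

The opposite direction needs no computation: an asserted wasScheduledAfter(pe2,pe1) only claims that such e1 and e2 exist, without naming them.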
>
>>
>> There may be a third robot Albert and it is not related to either Tom 
>> or John by wasDerivedFrom property. But, a provenance application has 
>> to know which of the three robots (agents) are associated with the two 
>> PEs (and then verify that there is a wasDerivedFrom property linking 
>> the two robots).
>>
>> The constraint defined for wasScheduledAfter is a rule and for the 
>> rule to "fire" its conditions have to evaluate to "true".
>>
>> Just knowing that there exists some Agent associated 
>> with retrieveGeneSequence and runBLASTService PEs will not make the 
>> constraint evaluate to "true" - the provenance application has to 
>> specify which Agents (John and Tom) were associated with the two PEs.
>>
>> Hence, according to the current PROV-DM text, my understanding is 
>> that a provenance application will need to know about the specific 
>> agents associated with PEs before they can use the wasScheduledAfter 
>> property. This information may or may not be available to a 
>> provenance application.
>>
>> Therefore I am raising the need for a generic ordering property for 
>> PEs that can be simply asserted by provenance applications. Similar 
>> to other provenance assertions the ordering of PEs can be verified 
>> later using either timestamps or causal relations constraints.
>
> You have not answered my point. What does this give you that you can't 
> infer from time information?
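The point about time information can be made concrete with a small sketch, assuming each PE carries a recorded start time. The three start times are those given in the example earlier in the thread; the names start_time and start_precedes are hypothetical.

```python
from datetime import time

# Start times from the example in the thread (same day, US ET).
start_time = {
    "pe1": time(14, 0),  # buying the car
    "pe2": time(16, 0),  # buying flowers
    "pe3": time(18, 0),  # travelling in the taxi
}


def start_precedes(a, b, starts):
    """True if PE a started strictly before PE b, judged from timestamps only."""
    return starts[a] < starts[b]


# The ordering is recovered from the timestamps, with no ordering relation asserted.
ordered = sorted(start_time, key=start_time.get)
print(ordered)
```

This only works when timestamps are actually recorded for the PEs involved.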
>>
>> >Final point, your reference [1] had not been agreed, it is the 
>> proposal you made back then.
>> Hence, I had raised this issue (Issue-50) to discuss the property. To 
>> clarify, have there been discussions or agreement on the two 
>> properties wasInformedBy and wasScheduledAfter (I may have missed the 
>> particular mails in the mailing list)?
>
> To my knowledge, this thread is the only one discussing these issues.  
> As Paul indicated a while back, the proposal
> is aligned with the rest of the document.
>
> I would like to see you putting a proper definition of the concept you 
> would have in mind.  I would argue
> that your original text in [1] is not a definition but a requirement 
> to be satisfied. Can you define this notion of temporal order in
> terms of the other "building blocks" of PROV (e.g. process start/end etc.)?
>
> Ultimately, we could introduce Allen's relations 
> (http://en.wikipedia.org/wiki/Allen's_Interval_Algebra)
> but I am not sure it would be helpful in this context.
>
> Cheers,
> Luc
>
>>
>> Thanks.
>>
>> Best,
>> Satya
>>
>> On Sun, Oct 2, 2011 at 9:58 AM, Luc Moreau <L.Moreau@ecs.soton.ac.uk 
>> <mailto:L.Moreau@ecs.soton.ac.uk>> wrote:
>>
>>     Hi Satya,
>>
>>     First, you will note that wasInformedBy is *not* a temporal
>>     relation between process executions.
>>     It is *not* transitive.  It requires information to flow between
>>     two PEs.  For wasInformedBy(pe1,pe2),
>>     a minimum constraint is that the end of pe1 does *not* precede
>>     the start of pe2.
>>     The data journalism example had an illustration of such relation.
>>     It has been established to be useful
>>     both theoretically and practically.
>>
>>     Second, it would be nice for PROV to have a temporal ordering
>>     relation. However, we have to be
>>     careful. The relations
>>     used/generatedBy/derivedFrom/dependedOn/... all have a notion of
>>     causality/influence:
>>     the source of the edge being influenced by the edge destination.
>>
>>     We know that causal order implies temporal order, but not the
>>     converse.  I am therefore reluctant
>>     to introduce a relation that arbitrarily captures temporal
>>     order.  What would it give us? After all,
>>     we can associate time with
>>     PEs, and given such time information, we can already decide if
>>     pe1 start precedes pe2 start, or if pe1 end
>>     precedes pe2 start. What would a temporal relation give us over time?
>>
>>     The relation wasScheduledAfter attempts to capture some temporal
>>     ordering, with underpinning
>>     causal influence.  You are incorrect to state that to assert
>>     wasScheduledAfter you need to know of an agent.
>>     It's exactly the contrary. By asserting wasScheduledAfter, you
>>     also assert the existence of such an
>>     agent, but don't have to specify which it is.
>>
>>     Final point, your reference [1] had not been agreed, it is the
>>     proposal you made back then.
>>
>>     So, in conclusion:
>>     1. I would argue that wasInformedBy is useful, and should be kept
>>     as such, ... and definitely cannot
>>        be subsumed by some temporal ordering.
>>
>>     2. Temporal ordering *with* some form of underpinning causal
>>     influence, is also useful. I agree that
>>        wasScheduledAfter is a first attempt. Maybe somebody can put
>>     forward alternative definitions.
>>
>>     Cheers,
>>     Luc
>>
>>
>>     On 02/10/11 02:03, Satya Sahoo wrote:
>>>     Hi Luc,
>>>     I would like to re-raise this issue since the two properties
>>>     defined in PROV-DM, "wasInformedBy" and "wasScheduledAfter" do
>>>     not represent the original property for ordering process
>>>     executions that was agreed to by the provenance incubator group
>>>     and also during the first F2F [1].
>>>
>>>     I believe there are primarily two dimensions/constraints for
>>>     ordering process executions:
>>>     a) Two PEs are scheduled (by agent/user) to execute in
>>>     particular order at specific time instants, which we can
>>>     represent as *time-based ordering of PEs*. Of course, additional
>>>     information about which agent/user started or stopped the PEs
>>>     can be specified, but the time values primarily define the
>>>     ordering of the PEs.
>>>
>>>     b) A PE pe1 is designed to initiate/start a second PE pe2 (due
>>>     to some condition being satisfied for example a specific state
>>>     was reached or some entity became available), which we can
>>>     represent as a *control-based ordering of PEs*. This ordering of
>>>     process cannot be effectively captured by time-based ordering,
>>>     since pe1 may still be executing while pe2 starts.
>>>
>>>     Both these cases are captured by the property "wasPrecededBy"
>>>     (the corresponding property in opposite direction can be
>>>     "wasSucceededBy") where the PEs were ordered according to their
>>>     time of start/stop or explicit start/stop by another PE.
>>>
>>>     Some specific comments on the current PROV-DM document
>>>     Section 5.3.6 Ordering of Process Executions
>>>     =====
>>>     1. An information flow ordering expression is a representation
>>>     that a characterized thing was generated by an activity,
>>>     represented by a process execution expression, before it was used
>>>     by another activity, also represented by a process execution
>>>     expression.
>>>
>>>     Issue: This is a particular case of "time-based ordering", there
>>>     can be multiple others. For example,
>>>
>>>     a) We can have the provenance assertions about two PEs Pe1 and
>>>     Pe2: Pe1 was stopped at time instant t1 and Pe2 started at time
>>>     instant t2 and t2 > t1. Hence Pe2 wasPrecededBy Pe1
>>>
>>>     b) Similarly, we have provenance assertions about two PEs Pe1,
>>>     Pe2 and an Entity e1: Pe1 used e1 at time t1 and PE2 used e1 at
>>>     time t2 and t2 > t1, hence (start of) Pe2 wasPrecededBy (start
>>>     of) Pe1.
>>>
>>>     My suggestion is to just create a single generic property for
>>>     ordering of PEs (Khalid had suggested using PEs instead of
>>>     Process) and allow specific provenance application to create
>>>     more specialized PE ordering properties according to their
>>>     requirements.
>>>
>>>     2. According to the current definition of "wasScheduledAfter" we
>>>     cannot assert that one PE was scheduled after another PE if we
>>>     don't have information about the agent associated with the PEs.
>>>     Further, the name of the property seems to refer to the intended
>>>     ordering of PEs rather than actual execution of PEs - a workflow
>>>     specification may have "scheduled" Pe1 to execute "after" Pe2,
>>>     but during the workflow run, Pe2 may have executed before Pe1?
>>>
>>>     Overall, I am not sure why we need two very special cases of PE
>>>     ordering property instead of using a generic "wasPrecededBy" (or
>>>     "wasSucceededBy") property that can be specialized as needed by
>>>     different provenance applications.
>>>
>>>     Thanks.
>>>
>>>     Best,
>>>     Satya
>>>
>>>     [1]
>>>     http://www.w3.org/2011/prov/wiki/ConsolidatedConcepts#Ordering_of_process_execution
>>>
>>>     On Fri, Sep 23, 2011 at 8:04 AM, Luc Moreau
>>>     <l.moreau@ecs.soton.ac.uk <mailto:l.moreau@ecs.soton.ac.uk>> wrote:
>>>
>>>
>>>         Hi Satya,
>>>
>>>         Issue has been closed pending review, with the latest
>>>         document version.
>>>         Feel free to reopen if not appropriate.
>>>
>>>         Luc
>>>
>>>
>>>         On 27/07/2011 02:51, Provenance Working Group Issue Tracker
>>>         wrote:
>>>
>>>             PROV-ISSUE-50 (Ordering of Process): Definition for
>>>             Ordering of Process [Conceptual Model]
>>>
>>>             http://www.w3.org/2011/prov/track/issues/50
>>>
>>>             Raised by: Satya Sahoo
>>>             On product: Conceptual Model
>>>
>>>             I am not sure where we got the currently listed
>>>             definition of "Ordering of Process" - it is neither
>>>             listed in the original provenance concept page [1] nor
>>>             in the consolidated concepts page [2].
>>>
>>>             I had proposed the following definition:
>>>             "Ordering of processes execution (in provenance) needs
>>>             to be modeled as a property linking process entities in
>>>             specific order along a particular dimension (temporal or
>>>             control flow)"
>>>
>>>             [1]http://www.w3.org/2011/prov/wiki/ConceptOrderingOfProcesses
>>>             [2]
>>>             http://www.w3.org/2011/prov/wiki/ConsolidatedConcepts#Ordering_of_process_execution
>>>
>>>
>>>
>>>
>>>
>>>
>>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Wednesday, 30 November 2011 08:46:42 UTC