Re: PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints] from Stian Soiland-Reyes on 2012-08-06 (public-prov-wg@w3.org from August 2012)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 6 Aug 2012 16:00:12 +0100
To: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <CAPRnXtm71zdfAUjgDp7RVjgK_aG6NpmAM6KwuqZr6t2Ct5V+FQ@mail.gmail.com>
The following is my review of the PROV Constraints document from:
   http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html

Please accept my apologies in the delay, this document, being rather
formal, required a bit more time on my behalf to be reviewed.

As usual, my review is a bit long.. :-/ I did not have time to write a
shorter one.


Summary:

This document is heavy material. This is partially natural as of its
nature by defining all constraints and inferences, but also I get a
feeling this document has had less refinements from the editors, and
particularly from the WG. Many of the issues I raise below I would
have expected the WG to discuss. I believe that after the split of
PROV-DM to PROV-Constraint and PROV-N, we were very happy with the new
form of the other documents, but that we might have neglected
PROV-Constraint.

The document is structurally easy to follow, as it progresses from
definitions and inferences through time order constraints and finally
impossibility constraints, with a summary of how PROV instances can be
normalized and validated. At some point in the constraint, the
document changes from a specification-style language to a "we assume"
paper style language. I would try to keep it at specification level.

Some of the introductory text seems to assume the reader wants to do
normalization, however I suspect most readers will only be interested
in validation; in particular to ensure their own PROV instance is
valid. I would for instance be interested in finding out how I can
express most of these as OWL constructs.


In my review below, most of the comments are about syntactic tweaks
and some light change of wording to make the text clearer. None of
these should be considered blockers for LC.

However I consider the following as blockers: (see below for details)


1) "Applications should also use definitions, inferences and
constraints to normalize PROV instances in order to determine whether
two such instances convey the same information." -- NO!

2) "compliant applications use definitions, inferences, and uniqueness
constraints to normalize PROV instances (..)" -- DELETE

3) Inference rules with existential variables causes infinite loops -
add note about not recursing on purely existential variables

4) wasDerivedFrom activity and wasAssociatiedWith plan as
non-existential  - ; causes headaches later on where the plan is
assumed to exist. -- REWORD later uses

5) Do we have WG consensus on that all entities must be invalidated,
and all activities must terminate? Seems to talk about the future,
rather than the past.

6) Do we have WG consensus on activity start/end requiring triggers?
Can an activity terminate itself without a trigger? Start
instantaneously?

7) Remove inference 12

8) Remove wasStartedByActivity  from figure

9) wasInvalidatedBy strictly follows wasGeneratedBy - do we have WG
consensus?  OK with zero-length activity? Need light justification.

10) wasDerivedFrom(e2,e1,use,gen) requires use strictly before generation - why?

11) wasAttributedTo constraints WRONG in my view.

I will create individual issues for these in the tracker.



On Fri, Jul 20, 2012 at 12:02 PM, Provenance Working Group Issue
Tracker <sysbot+tracker@w3.org> wrote:
> PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints]
>
> http://www.w3.org/2011/prov/track/issues/459
>
> Raised by: James Cheney
> On product: prov-dm-constraints
>
> Hi,
>
> This issue is to capture review comments for the next draft of PROV-CONSTRAINTS, which will be released soon.
>
> Please answer the following review questions:
>
> 1.  Is PROV-CONSTRAINTS ready to be released as a last call working draft (modulo editorial issues and resolution to the below issues)?

No, see blockers above. Uncertain about WG consent on some issues.
(but I might have missed decisions on these!)


> 2.  Regarding ISSUE-346: Is the role, meaning, and intended use of each type of inference or constraint clear?  (http://www.w3.org/2011/prov/track/issues/346)

Yes.


> 3.  Regarding ISSUE-451: Are there any objections to the revision-is-alternate inference? (http://www.w3.org/2011/prov/track/issues/451)

No.


> 4.  Regarding ISSUE-454: Are the rules for disjointness clear and appropriate? (http://www.w3.org/2011/prov/track/issues/454)

Yes, this is clear from constraint 52 and 53.

> 5.  Regarding ISSUE-458: Should influence (and therefore all subrelations, including communication) be irreflexive, or can it be reflexive (i.e., can wasInfluencedBy(x,x) be valid)?  (http://www.w3.org/2011/prov/track/issues/458)

We agreed in the WG to make it wasInfluencedBy not irreflexive, which
I agree with, thus removing this constraint.


> 5.  Are there any objections to closing other open issues on PROV-CONSTRAINTS?  They are:

> http://www.w3.org/2011/prov/track/issues/387
Close.


> http://www.w3.org/2011/prov/track/issues/394

Close, alternateOf *is* symmetric

> http://www.w3.org/2011/prov/track/issues/452

No, not fully, see my comments for bits where plan still seems to be required..

> http://www.w3.org/2011/prov/track/issues/453

Close. (We allow same ID)

> 6.  Are there any new issues concerning definitions, constraints, or inferences? If so, please raise as new issues to be addressed before LC vote, ideally with a suggested change that would address the issue.

Yes, see blockers above. I'll create issues for these independently.


The following is a lengthy page by page view with details of the above
blockers, spelling/grammar correction and similar remarks. Where you
see I am discussing with myself you can conclude that the
specification might be unclear or confusing. :-)



Abstract: "statements and constraints that PROV instances must satisfy
in order to be considered valid".

A definition of 'valid' would be appropriate. When checking now I see
there is an underline and link to
http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html#dfn-valid
(which is way down, in section 6)- but somehow this underline is not
shown in the printout (using Chrome), and hence I don't understand
that this is a new term. Use italics and add "according to this
document" (as it could still be valid by PROV-DM alone).

Without the clarification of "according to this document" the meaning
of "MUST" here is difficult - would it not be valid PROV-DM either?


> 1.1 Conventions
> Some of these ariables
--> variables


> _, by convention, to indicate that they (intentionally) appear only once in the formula; thus, the textual variable name is mnemonic only.

Remove "(intentionally)". I assume you mean here that _variables do
not actively take part in the formula, similar to how Python and Ruby
programmers might assign a value to the variable _ if they don't care
about it:  (_,_,object) = get_triple()

Meaning of 'mnemonic' here is unclear. A mnemonic is a way to remember
something. Why do I need to remember something that appears only once?
Replace with something like "such passive variables are provided
merely as an aid to the reader."


1.2 Purpose of this document

> "PROV-DM, is a conceptual data model for provenance, which is realizable using different serializations such as PROV-N and PROV-O".

PROV-O is not a serialization, it is a representation. Replace with
"representation". (PROV-N is both)

This statement might be in other documents as well, but I've not checked.


> A valid PROV instance corresponds to a consistent history of objects and interactions to which logical reasoning can be safely applied. By default, PROV instances need not be valid.

Again the valid, but here with a small definition. A forward-link to
section 6 would be helpful.



> Applications should also use definitions, inferences and constraints to normalize PROV instances in order to determine whether two such instances convey the same information.

No, they should not! It is not a requirement for applications to
determine equivalence.


Reword to something like:

> Applications which are determining whether PROV instances convey the same information SHOULD use definitions, inferences and constraints to normalize the instances.


Similarly this:

> Applications should produce valid provenance and may reject provenance that is not valid

should be:

"Applications producing provenance SHOULD ensure it is _valid_, and
similarly applications consuming provenance MAY reject provenance that
is not _valid_."


> To summarize: compliant applications use definitions, inferences, and uniqueness constraints to normalize PROV instances, and then apply event ordering constraints to determine whether the instance has a consistent event ordering. If so, the instance is valid, and the normal form is considered equivalent to the original instance. Also, any two PROV instances that yield the same normal form are considered equivalent.

Delete this whole paragraph (except for PROv-SEM reference) -  it is
also assuming applications of PROV-Constraint only want to do
normalization.  It is saying you can't be compliant without doing all
of the above!


1.3 Structure of this document
> certain statments
statements



2.1 Entities, Activities and Agents

> An entity's attributes are established when the entity is created and describe the entity's situation and (partial) state during an entity's lifetime

Ambiguous, move (partial) earlier and add 'entirety':

"An entity's attributes are established when the entity is created and
(partially) describe the entity's situation and state during the
entirety of an entity's lifetime"



> Different entities that are aspects of the same thing are called alternate,
'alternate' -> 'alternates'


> PROV relations of specialization and alternate can be used
Use actual relation names and code style:

  'specialization' -> '<code>specializationOf</code>'
  'alternate' -> '<code>alternateOf</code>'

> express dependencies between the various entities using events

italics on 'events'.


> An activity is delimited by its start and its end events; hence, it occurs over an interval delimited by two instantaneous events.

Remove "hence, "


> In contrast, an activity (..) is typically not identifiable by the characteristics it exhibits at any point during its duration.

You can't use this argument, not only have we not defined
'identifiable by characteristics' (anymore), but you just said above
that entities are *not* identified by their attributes. I would rather
use the argument that activities are not things in the world, they are
what happens to the things in the world.

> An entity usage event is the instantaneous event that marks the first instant of an entity's consumption timespan by an activity.
> Before this instant the entity had not begun to be used by the activity.

Not true, there could be multiple usages for the same entity/activity,
but at different times (say in different roles).

Delete, or replace with "The described usage had not started before
this instant, although the activity could potentially have other
usages of the same entity starting at different times."


2.3 Summary of constraints..


This table is utterly confusing to me as it is not a summary, it does
not tell me anything, and I don't understand the columns. (Not helped
by the lack of colours and borders on a print out!)

I can see it is useful to have a kind of index of all the constraints,
but the table needs some work to be understandable. In the web version
it is kind-of OK, since the constraints have human readable names -
but is there a reason to keep the third column rather than simple
horisontal headers? Perhaps even the Type/Relation column could be
done as headers?


> To be compliant:

> When processing provenance, an application may apply the inferences and definitions in section 4. Inferences and Definitions.
> When determining whether a PROV instance is valid, an application must check that all of the constraints of section 5. Constraints are satisfied on the normal form of the instance.
> When producing provenance meant for other applications to use, the application should produce valid provenance, as specified in section 6. Normalization, Validity, and Equivalence.
> When determining whether two PROV instances are equivalent, an application must determine whether their normal forms are equal, as specified in section 6. Normalization, Validity, and Equivalence.

Replace "When" with "If".


4. Inferences and Definitions

> IF hyp1 and ... and hypk THEN there exists a1 and ... and am such that conclusion1 and ... and conclusionn.
> (..)  matching hyp1... hypk can be found in a PROV instance, we can add all of the statements concl1 ... concln to

use concl1 and concln in first line. (Consistency).


4.1 Optional identifiers

> We note that Definitions 1, 2, and 3 desugar compact PROV-N notation into a normal form

"We note that" -> "Note that"  ("we" are not observing here, but specifying!)



> then the following definition holds:
> r(id;a1,...,an) holds IF AND ONLY IF r(id;a1,...,an,[]) holds.

Too many holds! I like better the syntax without all the 'holds', as you do in:

> Definition 3
> activity(id,attrs) IF AND ONLY IF activity(id,-,-,attrs).

I am confused by the 'holds'. It implies that you can simply ignore it
otherwise.



> There are also no expansion rules for entity, agent, communiction, attribution, influence, alternate, or specialization, because these have no optional parameters aside from the identifier and attribute, which are expanded by other rules above.

Delete "also".
"attribute" -> "attributes"
"by other rules above" -> "by definition 1 and 2"

What about mentionOf and bundle ?


> Thus, before proceeding to apply other definitions or inferences, most occurrences of - must be replaced by fresh existential variables, distinct from any others occurring in the instance.

As I'll explain later, the MUST here means you will never finish
processing the inferences.



"The only exceptions, where - must be left in place, are the activity
parameter in wasDerivedFrom and the plan parameter in
wasAssociatedWith."

Add "as explained in the remark below"



> (Table)
> wasDerivedFrom(id;e2,e1,attrs) id 

This can be removed, because you have already said in Definition 3 that

wasDerivedFrom(id;e2,e1,attrs) IF AND ONLY IF
wasDerivedFrom(id;e2,e1,-,-,-,attrs).


Before definition 4, which otherwise looks like a new thing, add something like:

"The following definition generalizes the definitions of the above
table for expanding parameters."

> For each r in {entity, activity, agent}, the following definition holds:
> r(a0,...,ai-1, -, ai+1, ...,an) IF AND ONLY IF there exists a' such that r(a0,...,ai-1,a',ai+1,...,an).

I don't see a reason to make this very complicated general solution
for those 3 relations, as they are very simple with regards to
optional parameters:


activity(a1, -, -, attr)
entity(e1, attr) (no existentials!)
agent(ag1, attr)  (no existentials!)


So change it to just a definition for activity() and its optional timestamps.




> For each r in { used, wasGeneratedBy, wasInfluencedBy, wasInvalidatedBy, wasStartedBy, wasEndedBy, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf}

Make this the same order as in the table.

>  if the ith parameter of r is an expandable parameter of r

I would add "as explained in Table x" and give the table a name. As it
stands the table is not clearly marked as normative and the definition
on its own sounds incomplete.

> A final check is required on this inference to ensure that it does not lead to non-termination, when combined with Inference 5 (communication-generation-use-inference).

I assume here "this inference" should be "the following inference" -
as 'this' is inference 5 itself!


> Remark

Make it clearer that we are talking about this particular example:

> We cannot infer wasInformedBy(a3,a1) from these statements.
--> ".. from these statements alone."


> Furthermore, the illustration also shows that a3 completes before a1.
--> ".. before a1 started."


> So it is impossible for a3 to have used an entity generated by a1.
--> "So in this case it is..."


> Inference 7 (entity-generation-invalidation-inference)

> From an entity, we can infer that existence of generation and invalidation events.

This REQUIRES entities to become invalidated (at some point). It is
consistent with entities requiring generation, but it means I get
inferred strange wasInvalidatedBy for real life entities like:

entity(math:pi)
entity(phys:universe)
entity(phys:vacuum)
entity(phys:energy)
entity(concept:existence)
entity(uk:2011census)
entity(uspolitics:resultOfPresidentialElection2012)

When are these destroyed? By what? Is it certain that everything is
destroyed? What about things that are still existing at the time of
provenance being written, with this you are requiring them all to die
- I thought PROV only talked about the past. "We are all going to die"
- but you don't know when or how - so why should the PROV imply
provenance statements about the future?


> Inference 8 (activity-start-end-inference)
> From an activity statemen,t we can infer that start and end events having times matching the start and end times of the activity.


--> "statement"
Delete "that"


> IF activity(a,t1,t2,_attrs) THEN there exist _id1, _e1, _id2, and _e2 such that wasStartedBy(_id1;a,_e1,_a1,t1,[]) and wasEndedBy(_id2;a,_e2,_a2,t2,[]).

So it is impossible for an activity to start or end without a trigger?
I am not so sure about this.. this creates phantom triggers, not too
dissimilar to our previous phantom agents, in particular for a
self-terminating process this can become a bit odd, "I'll tell my self
to stop now!"

All activities must end? Same argument as for inference 7 applies.



Combining inference 7 and 8 means inference rules loops forever,
creating triggers and activities, giving a chicken-and-egg problem
going back to the beginning of the universe, and beyond.

I only find a way out of this if an activity (phys:evolutionOfUniverse
?) creates its own trigger, which I am glad to see is allowed as
Constraint 39 luckily does not use "strictly precedes".


Here's a whiteboard drawing I made (this freaked out the office..)
when trying to figure out which loops can be formed from the
inferences (I've crossed out the 'dead ends', an arrow shows 'THEN'
links between statements and/or variables of statements):

  http://www.w3.org/2011/prov/wiki/File:PROV-Constraint-inference-loops.jpg


I think this is problematic, at least, the document should say
something like "Care should be taken when implementing inference rules
7 and 8 in combination, as by recursively following  inference rules
for statements with only existential variables an endless loop would
be formed."  (In SPARQL this could for instance be by using FILTER
(!isBlank(?c)) although you would need to exclude bnodes from the
original graph)

As far as I can tell from my lovely picture, stopping inference at
purely existential variables should be enough, as these inferences
would quickly not pass on potential 'real' variables.



I initially was not sure why we would need to require inference 7 and
8, however I now see they are useful for the later event ordering.

For instance you could know that:

wasDerivedFrom(a, b)
wasDerivedFrom(b, c)

wasGeneratedBy(a, -, t1)
wasInvalidatedBy(a, -, t2)

wasGeneratedBy(c, -, t4)

Then even if you know nothing about the generation or invalidation of
b, we can conclude that t3 > t1 and do validity checks, by inferring:
wasGeneratedBy(b, -, ?t3)

and from constraints finding that t4 >= ?t3 >= t2 > t1.


Similarly a used(c) can tell you something about the wasInvalidatedBy
- it is not earlier than the time of
used/wasStartedBy/wasEndedBy/wasAssociatedWith and invalidation of a
specialization - meaning 'it must have existed then'. This is good.


> IF wasEndedBy(_id;a,e1,a1,_t,_attrs), THEN there exist _gen and _t1 such that wasGeneratedBy(_gen;e1,a1,_t1,[]).
> IF wasDerivedFrom(_id;e2,e1,a,id2,id1,_attrs), THEN there exists _t1 and _t2 such that used(id1;a,e1,_t1,[]) and wasGeneratedBy(id2;e2,a,_t2,[]) hold.

General comment:

I think the style with identifiers like _gen is much more readable
than _id1 and id2, at least when they appear multiple times.  So I
would propose to change such examples to use _use1, gen1 etc.). I
would keep _id* for the first argument (if it is not used elsewhere),
but change id*.


I agree on removing inference 12 and the following paragraph and
remark, this is redundant from inference 11 and definition 4.


However, I think it is worth pointing out somewhere that "Derivation
is not defined to be transitive either", but not by "following similar
reasoning as for wasInformedBy." - I don't see how that reasoning
would apply here. If you have:

wasDerivedFrom(e3, e2)
wasDerivedFrom(e2, e1)

and you are attempting to say that

wasDerivedFrom(e3, e1) is not always possible by the reasoning of
wasInformedBy, then you are implying that e1 could have been generated
after e2. This is however not possible due to constraints on the
implied generation and usage.


It is however logically possible to not be wasDerivedFrom, but that is
because you can't go from:

  used(a, e1, t1)
  wasGeneratedBy(e2, a, t2)
to
  wasDerivedFrom(e2, e1)


EVEN IF t2 > t1, this is not true. wasDerivedFrom is a stronger
statement than simply saying that something used something and later
generated something else; it could be that e1 had nothing to do with
how a generated e2.

For instance, a type of bean-to-cup coffee maker grinds coffee beans
whenever you press the button to make a cup of coffee, then brews a
cup of coffee. However, there is a buffer of pre-ground coffee (to
speed up the process), so those beans will be part of the *next* cup
of coffee made - the coffee you get is not derived from the (same)
beans that the machine just used.

It might be worth to make a Remark about this. The remark after
Inference 12 is not this, that talks about not deriving wasGeneratedBy
from wasDerivedFrom+used.  (which to me seems self-evident).



> A derivation specifying activity, generation and use events is a special case of a derivation that leaves these unspecified. (The converse is not the case).

--> "not the case because a blank activity is a special case in definition 4"




> 4.4 Agents
> Attribution identifies an agent as responsible for an entity

The DM term is now  "ascribing", not "responsible":

http://www.w3.org/TR/prov-dm/#term-attribution
> Attribution is the ascribing of an entity to an agent.

Change sentence to match DM, "Attribution is the ascribing of an
entity to an agent".


> An agent can only be responsible for an entity if it was associated with an activity that generated the entity

"responsible for" -> "ascribed to".


> Inference 15 (attribution-inference)
> IF wasAttributedTo(_att;e,ag,_attrs) holds for some identifiers e and ag, THEN there exist a, _t, _gen, _assoc, _pl, such that wasGeneratedBy(_gen;e,a,_t,[]) and wasAssociatedWith(_assoc;a,ag,_pl,[]) hold.

This enforces _pl, not allowing it to be blank. (Goes against
definition 4).  Instead -  should be inferred, and no 'there exists'
required for -. (it might exist, though!)




> Inference 16 (delegation-inference)
> IF actedOnBehalfOf(_id;ag1, ag2, a, _attrs) THEN there exist _id1, _pl1, _id2, and _pl2 such that wasAssociatedWith(_id1;a, ag1, _pl1, []) and wasAssociatedWith(_id2;a, ag2, _pl2, []).#

Again, this enforces a plan to exist for the delegation. That is very,
very often not the case!  It is worth pointing out here that the two
plans are different.


The following remark tries to clarify this, but as remarks are not
normative, and the inferences are, then their "there exists _pl1"
would force plans to exist.

In particular, the remark says:
> "because the existential variables _pl1 and _pl2 can be replaced with constant identifiers or placeholders - independently"
..but it is never explained anywhere before this that they can be
replaced with -.




> Inference 17 (influence-inference)

Various existential variables are not on the _underscore style:

> IF wasGeneratedBy(id;e,a,t,attrs) THEN wasInfluencedBy(id; e, a, attrs).
t -> _t

> IF used(id;a,e,t,attrs) THEN wasInfluencedBy(id; a, e, attrs).
t -> _t

> IF wasInformedBy(id;a2,a1,attrs) THEN wasInfluencedBy(id; a2, a1, attrs).
> IF wasStartedBy(id;a2,e,a1,t,attrs) THEN wasInfluencedBy(id; a2, e, attrs).
a1 -> _a1
t -> _t

> IF wasEndedBy(id;a2,e,a1,t,attrs) THEN wasInfluencedBy(id; a2, e, attrs).
a1 -> _a1
t -> _t

> IF wasInvalidatedBy(id;e,a,t,attrs) THEN wasInfluencedBy(id; e, a, attrs).
t -> _t

> IF wasDerivedFrom(id; e2, e1, attrs) THEN wasInfluencedBy(id; e2, e1, attrs).
> IF wasAttributedTo(id;e,ag,attr) THEN wasInfluencedBy(id; e, ag, attrs).
> IF wasAssociatedWith(id;a,ag,pl,attrs) THEN wasInfluencedBy(id; a, ag, attrs).
pl -> -

>IF actedOnBehalfOf(id;ag2,ag1,a,attrs) THEN wasInfluencedBy(id; ag2, ag1, attrs).
a -> _a


A good question is if wasInfluencedBy can be none of the above (ie.
ex:customInfluence). I suppose yes. Does this need to be made clear in
a remark? The reason is that combined with Constraint 52 it might be
misinterpreted as disjoint subclasses (true) and equivalent with a the
union  (false).


> 5.1 Uniqueness Constraints
>  or a special symbol undefined

Not sure why this special symbol undefined is described here, as it is
not used anywhere in PROV-N but in the next bullet point (where it is
not used as a symbol!).

Suggest rewrite to "or it is <em>undefined</em>, indicating that the
merge cannot be performed."


> The merge of two attribute lists attrs1 and attrs2 is their union, considered as unordered lists, written attrs1 ∪ attrs2

Add something like ", allowing duplicate keys, but merging equal
key-value pairs".


> If merging succeeds with a substitution S, then S is applied to the instance I, yielding result I(S).

So you mean S(I) ? What notation is this?


> In the common case that a particular field of a relation uniquely determines the other fields, we call the uniqueness constraint a key constraint.

<em>key constraint</em>


> We assume that an entity has exactly one generation and invalidation event

We don't assume, here we specify!

--> "An entity has exactly..."


> It follows from the above constraints that the generation and invalidation times of an entity are unique, if specified.

Not alone. It requires also the merging procedure for Key constraints.
Could this also be boxed and made formal? Like a kind of definition?

Change "times" to "activites and times".


> IF wasGeneratedBy(id1;e,_a1,_t1,_attrs1) and wasGeneratedBy(id2;e,_a2,_t2,_attrs2), THEN id1=id2.
..
> IF wasInvalidatedBy(id1;e,_a1,_t1,_attrs1) and wasInvalidatedBy(id2;e,_a2,_t2,_attrs2), THEN id1=id2.

Just to avoid confusion, I would use change the identifiers of the second:

IF wasInvalidatedBy(id3;e,_a3,_t3,_attrs3) and
wasInvalidatedBy(id4;e,_a4,_t4,_attrs4), THEN id3=id4.


"We assume that an activity has exactly one start and end event"
-->
"An activity has.."



"Again, together with above key constraints these constraints imply
that the activity is a key for activity start and end statements"
Heavy sentence.    "Combined with key constraints 26 and its merging
procedure this implies that.."

> Constraint 29 (unique-wasStartedBy)
> IF wasStartedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasStartedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'.

id'-> id2

> Constraint 30 (unique-wasEndedBy)
> IF wasEndedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasEndedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'.

-->
IF wasEndedBy(id3;a,_e3,_a3,_t3,_attrs3) and
wasEndedBy(id4;a,_e4,_a4,_t4,_attrs4), THEN id3=id4.


> Constraint 31 (unique-startTime)
> IF activity(a,t,_t',_attrs) and wasStartedBy(id;a,_e1,_a1,t1,_attrs1), THEN t=t1.

-->
IF activity(a,t,_t2,_attrs) and
wasStartedBy(_id;a,_e1,_a1,t1,_attrs1), THEN t=t1.


> Constraint 32 (unique-endTime)
> IF activity(a,_t,t',_attrs) and wasEndedBy(_id;a,_e1,_a1,t1,_attrs1), THEN t' = t1.

-->
IF activity(a,_t,t2,_attrs) and wasEndedBy(_id;a,_e3,_a3,t3,_attrs1),
THEN t2 = t3.



> Constraint 33 (mention-unique)
> IF mentionOf(e, e1, b1) and mentionOf(e, e2, b2), THEN e1=e2 and b1=b2.

no need for this constraint, just add to constraint 25:

4. The identifier field e is a KEY for the mentionOf(e, e1, b1) statement.


> 5.2 Event Ordering Constraints

> This specification assumes that a preorder exists between instantaneous events

Link to definition for "preorder"


> Moreover, we sometimes consider strict forms of these orders: we say e1 strictly precedes e2 to indicate that e1 happened before e2. This is a transitive relation.

-->

Moreover, we sometimes consider strict forms of these orders: we say
e1 strictly precedes e2 to indicate that e1 happened before e2, but
not at the same time. This is a transitive, irreflexible relation.


> To check ordering constraints, we generate all precedes and strictly precedes relationships arising from the ordering constraints to form a directed graph, with edges marked precedes or strictly precedes, and check that there is no cycle containing a strictly precedes edge.

This sounds like a paper, describing one chosen way to check ordering.

Change to "One way to check ordering constraint is to generate all ..."


> 5.2.1 Activity constraints
>  An activity starts, then during its lifetime uses, generates or invalidates entities, and communicates with or starts other activities, and finally ends


It can also be associated with agents who affect the behaviour of the activity.

> Figure 2e

Shows wasStartedByActivity - now defunct relation. Remove.



> Constraint 35 (usage-within-activity)
> IF used(use;a,e,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes use.
> IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN use precedes end.

Change second line to avoid duplicate variables:

IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2)
THEN use precedes end.


Same for:
> Constraint 36 (generation-within-activity)
> IF wasGeneratedBy(gen;_e,a,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes gen.
> IF wasGeneratedBy(gen;_e,a,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN gen precedes end.

IF wasGeneratedBy(gen;_e,a,_t,_attrs) and
wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN gen precedes end.


Add new constraint for invalidation within activity, mirroring
generation. Add to figure 2.

As for generation, invalidation implies ordering of events, since the
invalidation event had to occur during the associated activity. This
is illustrated by Figure 2 (x) and expressed by Constraint x
(invalidation-within-activity).

Constraint x(invalidation-within-activity)
IF wasInvalidatedBy(inv;_e,a,_t,_attrs) and
wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes inv.
IF wasInvalidatedBy(inv;_e,a,_t,_attrs) and
wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN invprecedes end.



> Constraint 37 (wasInformedBy-ordering)
> IF wasInformedBy(_id;a2,a1,_attrs) and wasStartedBy(start;a1,_e1,_a1',_t1,_attrs1) and wasEndedBy(end;a2,_e2,_a2',_t2,_attrs2) THEN start precedes end.

This MUST follow from:

> Inference 5 (communication-generation-use-inference)
> IF wasInformedBy(_id;a2,a1,_attrs) holds THEN there exist e, _id1, _t1, _id2, and _t2, such that wasGeneratedBy(_id1;e,a1,_t1,[]) and used(_id2;a2,e,_t2,[]) hold.

and constraint 35 (usage-within-activity) and constraint 36
(generation-within-activity).  Thus this is a corollary rather than a
new constraint.

I would suggest removing it (and from figure 2), or writing it in a
different corollary style.



> 5.2.2 Entity constraints

Figure 3b) implies that the deriving activity starts *after* usage and
finishes *before* generation. extend blue box to extend beyond both
dotted vertical lines.

I don't understand 3c), something seems to be missing. The lower arrow
from e2 has no label (was derived from?), and there are no activities.



> Constraint 38 (generation-precedes-invalidation)
> IF wasGeneratedBy(gen;e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN gen strictly precedes inv.

Why is this relation in particular *strictly precedes*? This needs to
be justified (beyond "we need some strictly in there so we can do our
loop testing"). So an entity can't have zero lifetime, but an activity
may?

An activity can use an entity at the same time as it was generated,
but it can't invalidate it then? This implies some kind of minimal
planck time on entities, which is probably OK for most applications of
PROV.


After some discussion with fellow geeks, I have however come go agree
that an entity can't have an empty lifespan, to avoid problems and to
ensure time moves forward. I think we need to formulate this by using
the description of an entity as characterizing and fixing aspects for
some duration. But currently the DM descriptions seem to imply the
opposite, entities can have zero lifespan, but activities cannot!

http://www.w3.org/TR/prov-dm/#term-entity
> An entity ◊ is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.
> An activity ◊ is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.

I do however see bigger use for a zero lifespan activity, because it
can be used to describe transitions of entities. So can we add to
Constraint 38 some kind of remark about why an entity must have a
non-zero lifespan? Something like:

"Constraint 38 implies that an entity must have a non-zero lifespan by
using 'strictly precedes', that is the entity cannot be invalidated at
the same instant as it is generated. The reasoning for this is that a
meaningful entity is a thing with some aspects fixed. For these
aspects to be fixed, the entity must exist for some (possibly
infinitesimal) time. Note that this requirement does not apply to
activities."

Constraint 38 and 40 are not shown in Figure 3.


> First, we consider derivations, where the activity and usage are known.
-->
First, we consider derivations where the activity and usage are known. In

> the usage of e1 has to precede the generation of e2

--> ".. has to strictly precede ..."


This needs a similar explanation for why this needs to be strictly
preceded. Use and generation do not require this on their own.  I am
not sure what the reasoning here, perhaps it has to do with the
semantics of being 'derived'?

> http://www.w3.org/TR/prov-dm/#dfn-wasderivedfrom
> A derivation ◊ is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

This definition places a strict time order, it forces an 'older' and
'newer' entity. However I find it strange to apply this to the *usage*
event, and would rather keep only the following constraint:

> Constraint 42 (derivation-generation-generation-ordering)
> IF wasDerivedFrom(_d;e2,e1,attrs) and wasGeneratedBy(gen1;e1,_a1,_t1,_attrs1) and wasGeneratedBy(gen2;e2,_a2,_t2,_attrs2) THEN gen1 strictly precedes gen2.

Thus use1 precedes gen2  (but not strictly), constraint 41 can be
reformulated as a corollary with constraint 42+39.  I don't see a good
reasoning for that to be strictly.


A  suggested Remark for constraint 42:
"This constraint, similar to constraint 38, requires the derived
entity to be generated strictly following the generation of the
original entity. This follows from the <a>PROV-DM description of
derivation</a>,  A derivation is a transformation of an entity into
another, an update of an entity resulting in a new one, or the
construction of a new entity based on a pre-existing entity</em>, thus
the derived entity must be newer than the original entity. "


Unique variables to avoid confusion:

> Constraint 43 (wasStartedBy-ordering)
> IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and wasGeneratedBy(gen;e,_a2,_t2,_attrs2) THEN gen precedes start.
> IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN start precedes inv.
-->
IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and
wasInvalidatedBy(inv;e,_a3,_t3,_attrs3) THEN start precedes inv.



> Constraint 44 (wasEndedBy-ordering)
> IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and wasGeneratedBy(gen;e,_a2,_t2,_attrs2) THEN gen precedes end.
> IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN end precedes inv.
-->
IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and
wasInvalidatedBy(inv;e,_a3,_t3,_attrs3) THEN end precedes inv.




> 5.2.3 Agent constraints

This section assumes agents are entities. I think this should be
clarified, because not all agents are entities, and we don't have any
specific relations to describe the lifetime of agents.

So perhaps it should be clarified that in the case that an agent is an
activity, then similar constraints should apply with wasStartedBy(ag,
..) instead of wasGeneratedBy(ag, ... ) and wasEndedBy(ag, ...)
instead of wasInvalidatedBy(ag, ...), and that for non-activity
non-entity agents, appropriate constraints for their particular
lifetime would apply, but that would be out of scope for this
document.


Figure 5a) is very difficult to understand, as the extent of the two
triangles is not shown. Could this be added, such as in 2a)?


> The agent may be generated, or may only become associated with the activity, after the activity start:

This sentence can be misinterpreted as the wrong kind of may. Use RFC
'MAY' to indicate that this is optional. Same for next sentence.

> the agent is required to exist before the activity end

" is required to exist" -> "MUST exist"

> agent invalidation is required to happen after the activity start

"is required to" -> "MUST"


> Constraint 47 (wasAssociatedWith-ordering)
> IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;ag,_a2,_t2,_attrs2) THEN start precedes inv.
> IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and wasGeneratedBy(gen;ag,_a1,_t1,_attrs1) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN gen precedes end.

Plan is not required (and is non-expandable), so replace _pl with - or
add note that _pl above MAY be -.

Make identifiers unique in second line:

IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and
wasGeneratedBy(gen;ag,_a3,_t3,_attrs3) and
wasEndedBy(end;a,_e4,_a4,_t4,_attrs4) THEN gen precedes end.


> An entity that was attributed to an agent must have some overlap with the agent.

Why??


> The agent is required to exist before the entity invalidation.

I don't agree with that. First of all, why has the attribution need to
have anything to do with the invalidation of an entity? If you
contribute to an entity, all of that has to happen *before* the entity
is generated. It does not matter what happens after that.

> Likewise, the entity generation must precede the agent destruction.

This also means it is not valid to attribute a book to an author if
the book was published after the author's death. (For instance The
GIrl with the Dragon Tattoo).

By our inferences, it is only a requirement that the agent was
associated with an activity that eventually gave birth to the entity.
The agent is not required to be there till the end of the activity,
that sounds like an artificial constraint to me. Thus I would remove
constraint 48.


What you can instead say that an agent's association with that
activity must precede an entity's generation, because otherwise he
can't be associated with its generating activity. This does not
directly follow from constraint 47 and Inference 15
(attribution-inference), so we need a constraint to force the
generation to be after the *association* started, the first would then
follow. Association don't have time, unfortunately, but we can use
same reasoning as in constraint 47:

IF wasAttributedTo(_at;e,ag,_attrs) and
wasGeneratedBy(genE;e,_a1,_t1,_attrs1) and
wasGeneratedBy(genAg;ag,_a1,_t1,_attrs1) THEN genAg precedes genE


We can't say anything about the entity's invalidation; attribution
relates to association with the generation, not invalidation. The
agent's invalidation after the start of the activity a1 (which does
not affect e) is covered by constraint 47+ inference 15.


> For delegation, two agents need to have some overlap in their lifetime.

Why? This means a solicitor can't act on a dead person's behalf.

> IF actedOnBehalfOf(_del;ag2,ag1,_a,_attrs) and wasGeneratedBy(gen;ag1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;ag2,_a2,_t2,_attrs2) THEN gen precedes inv.

This however, is correct. There is no requirement for overlap, just
that ag1 was born before ag2 died. Thus my great grandfather can't act
on my behalf.  (not assigned by me, at least!)

Change text to:
"For delegation, the responsible agent must have been generated before
the invalidation of the delegated agent.".


5.4:

> IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ typeOf(c) AND 'prov:EmptyCollection' ∈ typeOf(c).

Change to (for consistency, not needed by checks):

IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈
typeOf(c) AND 'prov:Collection' ∈ typeOf(c) AND 'prov:EmptyCollection'
∈ typeOf(c).

> IF wasDerivedFrom(id; e2, e1, a, g2, u1, attrs) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1) AND 'activity' ∈ typeOf(a).

what if a is - ? Also need:

IF wasDerivedFrom(id; e2, e1, -, -, -, attrs) THEN 'entity' ∈
typeOf(e2) AND 'entity' ∈ typeOf(e1)


> IF wasAssociatedWith(id;a,ag,pl,attrs) THEN 'activity' ∈ typeOf(a) AND 'agent' ∈ typeOf(ag) AND 'entity' ∈ typeOf(pl).

What if pl is -? Add:

IF wasAssociatedWith(id;a,ag,-,attrs) THEN 'activity' ∈ typeOf(a) AND
'agent' ∈ typeOf(ag)


> Note that there is no disjointness between entities and agents. This is because one might want to make statements about the provenance of an agent, by making it an entity. Therefore, users may assert both entity(a1) and agent(a1) in a valid PROV instance.

Similarly, an agent might in some cases be expressed rather as an activity.


> 6. Normalization, Validity, and Equivalence
> Because of the potential interaction among inferences, definitions and constraints, the above algorithm is recursive. Nevertheless, all of our constraints fall into a class of tuple-generating dependencies and equality-generating dependencies that satisfy a termination condition called weak acyclicity that has been studied in the context of relational databases [DBCONSTRAINTS].

I find I need to add a condition to not follow inferences on purely
existential variables to avoid a recursive loop. So I would add that
to point 4.

>  (Replacing two different names with equal names does change the meaning.)

--> "... does however change ..."


> D. References

Why is PROV-DM not a normative reference?


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Monday, 6 August 2012 15:01:10 UTC