- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Mon, 6 Aug 2012 16:00:12 +0100
- To: Provenance Working Group <public-prov-wg@w3.org>
The following is my review of the PROV Constraints document from: http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html Please accept my apologies in the delay, this document, being rather formal, required a bit more time on my behalf to be reviewed. As usual, my review is a bit long.. :-/ I did not have time to write a shorter one. Summary: This document is heavy material. This is partially natural as of its nature by defining all constraints and inferences, but also I get a feeling this document has had less refinements from the editors, and particularly from the WG. Many of the issues I raise below I would have expected the WG to discuss. I believe that after the split of PROV-DM to PROV-Constraint and PROV-N, we were very happy with the new form of the other documents, but that we might have neglected PROV-Constraint. The document is structurally easy to follow, as it progresses from definitions and inferences through time order constraints and finally impossibility constraints, with a summary of how PROV instances can be normalized and validated. At some point in the constraint, the document changes from a specification-style language to a "we assume" paper style language. I would try to keep it at specification level. Some of the introductory text seems to assume the reader wants to do normalization, however I suspect most readers will only be interested in validation; in particular to ensure their own PROV instance is valid. I would for instance be interested in finding out how I can express most of these as OWL constructs. In my review below, most of the comments are about syntactic tweaks and some light change of wording to make the text clearer. None of these should be considered blockers for LC. However I consider the following as blockers: (see below for details) 1) "Applications should also use definitions, inferences and constraints to normalize PROV instances in order to determine whether two such instances convey the same information." -- NO! 2) "compliant applications use definitions, inferences, and uniqueness constraints to normalize PROV instances (..)" -- DELETE 3) Inference rules with existential variables causes infinite loops - add note about not recursing on purely existential variables 4) wasDerivedFrom activity and wasAssociatiedWith plan as non-existential - ; causes headaches later on where the plan is assumed to exist. -- REWORD later uses 5) Do we have WG consensus on that all entities must be invalidated, and all activities must terminate? Seems to talk about the future, rather than the past. 6) Do we have WG consensus on activity start/end requiring triggers? Can an activity terminate itself without a trigger? Start instantaneously? 7) Remove inference 12 8) Remove wasStartedByActivity from figure 9) wasInvalidatedBy strictly follows wasGeneratedBy - do we have WG consensus? OK with zero-length activity? Need light justification. 10) wasDerivedFrom(e2,e1,use,gen) requires use strictly before generation - why? 11) wasAttributedTo constraints WRONG in my view. I will create individual issues for these in the tracker. On Fri, Jul 20, 2012 at 12:02 PM, Provenance Working Group Issue Tracker <sysbot+tracker@w3.org> wrote: > PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints] > > http://www.w3.org/2011/prov/track/issues/459 > > Raised by: James Cheney > On product: prov-dm-constraints > > Hi, > > This issue is to capture review comments for the next draft of PROV-CONSTRAINTS, which will be released soon. > > Please answer the following review questions: > > 1. Is PROV-CONSTRAINTS ready to be released as a last call working draft (modulo editorial issues and resolution to the below issues)? No, see blockers above. Uncertain about WG consent on some issues. (but I might have missed decisions on these!) > 2. Regarding ISSUE-346: Is the role, meaning, and intended use of each type of inference or constraint clear? (http://www.w3.org/2011/prov/track/issues/346) Yes. > 3. Regarding ISSUE-451: Are there any objections to the revision-is-alternate inference? (http://www.w3.org/2011/prov/track/issues/451) No. > 4. Regarding ISSUE-454: Are the rules for disjointness clear and appropriate? (http://www.w3.org/2011/prov/track/issues/454) Yes, this is clear from constraint 52 and 53. > 5. Regarding ISSUE-458: Should influence (and therefore all subrelations, including communication) be irreflexive, or can it be reflexive (i.e., can wasInfluencedBy(x,x) be valid)? (http://www.w3.org/2011/prov/track/issues/458) We agreed in the WG to make it wasInfluencedBy not irreflexive, which I agree with, thus removing this constraint. > 5. Are there any objections to closing other open issues on PROV-CONSTRAINTS? They are: > http://www.w3.org/2011/prov/track/issues/387 Close. > http://www.w3.org/2011/prov/track/issues/394 Close, alternateOf *is* symmetric > http://www.w3.org/2011/prov/track/issues/452 No, not fully, see my comments for bits where plan still seems to be required.. > http://www.w3.org/2011/prov/track/issues/453 Close. (We allow same ID) > 6. Are there any new issues concerning definitions, constraints, or inferences? If so, please raise as new issues to be addressed before LC vote, ideally with a suggested change that would address the issue. Yes, see blockers above. I'll create issues for these independently. The following is a lengthy page by page view with details of the above blockers, spelling/grammar correction and similar remarks. Where you see I am discussing with myself you can conclude that the specification might be unclear or confusing. :-) Abstract: "statements and constraints that PROV instances must satisfy in order to be considered valid". A definition of 'valid' would be appropriate. When checking now I see there is an underline and link to http://dvcs.w3.org/hg/prov/raw-file/tip/model/releases/ED-prov-constraints-20120723/prov-constraints.html#dfn-valid (which is way down, in section 6)- but somehow this underline is not shown in the printout (using Chrome), and hence I don't understand that this is a new term. Use italics and add "according to this document" (as it could still be valid by PROV-DM alone). Without the clarification of "according to this document" the meaning of "MUST" here is difficult - would it not be valid PROV-DM either? > 1.1 Conventions > Some of these ariables --> variables > _, by convention, to indicate that they (intentionally) appear only once in the formula; thus, the textual variable name is mnemonic only. Remove "(intentionally)". I assume you mean here that _variables do not actively take part in the formula, similar to how Python and Ruby programmers might assign a value to the variable _ if they don't care about it: (_,_,object) = get_triple() Meaning of 'mnemonic' here is unclear. A mnemonic is a way to remember something. Why do I need to remember something that appears only once? Replace with something like "such passive variables are provided merely as an aid to the reader." 1.2 Purpose of this document > "PROV-DM, is a conceptual data model for provenance, which is realizable using different serializations such as PROV-N and PROV-O". PROV-O is not a serialization, it is a representation. Replace with "representation". (PROV-N is both) This statement might be in other documents as well, but I've not checked. > A valid PROV instance corresponds to a consistent history of objects and interactions to which logical reasoning can be safely applied. By default, PROV instances need not be valid. Again the valid, but here with a small definition. A forward-link to section 6 would be helpful. > Applications should also use definitions, inferences and constraints to normalize PROV instances in order to determine whether two such instances convey the same information. No, they should not! It is not a requirement for applications to determine equivalence. Reword to something like: > Applications which are determining whether PROV instances convey the same information SHOULD use definitions, inferences and constraints to normalize the instances. Similarly this: > Applications should produce valid provenance and may reject provenance that is not valid should be: "Applications producing provenance SHOULD ensure it is _valid_, and similarly applications consuming provenance MAY reject provenance that is not _valid_." > To summarize: compliant applications use definitions, inferences, and uniqueness constraints to normalize PROV instances, and then apply event ordering constraints to determine whether the instance has a consistent event ordering. If so, the instance is valid, and the normal form is considered equivalent to the original instance. Also, any two PROV instances that yield the same normal form are considered equivalent. Delete this whole paragraph (except for PROv-SEM reference) - it is also assuming applications of PROV-Constraint only want to do normalization. It is saying you can't be compliant without doing all of the above! 1.3 Structure of this document > certain statments statements 2.1 Entities, Activities and Agents > An entity's attributes are established when the entity is created and describe the entity's situation and (partial) state during an entity's lifetime Ambiguous, move (partial) earlier and add 'entirety': "An entity's attributes are established when the entity is created and (partially) describe the entity's situation and state during the entirety of an entity's lifetime" > Different entities that are aspects of the same thing are called alternate, 'alternate' -> 'alternates' > PROV relations of specialization and alternate can be used Use actual relation names and code style: 'specialization' -> '<code>specializationOf</code>' 'alternate' -> '<code>alternateOf</code>' > express dependencies between the various entities using events italics on 'events'. > An activity is delimited by its start and its end events; hence, it occurs over an interval delimited by two instantaneous events. Remove "hence, " > In contrast, an activity (..) is typically not identifiable by the characteristics it exhibits at any point during its duration. You can't use this argument, not only have we not defined 'identifiable by characteristics' (anymore), but you just said above that entities are *not* identified by their attributes. I would rather use the argument that activities are not things in the world, they are what happens to the things in the world. > An entity usage event is the instantaneous event that marks the first instant of an entity's consumption timespan by an activity. > Before this instant the entity had not begun to be used by the activity. Not true, there could be multiple usages for the same entity/activity, but at different times (say in different roles). Delete, or replace with "The described usage had not started before this instant, although the activity could potentially have other usages of the same entity starting at different times." 2.3 Summary of constraints.. This table is utterly confusing to me as it is not a summary, it does not tell me anything, and I don't understand the columns. (Not helped by the lack of colours and borders on a print out!) I can see it is useful to have a kind of index of all the constraints, but the table needs some work to be understandable. In the web version it is kind-of OK, since the constraints have human readable names - but is there a reason to keep the third column rather than simple horisontal headers? Perhaps even the Type/Relation column could be done as headers? > To be compliant: > When processing provenance, an application may apply the inferences and definitions in section 4. Inferences and Definitions. > When determining whether a PROV instance is valid, an application must check that all of the constraints of section 5. Constraints are satisfied on the normal form of the instance. > When producing provenance meant for other applications to use, the application should produce valid provenance, as specified in section 6. Normalization, Validity, and Equivalence. > When determining whether two PROV instances are equivalent, an application must determine whether their normal forms are equal, as specified in section 6. Normalization, Validity, and Equivalence. Replace "When" with "If". 4. Inferences and Definitions > IF hyp1 and ... and hypk THEN there exists a1 and ... and am such that conclusion1 and ... and conclusionn. > (..) matching hyp1... hypk can be found in a PROV instance, we can add all of the statements concl1 ... concln to use concl1 and concln in first line. (Consistency). 4.1 Optional identifiers > We note that Definitions 1, 2, and 3 desugar compact PROV-N notation into a normal form "We note that" -> "Note that" ("we" are not observing here, but specifying!) > then the following definition holds: > r(id;a1,...,an) holds IF AND ONLY IF r(id;a1,...,an,[]) holds. Too many holds! I like better the syntax without all the 'holds', as you do in: > Definition 3 > activity(id,attrs) IF AND ONLY IF activity(id,-,-,attrs). I am confused by the 'holds'. It implies that you can simply ignore it otherwise. > There are also no expansion rules for entity, agent, communiction, attribution, influence, alternate, or specialization, because these have no optional parameters aside from the identifier and attribute, which are expanded by other rules above. Delete "also". "attribute" -> "attributes" "by other rules above" -> "by definition 1 and 2" What about mentionOf and bundle ? > Thus, before proceeding to apply other definitions or inferences, most occurrences of - must be replaced by fresh existential variables, distinct from any others occurring in the instance. As I'll explain later, the MUST here means you will never finish processing the inferences. "The only exceptions, where - must be left in place, are the activity parameter in wasDerivedFrom and the plan parameter in wasAssociatedWith." Add "as explained in the remark below" > (Table) > wasDerivedFrom(id;e2,e1,attrs) id This can be removed, because you have already said in Definition 3 that wasDerivedFrom(id;e2,e1,attrs) IF AND ONLY IF wasDerivedFrom(id;e2,e1,-,-,-,attrs). Before definition 4, which otherwise looks like a new thing, add something like: "The following definition generalizes the definitions of the above table for expanding parameters." > For each r in {entity, activity, agent}, the following definition holds: > r(a0,...,ai-1, -, ai+1, ...,an) IF AND ONLY IF there exists a' such that r(a0,...,ai-1,a',ai+1,...,an). I don't see a reason to make this very complicated general solution for those 3 relations, as they are very simple with regards to optional parameters: activity(a1, -, -, attr) entity(e1, attr) (no existentials!) agent(ag1, attr) (no existentials!) So change it to just a definition for activity() and its optional timestamps. > For each r in { used, wasGeneratedBy, wasInfluencedBy, wasInvalidatedBy, wasStartedBy, wasEndedBy, wasInformedBy, wasDerivedFrom, wasAttributedTo, wasAssociatedWith, actedOnBehalfOf} Make this the same order as in the table. > if the ith parameter of r is an expandable parameter of r I would add "as explained in Table x" and give the table a name. As it stands the table is not clearly marked as normative and the definition on its own sounds incomplete. > A final check is required on this inference to ensure that it does not lead to non-termination, when combined with Inference 5 (communication-generation-use-inference). I assume here "this inference" should be "the following inference" - as 'this' is inference 5 itself! > Remark Make it clearer that we are talking about this particular example: > We cannot infer wasInformedBy(a3,a1) from these statements. --> ".. from these statements alone." > Furthermore, the illustration also shows that a3 completes before a1. --> ".. before a1 started." > So it is impossible for a3 to have used an entity generated by a1. --> "So in this case it is..." > Inference 7 (entity-generation-invalidation-inference) > From an entity, we can infer that existence of generation and invalidation events. This REQUIRES entities to become invalidated (at some point). It is consistent with entities requiring generation, but it means I get inferred strange wasInvalidatedBy for real life entities like: entity(math:pi) entity(phys:universe) entity(phys:vacuum) entity(phys:energy) entity(concept:existence) entity(uk:2011census) entity(uspolitics:resultOfPresidentialElection2012) When are these destroyed? By what? Is it certain that everything is destroyed? What about things that are still existing at the time of provenance being written, with this you are requiring them all to die - I thought PROV only talked about the past. "We are all going to die" - but you don't know when or how - so why should the PROV imply provenance statements about the future? > Inference 8 (activity-start-end-inference) > From an activity statemen,t we can infer that start and end events having times matching the start and end times of the activity. --> "statement" Delete "that" > IF activity(a,t1,t2,_attrs) THEN there exist _id1, _e1, _id2, and _e2 such that wasStartedBy(_id1;a,_e1,_a1,t1,[]) and wasEndedBy(_id2;a,_e2,_a2,t2,[]). So it is impossible for an activity to start or end without a trigger? I am not so sure about this.. this creates phantom triggers, not too dissimilar to our previous phantom agents, in particular for a self-terminating process this can become a bit odd, "I'll tell my self to stop now!" All activities must end? Same argument as for inference 7 applies. Combining inference 7 and 8 means inference rules loops forever, creating triggers and activities, giving a chicken-and-egg problem going back to the beginning of the universe, and beyond. I only find a way out of this if an activity (phys:evolutionOfUniverse ?) creates its own trigger, which I am glad to see is allowed as Constraint 39 luckily does not use "strictly precedes". Here's a whiteboard drawing I made (this freaked out the office..) when trying to figure out which loops can be formed from the inferences (I've crossed out the 'dead ends', an arrow shows 'THEN' links between statements and/or variables of statements): http://www.w3.org/2011/prov/wiki/File:PROV-Constraint-inference-loops.jpg I think this is problematic, at least, the document should say something like "Care should be taken when implementing inference rules 7 and 8 in combination, as by recursively following inference rules for statements with only existential variables an endless loop would be formed." (In SPARQL this could for instance be by using FILTER (!isBlank(?c)) although you would need to exclude bnodes from the original graph) As far as I can tell from my lovely picture, stopping inference at purely existential variables should be enough, as these inferences would quickly not pass on potential 'real' variables. I initially was not sure why we would need to require inference 7 and 8, however I now see they are useful for the later event ordering. For instance you could know that: wasDerivedFrom(a, b) wasDerivedFrom(b, c) wasGeneratedBy(a, -, t1) wasInvalidatedBy(a, -, t2) wasGeneratedBy(c, -, t4) Then even if you know nothing about the generation or invalidation of b, we can conclude that t3 > t1 and do validity checks, by inferring: wasGeneratedBy(b, -, ?t3) and from constraints finding that t4 >= ?t3 >= t2 > t1. Similarly a used(c) can tell you something about the wasInvalidatedBy - it is not earlier than the time of used/wasStartedBy/wasEndedBy/wasAssociatedWith and invalidation of a specialization - meaning 'it must have existed then'. This is good. > IF wasEndedBy(_id;a,e1,a1,_t,_attrs), THEN there exist _gen and _t1 such that wasGeneratedBy(_gen;e1,a1,_t1,[]). > IF wasDerivedFrom(_id;e2,e1,a,id2,id1,_attrs), THEN there exists _t1 and _t2 such that used(id1;a,e1,_t1,[]) and wasGeneratedBy(id2;e2,a,_t2,[]) hold. General comment: I think the style with identifiers like _gen is much more readable than _id1 and id2, at least when they appear multiple times. So I would propose to change such examples to use _use1, gen1 etc.). I would keep _id* for the first argument (if it is not used elsewhere), but change id*. I agree on removing inference 12 and the following paragraph and remark, this is redundant from inference 11 and definition 4. However, I think it is worth pointing out somewhere that "Derivation is not defined to be transitive either", but not by "following similar reasoning as for wasInformedBy." - I don't see how that reasoning would apply here. If you have: wasDerivedFrom(e3, e2) wasDerivedFrom(e2, e1) and you are attempting to say that wasDerivedFrom(e3, e1) is not always possible by the reasoning of wasInformedBy, then you are implying that e1 could have been generated after e2. This is however not possible due to constraints on the implied generation and usage. It is however logically possible to not be wasDerivedFrom, but that is because you can't go from: used(a, e1, t1) wasGeneratedBy(e2, a, t2) to wasDerivedFrom(e2, e1) EVEN IF t2 > t1, this is not true. wasDerivedFrom is a stronger statement than simply saying that something used something and later generated something else; it could be that e1 had nothing to do with how a generated e2. For instance, a type of bean-to-cup coffee maker grinds coffee beans whenever you press the button to make a cup of coffee, then brews a cup of coffee. However, there is a buffer of pre-ground coffee (to speed up the process), so those beans will be part of the *next* cup of coffee made - the coffee you get is not derived from the (same) beans that the machine just used. It might be worth to make a Remark about this. The remark after Inference 12 is not this, that talks about not deriving wasGeneratedBy from wasDerivedFrom+used. (which to me seems self-evident). > A derivation specifying activity, generation and use events is a special case of a derivation that leaves these unspecified. (The converse is not the case). --> "not the case because a blank activity is a special case in definition 4" > 4.4 Agents > Attribution identifies an agent as responsible for an entity The DM term is now "ascribing", not "responsible": http://www.w3.org/TR/prov-dm/#term-attribution > Attribution is the ascribing of an entity to an agent. Change sentence to match DM, "Attribution is the ascribing of an entity to an agent". > An agent can only be responsible for an entity if it was associated with an activity that generated the entity "responsible for" -> "ascribed to". > Inference 15 (attribution-inference) > IF wasAttributedTo(_att;e,ag,_attrs) holds for some identifiers e and ag, THEN there exist a, _t, _gen, _assoc, _pl, such that wasGeneratedBy(_gen;e,a,_t,[]) and wasAssociatedWith(_assoc;a,ag,_pl,[]) hold. This enforces _pl, not allowing it to be blank. (Goes against definition 4). Instead - should be inferred, and no 'there exists' required for -. (it might exist, though!) > Inference 16 (delegation-inference) > IF actedOnBehalfOf(_id;ag1, ag2, a, _attrs) THEN there exist _id1, _pl1, _id2, and _pl2 such that wasAssociatedWith(_id1;a, ag1, _pl1, []) and wasAssociatedWith(_id2;a, ag2, _pl2, []).# Again, this enforces a plan to exist for the delegation. That is very, very often not the case! It is worth pointing out here that the two plans are different. The following remark tries to clarify this, but as remarks are not normative, and the inferences are, then their "there exists _pl1" would force plans to exist. In particular, the remark says: > "because the existential variables _pl1 and _pl2 can be replaced with constant identifiers or placeholders - independently" ..but it is never explained anywhere before this that they can be replaced with -. > Inference 17 (influence-inference) Various existential variables are not on the _underscore style: > IF wasGeneratedBy(id;e,a,t,attrs) THEN wasInfluencedBy(id; e, a, attrs). t -> _t > IF used(id;a,e,t,attrs) THEN wasInfluencedBy(id; a, e, attrs). t -> _t > IF wasInformedBy(id;a2,a1,attrs) THEN wasInfluencedBy(id; a2, a1, attrs). > IF wasStartedBy(id;a2,e,a1,t,attrs) THEN wasInfluencedBy(id; a2, e, attrs). a1 -> _a1 t -> _t > IF wasEndedBy(id;a2,e,a1,t,attrs) THEN wasInfluencedBy(id; a2, e, attrs). a1 -> _a1 t -> _t > IF wasInvalidatedBy(id;e,a,t,attrs) THEN wasInfluencedBy(id; e, a, attrs). t -> _t > IF wasDerivedFrom(id; e2, e1, attrs) THEN wasInfluencedBy(id; e2, e1, attrs). > IF wasAttributedTo(id;e,ag,attr) THEN wasInfluencedBy(id; e, ag, attrs). > IF wasAssociatedWith(id;a,ag,pl,attrs) THEN wasInfluencedBy(id; a, ag, attrs). pl -> - >IF actedOnBehalfOf(id;ag2,ag1,a,attrs) THEN wasInfluencedBy(id; ag2, ag1, attrs). a -> _a A good question is if wasInfluencedBy can be none of the above (ie. ex:customInfluence). I suppose yes. Does this need to be made clear in a remark? The reason is that combined with Constraint 52 it might be misinterpreted as disjoint subclasses (true) and equivalent with a the union (false). > 5.1 Uniqueness Constraints > or a special symbol undefined Not sure why this special symbol undefined is described here, as it is not used anywhere in PROV-N but in the next bullet point (where it is not used as a symbol!). Suggest rewrite to "or it is <em>undefined</em>, indicating that the merge cannot be performed." > The merge of two attribute lists attrs1 and attrs2 is their union, considered as unordered lists, written attrs1 ∪ attrs2 Add something like ", allowing duplicate keys, but merging equal key-value pairs". > If merging succeeds with a substitution S, then S is applied to the instance I, yielding result I(S). So you mean S(I) ? What notation is this? > In the common case that a particular field of a relation uniquely determines the other fields, we call the uniqueness constraint a key constraint. <em>key constraint</em> > We assume that an entity has exactly one generation and invalidation event We don't assume, here we specify! --> "An entity has exactly..." > It follows from the above constraints that the generation and invalidation times of an entity are unique, if specified. Not alone. It requires also the merging procedure for Key constraints. Could this also be boxed and made formal? Like a kind of definition? Change "times" to "activites and times". > IF wasGeneratedBy(id1;e,_a1,_t1,_attrs1) and wasGeneratedBy(id2;e,_a2,_t2,_attrs2), THEN id1=id2. .. > IF wasInvalidatedBy(id1;e,_a1,_t1,_attrs1) and wasInvalidatedBy(id2;e,_a2,_t2,_attrs2), THEN id1=id2. Just to avoid confusion, I would use change the identifiers of the second: IF wasInvalidatedBy(id3;e,_a3,_t3,_attrs3) and wasInvalidatedBy(id4;e,_a4,_t4,_attrs4), THEN id3=id4. "We assume that an activity has exactly one start and end event" --> "An activity has.." "Again, together with above key constraints these constraints imply that the activity is a key for activity start and end statements" Heavy sentence. "Combined with key constraints 26 and its merging procedure this implies that.." > Constraint 29 (unique-wasStartedBy) > IF wasStartedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasStartedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'. id'-> id2 > Constraint 30 (unique-wasEndedBy) > IF wasEndedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasEndedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'. --> IF wasEndedBy(id3;a,_e3,_a3,_t3,_attrs3) and wasEndedBy(id4;a,_e4,_a4,_t4,_attrs4), THEN id3=id4. > Constraint 31 (unique-startTime) > IF activity(a,t,_t',_attrs) and wasStartedBy(id;a,_e1,_a1,t1,_attrs1), THEN t=t1. --> IF activity(a,t,_t2,_attrs) and wasStartedBy(_id;a,_e1,_a1,t1,_attrs1), THEN t=t1. > Constraint 32 (unique-endTime) > IF activity(a,_t,t',_attrs) and wasEndedBy(_id;a,_e1,_a1,t1,_attrs1), THEN t' = t1. --> IF activity(a,_t,t2,_attrs) and wasEndedBy(_id;a,_e3,_a3,t3,_attrs1), THEN t2 = t3. > Constraint 33 (mention-unique) > IF mentionOf(e, e1, b1) and mentionOf(e, e2, b2), THEN e1=e2 and b1=b2. no need for this constraint, just add to constraint 25: 4. The identifier field e is a KEY for the mentionOf(e, e1, b1) statement. > 5.2 Event Ordering Constraints > This specification assumes that a preorder exists between instantaneous events Link to definition for "preorder" > Moreover, we sometimes consider strict forms of these orders: we say e1 strictly precedes e2 to indicate that e1 happened before e2. This is a transitive relation. --> Moreover, we sometimes consider strict forms of these orders: we say e1 strictly precedes e2 to indicate that e1 happened before e2, but not at the same time. This is a transitive, irreflexible relation. > To check ordering constraints, we generate all precedes and strictly precedes relationships arising from the ordering constraints to form a directed graph, with edges marked precedes or strictly precedes, and check that there is no cycle containing a strictly precedes edge. This sounds like a paper, describing one chosen way to check ordering. Change to "One way to check ordering constraint is to generate all ..." > 5.2.1 Activity constraints > An activity starts, then during its lifetime uses, generates or invalidates entities, and communicates with or starts other activities, and finally ends It can also be associated with agents who affect the behaviour of the activity. > Figure 2e Shows wasStartedByActivity - now defunct relation. Remove. > Constraint 35 (usage-within-activity) > IF used(use;a,e,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes use. > IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN use precedes end. Change second line to avoid duplicate variables: IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN use precedes end. Same for: > Constraint 36 (generation-within-activity) > IF wasGeneratedBy(gen;_e,a,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes gen. > IF wasGeneratedBy(gen;_e,a,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN gen precedes end. IF wasGeneratedBy(gen;_e,a,_t,_attrs) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN gen precedes end. Add new constraint for invalidation within activity, mirroring generation. Add to figure 2. As for generation, invalidation implies ordering of events, since the invalidation event had to occur during the associated activity. This is illustrated by Figure 2 (x) and expressed by Constraint x (invalidation-within-activity). Constraint x(invalidation-within-activity) IF wasInvalidatedBy(inv;_e,a,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes inv. IF wasInvalidatedBy(inv;_e,a,_t,_attrs) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN invprecedes end. > Constraint 37 (wasInformedBy-ordering) > IF wasInformedBy(_id;a2,a1,_attrs) and wasStartedBy(start;a1,_e1,_a1',_t1,_attrs1) and wasEndedBy(end;a2,_e2,_a2',_t2,_attrs2) THEN start precedes end. This MUST follow from: > Inference 5 (communication-generation-use-inference) > IF wasInformedBy(_id;a2,a1,_attrs) holds THEN there exist e, _id1, _t1, _id2, and _t2, such that wasGeneratedBy(_id1;e,a1,_t1,[]) and used(_id2;a2,e,_t2,[]) hold. and constraint 35 (usage-within-activity) and constraint 36 (generation-within-activity). Thus this is a corollary rather than a new constraint. I would suggest removing it (and from figure 2), or writing it in a different corollary style. > 5.2.2 Entity constraints Figure 3b) implies that the deriving activity starts *after* usage and finishes *before* generation. extend blue box to extend beyond both dotted vertical lines. I don't understand 3c), something seems to be missing. The lower arrow from e2 has no label (was derived from?), and there are no activities. > Constraint 38 (generation-precedes-invalidation) > IF wasGeneratedBy(gen;e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN gen strictly precedes inv. Why is this relation in particular *strictly precedes*? This needs to be justified (beyond "we need some strictly in there so we can do our loop testing"). So an entity can't have zero lifetime, but an activity may? An activity can use an entity at the same time as it was generated, but it can't invalidate it then? This implies some kind of minimal planck time on entities, which is probably OK for most applications of PROV. After some discussion with fellow geeks, I have however come go agree that an entity can't have an empty lifespan, to avoid problems and to ensure time moves forward. I think we need to formulate this by using the description of an entity as characterizing and fixing aspects for some duration. But currently the DM descriptions seem to imply the opposite, entities can have zero lifespan, but activities cannot! http://www.w3.org/TR/prov-dm/#term-entity > An entity ◊ is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary. > An activity ◊ is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities. I do however see bigger use for a zero lifespan activity, because it can be used to describe transitions of entities. So can we add to Constraint 38 some kind of remark about why an entity must have a non-zero lifespan? Something like: "Constraint 38 implies that an entity must have a non-zero lifespan by using 'strictly precedes', that is the entity cannot be invalidated at the same instant as it is generated. The reasoning for this is that a meaningful entity is a thing with some aspects fixed. For these aspects to be fixed, the entity must exist for some (possibly infinitesimal) time. Note that this requirement does not apply to activities." Constraint 38 and 40 are not shown in Figure 3. > First, we consider derivations, where the activity and usage are known. --> First, we consider derivations where the activity and usage are known. In > the usage of e1 has to precede the generation of e2 --> ".. has to strictly precede ..." This needs a similar explanation for why this needs to be strictly preceded. Use and generation do not require this on their own. I am not sure what the reasoning here, perhaps it has to do with the semantics of being 'derived'? > http://www.w3.org/TR/prov-dm/#dfn-wasderivedfrom > A derivation ◊ is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity. This definition places a strict time order, it forces an 'older' and 'newer' entity. However I find it strange to apply this to the *usage* event, and would rather keep only the following constraint: > Constraint 42 (derivation-generation-generation-ordering) > IF wasDerivedFrom(_d;e2,e1,attrs) and wasGeneratedBy(gen1;e1,_a1,_t1,_attrs1) and wasGeneratedBy(gen2;e2,_a2,_t2,_attrs2) THEN gen1 strictly precedes gen2. Thus use1 precedes gen2 (but not strictly), constraint 41 can be reformulated as a corollary with constraint 42+39. I don't see a good reasoning for that to be strictly. A suggested Remark for constraint 42: "This constraint, similar to constraint 38, requires the derived entity to be generated strictly following the generation of the original entity. This follows from the <a>PROV-DM description of derivation</a>, A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity</em>, thus the derived entity must be newer than the original entity. " Unique variables to avoid confusion: > Constraint 43 (wasStartedBy-ordering) > IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and wasGeneratedBy(gen;e,_a2,_t2,_attrs2) THEN gen precedes start. > IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN start precedes inv. --> IF wasStartedBy(start;_a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a3,_t3,_attrs3) THEN start precedes inv. > Constraint 44 (wasEndedBy-ordering) > IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and wasGeneratedBy(gen;e,_a2,_t2,_attrs2) THEN gen precedes end. > IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN end precedes inv. --> IF wasEndedBy(end;_a,e,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a3,_t3,_attrs3) THEN end precedes inv. > 5.2.3 Agent constraints This section assumes agents are entities. I think this should be clarified, because not all agents are entities, and we don't have any specific relations to describe the lifetime of agents. So perhaps it should be clarified that in the case that an agent is an activity, then similar constraints should apply with wasStartedBy(ag, ..) instead of wasGeneratedBy(ag, ... ) and wasEndedBy(ag, ...) instead of wasInvalidatedBy(ag, ...), and that for non-activity non-entity agents, appropriate constraints for their particular lifetime would apply, but that would be out of scope for this document. Figure 5a) is very difficult to understand, as the extent of the two triangles is not shown. Could this be added, such as in 2a)? > The agent may be generated, or may only become associated with the activity, after the activity start: This sentence can be misinterpreted as the wrong kind of may. Use RFC 'MAY' to indicate that this is optional. Same for next sentence. > the agent is required to exist before the activity end " is required to exist" -> "MUST exist" > agent invalidation is required to happen after the activity start "is required to" -> "MUST" > Constraint 47 (wasAssociatedWith-ordering) > IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;ag,_a2,_t2,_attrs2) THEN start precedes inv. > IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and wasGeneratedBy(gen;ag,_a1,_t1,_attrs1) and wasEndedBy(end;a,_e2,_a2,_t2,_attrs2) THEN gen precedes end. Plan is not required (and is non-expandable), so replace _pl with - or add note that _pl above MAY be -. Make identifiers unique in second line: IF wasAssociatedWith(_assoc;a,ag,_pl,_attrs) and wasGeneratedBy(gen;ag,_a3,_t3,_attrs3) and wasEndedBy(end;a,_e4,_a4,_t4,_attrs4) THEN gen precedes end. > An entity that was attributed to an agent must have some overlap with the agent. Why?? > The agent is required to exist before the entity invalidation. I don't agree with that. First of all, why has the attribution need to have anything to do with the invalidation of an entity? If you contribute to an entity, all of that has to happen *before* the entity is generated. It does not matter what happens after that. > Likewise, the entity generation must precede the agent destruction. This also means it is not valid to attribute a book to an author if the book was published after the author's death. (For instance The GIrl with the Dragon Tattoo). By our inferences, it is only a requirement that the agent was associated with an activity that eventually gave birth to the entity. The agent is not required to be there till the end of the activity, that sounds like an artificial constraint to me. Thus I would remove constraint 48. What you can instead say that an agent's association with that activity must precede an entity's generation, because otherwise he can't be associated with its generating activity. This does not directly follow from constraint 47 and Inference 15 (attribution-inference), so we need a constraint to force the generation to be after the *association* started, the first would then follow. Association don't have time, unfortunately, but we can use same reasoning as in constraint 47: IF wasAttributedTo(_at;e,ag,_attrs) and wasGeneratedBy(genE;e,_a1,_t1,_attrs1) and wasGeneratedBy(genAg;ag,_a1,_t1,_attrs1) THEN genAg precedes genE We can't say anything about the entity's invalidation; attribution relates to association with the generation, not invalidation. The agent's invalidation after the start of the activity a1 (which does not affect e) is covered by constraint 47+ inference 15. > For delegation, two agents need to have some overlap in their lifetime. Why? This means a solicitor can't act on a dead person's behalf. > IF actedOnBehalfOf(_del;ag2,ag1,_a,_attrs) and wasGeneratedBy(gen;ag1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;ag2,_a2,_t2,_attrs2) THEN gen precedes inv. This however, is correct. There is no requirement for overlap, just that ag1 was born before ag2 died. Thus my great grandfather can't act on my behalf. (not assigned by me, at least!) Change text to: "For delegation, the responsible agent must have been generated before the invalidation of the delegated agent.". 5.4: > IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ typeOf(c) AND 'prov:EmptyCollection' ∈ typeOf(c). Change to (for consistency, not needed by checks): IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ typeOf(c) AND 'prov:Collection' ∈ typeOf(c) AND 'prov:EmptyCollection' ∈ typeOf(c). > IF wasDerivedFrom(id; e2, e1, a, g2, u1, attrs) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1) AND 'activity' ∈ typeOf(a). what if a is - ? Also need: IF wasDerivedFrom(id; e2, e1, -, -, -, attrs) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1) > IF wasAssociatedWith(id;a,ag,pl,attrs) THEN 'activity' ∈ typeOf(a) AND 'agent' ∈ typeOf(ag) AND 'entity' ∈ typeOf(pl). What if pl is -? Add: IF wasAssociatedWith(id;a,ag,-,attrs) THEN 'activity' ∈ typeOf(a) AND 'agent' ∈ typeOf(ag) > Note that there is no disjointness between entities and agents. This is because one might want to make statements about the provenance of an agent, by making it an entity. Therefore, users may assert both entity(a1) and agent(a1) in a valid PROV instance. Similarly, an agent might in some cases be expressed rather as an activity. > 6. Normalization, Validity, and Equivalence > Because of the potential interaction among inferences, definitions and constraints, the above algorithm is recursive. Nevertheless, all of our constraints fall into a class of tuple-generating dependencies and equality-generating dependencies that satisfy a termination condition called weak acyclicity that has been studied in the context of relational databases [DBCONSTRAINTS]. I find I need to add a condition to not follow inferences on purely existential variables to avoid a recursive loop. So I would add that to point 4. > (Replacing two different names with equal names does change the meaning.) --> "... does however change ..." > D. References Why is PROV-DM not a normative reference? -- Stian Soiland-Reyes, myGrid team School of Computer Science The University of Manchester
Received on Monday, 6 August 2012 15:01:10 UTC