
Re: PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints]

From: Timothy Lebo <lebot@rpi.edu>
Date: Sun, 29 Jul 2012 15:15:11 -0400
To: Provenance Working Group <public-prov-wg@w3.org>
Message-Id: <E28B05CD-98E7-4A78-914E-B5D08ECA0BE0@rpi.edu>
prov-c crew,

This is my review of prov-c. Apologies for the delay.

My previous concerns about organization and meta-discourse are largely resolved.
As I note below, there are a few hiccups on introducing and distinguishing definition/inference/constraint.
But overall, a much nicer document to read, and organized and motivated well enough to justify digging into each def/constraint/inference.

After answering the 6 questions here, I have general comments below.

I've tagged some of my general comments as BLOCKER for release to LC.

I also created diagrams for almost all of the figures, which I hope may be considered for inclusion or may simply inspire a more intuitive visual style. (This can obviously be done after LC release.)
I've attached a PDF, and they are also available in the repo:



On Jul 20, 2012, at 7:02 AM, Provenance Working Group Issue Tracker wrote:

> PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints]
> Please answer the following review questions:
> 1.  Is PROV-CONSTRAINTS ready to be released as a last call working draft (modulo editorial issues and resolution to the below issues)?

Almost, but I have flagged a few of my comments below as BLOCKER. Nothing critical, but these edges are too sharp to let go to LC as is.

> 2.  Regarding ISSUE-346: Is the role, meaning, and intended use of each type of inference or constraint clear?  (http://www.w3.org/2011/prov/track/issues/346)

I have no objections to the inferences and constraints listed, though I cannot speak to the compound effects of combining them.

> 3.  Regarding ISSUE-451: Are there any objections to the revision-is-alternate inference? (http://www.w3.org/2011/prov/track/issues/451)

I think this inference should stay.

> 4.  Regarding ISSUE-454: Are the rules for disjointness clear and appropriate? (http://www.w3.org/2011/prov/track/issues/454)

I have no objections to the disjointnesses.

> 5.  Regarding ISSUE-458: Should influence (and therefore all subrelations, including communication) be irreflexive, or can it be reflexive (i.e., can wasInfluencedBy(x,x) be valid)?  (http://www.w3.org/2011/prov/track/issues/458)

To be proper, I think influence should be irreflexive. Otherwise, a critical distinction is being abstracted away (and that's what scruffy provenance is for).

> 5.  Are there any objections to closing other open issues on PROV-CONSTRAINTS?  They are:
> http://www.w3.org/2011/prov/track/issues/387
> http://www.w3.org/2011/prov/track/issues/394

(I agreed to close the first because it was an editorial issue; the second was deleted.)

> http://www.w3.org/2011/prov/track/issues/452

The meaning of the "-" marker is still confusing to me.

> http://www.w3.org/2011/prov/track/issues/453

I have no position on many of these issues.

> 6.  Are there any new issues concerning definitions, constraints, or inferences? If so, please raise as new issues to be addressed before LC vote, ideally with a suggested change that would address the issue.

Nothing beyond what I've discussed below. If any of my BLOCKER comments are worthy of a full ISSUE, please add it or ask me to add it.

general comments (WITH BLOCKERS):


The abstract seems to focus on DM too much, starting at:

PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized in six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and, (6) collections forming a logical structure for its members.

I'd suggest removing this part, so we can get to "This document introduces" more quickly.

My previous confusion among inference/definition/constraint is resurfacing:

"inferences and definitions that are allowed on provenance statements and constraints"

"These inferences and constraints" (where did definitions go?)

The second paragraph of 1.2 disambiguates the notions, but the wording in the abstract might be revisited for clarity.


"PROV-SEM provides a mathematical semantics." is this still part of the family? It's not listed above.


"Provenance is a record that describes the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing."

Have we converged on THE definition of provenance, according to PROV? This one mentions institutions, but DM does not say it.

Suggest that we converge on the definition and use it throughout.


typo: "Some of these ariables"


suggest linking "valid" in 1.2 to http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-constraints-20120723/prov-constraints.html#dfn-valid


"To summarize: compliant applications use definitions, inferences, and uniqueness constraints to normalize PROV instances"

where did "uniqueness" come from? Shouldn't it just be 'constraints'?

It seems that there are two types of constraints (uniqueness and ordering). Suggest introducing this distinction before using it.


"definitions, inferences"

suggest maintaining the narrative order "inferences, definitions, constraints" when listing them (since a definition is a type of inference, and constraints are applied after the inferences).

Or, choose your preferred order, but be consistent throughout.

"definitions, inferences and constraints."
"presents inferences and definitions"


typo "certain statments"


By the time we get to 1.3, the reader is surprised a second time that "constraints" isn't just "constraints", but "uniqueness constraints" and "ordering constraints"; then we find out that there is a third kind of constraint. Although this might be "easing into" the details, I find these additions a bit unsettling.

"also defines a class of valid PROV instances by specifying constraints that valid PROV instances must satisfy." is only partly true and thus misleading, since only ordering and impossibility constraints are used to determine validity.


Why are "impossibility constraints" not mentioned in the "to summarize" in 1.2?

2. Rationale


I find this confusing:

"To avoid over-reliance on assumptions that identifying characteristics do not change, PROV allows for things to be described in different ways, with different descriptions of their partial state."


"state during an entity's lifetime"

->  ??

"state during the entity's lifetime"


"Different entities that are aspects of the same thing"


"Different entities that fix aspects of the same thing"


I disagree with: "and preserves the characteristics that make it identifiable"

since we just said earlier that "Similarly, there is no assumption that the attributes of an entity uniquely identify it."

suggest replacing it with "and preserves the characteristics provided"

2.2 Events


great discussion on global clocks!


Is there a method to the ordering of events?

I think this ordering is natural:


since usage of an entity cannot occur until after its generation, and nothing can happen after invalidation.

suggest reordering for clarity.


great definitions on events.


Table 5's column says "Definitions, Constraints, Inferences" but I see no "Definition X" in that column.


Suggest moving Table 5's column 3 to column 1.

Section 4


Should "A definition is a rule that" be "A definition is an inference"?


The following is a bit choppy. It's hard to find the subject:

"With a few exceptions (discussed below), omitted optional parameters to [PROV-N] statements, or explicit - markers, are placeholders for existentially quantified variables; that is, they denote unknown values."

suggest dropping the parentheses and restructuring the sentence.



Although I'm comfortable with the following for the purposes of validation, I would like to see some mention of how validation would work when two distinct URIs are co-referent (i.e., this document should "handle" owl:sameAs).

"In contrast, distinct URIs or literal values in PROV are assumed to be distinct for the purpose of checking validity or inferences."


Should the following include PROV-O? It seems to imply that PROV-O cannot omit the identifiers.

Why call out the serialization at all? The DM allows them to be optional, too. So perhaps we should keep it at the abstract level?

"Identifiers can sometimes be omitted in [PROV-N] notation."


Is "desugar" a technical term that should stay in a Recommendation?


Could R be replaced by O in "For each r in {entity, activity, agent}" , to help make the distinction between an object and an influence relation?


"There are also no expansion rules for entity, agent, communiction, attribution, influence, alternate, or specialization, because these have no optional parameters aside from the identifier and attribute, which are expanded by other rules above."


"There are no expansion rules for entity, agent, communication, attribution, influence, alternate, or specialization, because these have no optional parameters aside from the identifier and attributes, which are expanded by the rules in Definition 2."


Is this described further somewhere:

"The only exceptions, where - must be left in place, are the activity parameter in wasDerivedFrom and the plan parameter in wasAssociatedWith."

(perhaps pull the Remark up to before the table?)


table after 
"The following table characterizes the expandable parameters of"

needs borders, a table #, and an anchor to it.


The notion of "expandable parameters" is not described well enough. It seems to be an emergent term from Definition 3, but it is not clearly drawn.
What is an expandable parameter? A parameter that can be omitted, and if so, must be given an existential identifier?


Definition 4: using "R" for entity, activity, and agent is misleading from the DM perspective. It is not an influence relation; it is an (object?).




"a wasDerivedFrom(id;e2,e1,a,gen,use,attrs) that specifies an activity explicitly is not equivalent to wasDerivedFrom(id;e2,e1,-,gen,use,attrs) with a missing activity."

Are you saying that it is possible for the derivation to exist without an activity that used e1 and generated e2?
Regardless of how granularly activities are described to chain the use of e1 and the generation of e2, I could always describe a new activity that abstracts those and uses e1 and generates e2.

More justification needs to be given for this absence of an activity on a derivation.


I would prefer a space after the semicolon in statements such as:


since one is usually trying to "read past it".


Thank you for flagging the irrelevant variables with underscores.


The note

"A final check is required on this inference to ensure that it does not lead to non-termination, when combined with Inference 5 (communication-generation-use-inference)."

is not situated well.


"A final check is required on this inference to ensure that it does not lead to non-termination, when combined with Inference 5 (communication-generation-use-inference)."

"This inference": the one above or the one below?


Figure 1
I would think that the label numbering on the activities would be reversed: a1 starts before a2 starts before a3.

suggest changing the labeling for a more natural order.


"From an entity, we can infer that existence of generation and invalidation events."
"From an entity, we can infer the existence of generation and invalidation events."



"From an activity statemen,t"


incomplete sentence?

"From an activity statemen,t we can infer that start and end events having times matching the start and end times of the activity."
"From an activity statement, we can infer start and end events having times matching the start and end times of the activity."

also, "having times" seems odd.


This doesn't seem narrative-y enough:

"Start of a by trigger e1 and starter activity a1 implies that e1 was generated by a1."

suggest making it more narrative-y

"The start of an activity a by trigger e1 implies that e1 was generated by starting activity a1."

same with:

"Likewise, end of a by trigger e1 and ender activity a1 implies that e1 was generated by a1."
"Likewise, end of activity a by trigger e1 and ender activity a1 implies that e1 was generated by a1."


Inference 11: Why is "id2" in:


what type is it? can we rename it to something more intuitive?

(very weak suggestion) rename "id2,id1" to "u, g" or something that makes it easier to know what type it is.


So, no abstraction of Activities?

"the fact that the entity denoted by e2 is generated by at most one activity"



Why is the converse not the case? (narrative needs to be added) Please explicitly state the converse, as well.

Why does this inference matter? (narrative needs to be added) 

"A derivation specifying activity, generation and use events is a special case of a derivation that leaves these unspecified. (The converse is not the case)."


Inference 16

wasAssociatedWith(_id2;a, ag2, _pl2, []). (the plan is specified)

and not 
wasAssociatedWith(_id2;a, ag2, -, []).



"The relation alternateOf is an equivalence relation: ..."

Why is this stated, and what are its consequences? Narrative is needed.
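One consequence worth spelling out: because alternateOf is reflexive, symmetric, and transitive, the asserted alternateOf statements partition entities into classes, and any two entities in the same class are inferred to be alternates of each other. The sketch below computes those classes with a union-find structure. The function name alternate_classes and its input format (entity names plus (e1, e2) pairs) are hypothetical; this is an illustration, not a procedure taken from PROV-CONSTRAINTS.

```python
def alternate_classes(entities, alternate_of):
    """Group entities into equivalence classes under the reflexive,
    symmetric, transitive closure of asserted alternateOf(e1, e2) pairs.
    Hypothetical helper for illustration only."""
    # Reflexivity: every entity starts out in its own class.
    parent = {e: e for e in entities}

    def find(e):
        # Follow parent links to the class representative, halving paths as we go.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for e1, e2 in alternate_of:
        parent.setdefault(e1, e1)
        parent.setdefault(e2, e2)
        # Union ignores argument order (symmetry); chained unions give transitivity.
        parent[find(e1)] = find(e2)

    classes = {}
    for e in parent:
        classes.setdefault(find(e), set()).add(e)
    # Sorted output so the result does not depend on union order.
    return sorted(sorted(c) for c in classes.values())
```

For example, alternate_classes(["e1", "e2", "e3", "e4"], [("e1", "e2"), ("e2", "e3")]) groups e1, e2, and e3 together even though alternateOf(e1, e3) was never asserted, while e4 stays alone.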


link explicitly to the constraint mentioned in:

"Similarly, specialization is a strict partial order: it is irreflexive and transitive. Irreflexivity is handled later as a constraint."


"If merging fails, then the constraint is unsatisfiable, so application of the constraint to I fails."

What is the consequence of the constraint failing? Since this is in a procedure prior to validation, it's not clear what state we are in with instance I.


The end of this reads oddly:

"Likewise, we assume that the identifiers of relationships in PROV uniquely identify the corresponding statements a PROV instance"


It seems odd to me that the uniqueness constraints are called "uniqueness", isn't "identity constraints" a more appropriate name?

suggest renaming uniqueness constraints to identity constraints.


typos in ID variables (the conclusion should read id1=id2, not id=id'):

IF wasStartedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasStartedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'.
IF wasEndedBy(id1;a,_e1,_a1,_t1,_attrs1) and wasEndedBy(id2;a,_e2,_a2,_t2,_attrs2), THEN id=id'.
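Read with the conclusion as id1=id2, the start rule says that two wasStartedBy statements about the same activity a must share one identifier, i.e., an activity has at most one start. Below is a minimal sketch, assuming a hypothetical tuple encoding of wasStartedBy (attributes omitted), that lists which identifiers the rule would require to be merged. Whether a merge then succeeds (it cannot for two distinct URIs, which are assumed distinct) is left to the document's merging procedure; the wasEndedBy rule would be handled the same way.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Start(NamedTuple):
    """Hypothetical encoding of wasStartedBy(id; a, e, a1, t, attrs), attrs omitted."""
    id: str
    activity: str
    trigger: Optional[str] = None
    starter: Optional[str] = None
    time: Optional[str] = None


def start_merge_pairs(starts):
    """Return (id1, id2) pairs that the rule
    IF wasStartedBy(id1; a, ...) and wasStartedBy(id2; a, ...) THEN id1=id2
    requires to be equal: distinct ids attached to the same activity a."""
    ids_by_activity = defaultdict(set)
    for s in starts:
        ids_by_activity[s.activity].add(s.id)
    pairs = []
    for activity in sorted(ids_by_activity):
        ids = sorted(ids_by_activity[activity])
        # Equating every id with the first one is enough to equate them all.
        pairs.extend((ids[0], other) for other in ids[1:])
    return pairs
```

For instance, two starts s1 and s2 of activity a yield the single pair ("s1", "s2"), while a lone start of activity b yields nothing.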


The visual style in Figure 2(a), where the orange constraint triangle extends to touch the events that it orders, is not used in the remaining portions of the figure.

suggest carrying this convention into the rest of the subfigures.


Figure 2

"are represented by vertical dotted lines (…, or intersecting usage and generation edges)"

There seems to be ambiguity in the visual style convention. Is the time of the usage/generation at the location where the (Activity,Entity) arrow crosses the vertical line? How does that intersection point differ from the point where the line connects to the activity (for generation) or entity (for usage)? If these two points were made one and the same, would there be any loss of information? Eliminating visual distinctions that do not encode the underlying model will help avoid confusion and distraction while reading the figures.



A note on my note: "Miscellanous suggestions about figures (originally from Tim Lebo):"

The suggestion is to make the diagonal solid arrow be dotted, like the vertical "usage" line.

Further, I'd suggest to make the timeline a solid line, to distinguish from the event style.

What is the purpose of the dotted horizontal line on the bottom of each subfigure?

Suggest making the start and end vertical lines BLUE to match the activity.


constraint 35

Could the second _t1 be changed to _t2 for clarity? 

	• IF used(use;a,e,_t,_attrs) and wasStartedBy(start;a,_e1,_a1,_t1,_attrs1) THEN start precedes use.
	• IF used(use;a,e,_t,_attrs) and wasEndedBy(end;a,_e1,_a1,_t1,_attrs1) THEN use precedes end.


Entities cannot be revised, since a new entity would result.
Why switch between "can" and "may"?

"As with activities, entities have lifetimes: they are generated, then can be used, revised, or other entities can be derived from them, and finally they may be invalidated."
"As with activities, entities have lifetimes: they are generated, then can be used, other entities can be derived from them, and finally they can be invalidated."



"If an entity specalizes another,"

"Similarly, if an entity specializes another"


constraint 45

The phrasing seems to suggest the wrong directionality:

"If an entity specalizes another, then its generation must follow the specialized entity's generation."

seems to say:

"If an entity (e2) specializes another (e1), then its (e2) generation must follow the specialized entity's (e1 or e2?) generation."

I know we had some last minute tweaks on specializationOf at F2F3, which resulted in:

"An entity that is a specialization of another shares all aspects of the latter, and additionally presents more specific aspects of the same thing as the latter. In particular, the lifetime of the entity being specialized contains that of any specialization."

I suggest constraint-45 be reviewed from this perspective to clarify the phrasing.

The crux of the problem is that the phrase "specialized entity" can suggest the more specialized OR more general entity.



constraint 46

The phrasing:

"Similarly, if an entity specalizes another, then its invalidation must follow the specialized entity's invalidation."

must change to be less ambiguous, since it is currently misleading.


Suggest switching the order of t_1 and t_2 in constraint 46:

IF specializationOf(e2,e1) and wasInvalidatedBy(inv1;e1,_a1,_t1,_attrs1) and wasInvalidatedBy(inv2;e2,_a2,_t2,_attrs2) THEN inv2 precedes inv1.

to align with the fact that the invalidation of the more specialized entity must precede the invalidation of the more general entity (giving a natural ordering).
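As a sketch of what the rule as written requires when both invalidation times happen to be known: inv2, the invalidation of the more specific e2, must precede inv1, the invalidation of the more general e1. This assumes PROV's non-strict reading of "precedes" (before or at the same time) and ISO 8601 timestamps without time zones; the helper name is hypothetical, and when a time is missing the check simply cannot detect a violation.

```python
from datetime import datetime
from typing import Optional


def specialization_invalidation_ok(t_inv_specific: Optional[str],
                                   t_inv_general: Optional[str]) -> bool:
    """For specializationOf(e2, e1), check that inv2 (of the more specific e2)
    precedes inv1 (of the more general e1), i.e. happens no later than it.
    Hypothetical helper: times are naive ISO 8601 strings; None means unknown."""
    if t_inv_specific is None or t_inv_general is None:
        # Without both explicit times, no violation can be detected this way.
        return True
    return datetime.fromisoformat(t_inv_specific) <= datetime.fromisoformat(t_inv_general)
```

Equal times pass because "precedes" is read non-strictly here; a strict check would use < instead.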


This is choppy and clearly a stretch to fit it all in; it is not naturally stated:

"Like entities and activities, agents have lifetimes that follow a familiar pattern: an agent is generated, can participate in interactions such as starting, ending or association with an activity, attribution, or delegation, and finally the agent is invalidated."


constraint 48 #2

suggest renaming _t1 to _t0 (to reconcile with the natural ordering when placed with #1)

IF wasAttributedTo(_at;e,ag,_attrs) and wasGeneratedBy(gen;ag,_a1,_t1,_attrs1) and wasInvalidatedBy(inv;e,_a2,_t2,_attrs2) THEN gen precedes inv.

make the generation of the agent _t0, the generation of the entity _t1, the invalidation of the entity _t2, and the invalidation of the agent _t3.


constraint 52 

do we know what "different relations" means?

constraint 52 and 53:

should more be said about a_1 and b_1 being different?


The phrase "and check no impossibility results from rules" seems odd to me. Could it be clearer?


Putting constraint 54 much earlier in the document may help readers interpret all the other PROV-N assertions in this document.

Suggest moving it much sooner in the document.


Why is 'plan' not a type in constraint 54:

IF wasAssociatedWith(id;a,ag,pl,attrs) THEN 'activity' ∈ typeOf(a) AND 'agent' ∈ typeOf(ag) AND 'entity' ∈ typeOf(pl).

Why is 'bundle' not a type in constraint 54:

IF mentionOf(e2,e1,b) THEN 'entity' ∈ typeOf(e2) AND 'entity' ∈ typeOf(e1) AND 'entity' ∈ typeOf(b).

Why is c not a prov:Collection in constraint 54:

IF entity(c,[prov:type='prov:EmptyCollection']) THEN 'entity' ∈ typeOf(c) AND 'prov:EmptyCollection' ∈ typeOf(c).


Luc often makes an argument that agents can be viewed as activities, which the Remark here does not address:

"Note that there is no disjointness between entities and agents. This is because one might want to make statements about the provenance of an agent, by making it an entity. Therefore, users may assert both entity(a1) and agent(a1) in a valid PROV instance."

Suggest balancing this remark to indicate that an agent can be viewed as either.

Received on Sunday, 29 July 2012 19:15:42 UTC
