From: Antoine Zimmermann <antoine.zimmermann@emse.fr>

Date: Mon, 22 Oct 2012 15:04:19 +0200

Message-ID: <50854453.2010406@emse.fr>

To: public-prov-comments@w3.org

Date: Mon, 22 Oct 2012 15:04:19 +0200

Message-ID: <50854453.2010406@emse.fr>

To: public-prov-comments@w3.org

Dear PROV working group, I'm sorry to send my comments so late, but I hope you will take a close attention to what I'm saying here. I've read PROV data model and PROV constraints and while I find PROV-DM satisfying, I have concerns about PROV Constraints. The document tries to define a kind of semantics for the PROV-DM, but it's not actually defining a proper semantics at all. And since it does not truly define a formal semantics, it has to redefine the terms that are borrowed from logic, and subtly diverge from the standard notions found in logic. For someone who knows about logic, it makes the document larger, more complicated than it should be, more difficult to read, and more difficult to implement. For someone who does not know about logic, it's not made at all simpler, on the contrary. To give an example before getting into the details, let us examine the notion of PROV-equivalence, in Section 2.4. Equivalence is a well known and understood notion in maths and logics, which is necessarily reflexive. In PROV, equivalence isn't reflexive! For X to be PROV-equivalent to Y, it is required that both X and Y are PROV-valid, which is the PROV term used to say "consistent". If a proper semantics was defined, there would be no way the document diverge from legacy logical concepts. There would be less chances for mistakes in the definitions and in the rules laters. There would be more flexibililty in the way implementations support the semantics. I see two escapes for this: 1) define an ad hoc model theory for PROV-DM, in the line of what's sketched in PROV-SEM; 2) rely on a well known logic (such as FOL), to which all PROV statements are mapped, and add FOL axioms to model the constraints. 1) is ambitious and I understand there is not enough time to investigate it enough. But 2) is really easy. PROV statements already look a lot like FOL predicates, and PROV Constraints already rely heavily on FOL notions and notations. Definitions and inferences of 4.1 are not needed if the semantics is defined on the abstract syntax. The abstract syntax is not PROV-N and does not have "syntactic sugar" or "syntactic shortcuts". Meaning, relations in abstract syntax always contain all the arguments, optional or not, possibly with existential variables. The short forms only exist in the surface syntax. Inferences in 4.2, 4.3, 4.4 and 4.5 are all writable as FOL axioms. Constraints require equality and an infinite total order (Time with order "precedes"). This does not make the logic more complicated than FOL. Detailed comments: ----------------- Section 1.2: """ We define validity and equivalence in terms of a concept called normalization. """ This is not required and suggests that implementations have to rely on normalisation. The choice of formalisation, being entirely procedural, imposes that normalisation is explicitely mentioned in almost all definitions, which in turn asks for repeatedly marking that implementations are not forced to produce normal forms. This obfuscates the document, which could be smaller and simpler. """ Definitions, inferences, and uniqueness constraints can be applied to normalize PROV instances, and event ordering, typing, and impossibility constraints can be checked on the normal form to determine validity. Equivalence of two PROV instances can be determined by comparing their normal forms. """ This also strongly suggests normalising by "applying" constraints, as later defined, although it is said later that normalisation is not required in implementations. """ For PROV documents, validity and equivalence amount to checking the validity or pairwise equivalence of their respective documents. """ should be: "... of their respective instances." """ This specification defines how validity and equivalence are to be checked, if an application elects to support them at all. """ It was said before that implementations can choose the way they check validity or equivalence. Again, this is a very strong suggestion that implementation should be made according to the presented procedures. """ Applications [...] should treat equivalent instances or documents in the same way. """ Does it mean that an instance in a bundle can be safely replaced by another equivalent instance inside the bundle? This is important because it may be understood that a bundle identifier identifies the exact set of PROV statements given in a PROV document, or that the bundle identifier simply identifies the logical content of the bundle, or even any content that logically "implies" the content of the bundle in the said document. The definitions of equivalence and inference suggests that it is the later option. Note that this is something that may have to be understood when writing PROV documents as RDF datasets, given that RDF 1.1 is not specifying any particular relationship between a graph IRI and the content of a named graph. Section 2.4: """ Merging Merging is an operation that takes two terms and compares them to see if they are equal, or can be made equal by substituting an existential variable with another term. """ Normally, merging is an operation that takes two terms and produce a single term. I do not understand what this definition has to do with "merging". After, it is said: """ Merging two terms t,t' results in either substitution S such that S(t) = S(t'), or failure indicating that there is no substitution that can be applied to both t and t' to make them equal. """ Now, merging results in a substitution. Again, it is at odd with standard definitions of "merging". """ Applying definitions, inferences, and constraints """ This section is really not needed if all is defined in terms of FOL. This section very strongly suggests one way of implementing PROV-CONSTRAINTS, which is not good. The description, and what's presented later in the normative sections, almost imposes inference materialisation as the normal way to implement all the logical operations on PROV statements (even though the spec says later, in passing, that implementations that lead to the same results are OK). Using the notion of "failure" again strongly suggests that the spec is implemented with applications of operations indicated in this section and in Section 6, when in fact any consistency checker would be alright too. """ Checking ordering, typing, and impossibility constraints The ordering, typing, and impossibility constraints are checked rather than applied. This means that they do not generate new formulas expressible in PROV, """ This sentence again strongly suggests that constraints are "applied", in the sense defined previously, indicating that implementations should follow the operations mentioned previously, when in fact there is no reason to force any materialisation or "normalisastion" at all. This is again supported by the following sentence: "Checking such constraints follows a saturation strategy similar to that for normalization" What follows is again dictating an implementation when everything is in fact just logical formulas that have to be consistency-checked. PROV then uses the word "valid" where logic would use "consistent". In logic, a formula is valid iff it is necessarily true (it's a tautology). The use of a different term does not help non-logician (they would not be disturbed by the term "consistent") but may be a problem for logicians who are used to other terms. """ Equivalence and Isomorphism [...] This is similar to the notion of equivalence used in [RDF] """ RDF does not defined its own notion of equivalence. Equivalence in RDF is simply imposed by its formal semantics, following the standard notion of equivalence in all logics. """ [...] to be considered equivalent, the two instances must also be valid """ This is very strange, as "equivalence" is defined as a non-reflexive relationship. Again, the definition of equivalence would simply follow from a proper formalisation in FOL or model theory. Moreover, again, the paragraph strongly suggests relying on "applying" normalisation to check equivalence. See comments on Section 6.1. """ From Instances to Bundles and Documents [...] a toplevel instance consisting of the set of statements not appearing within a bundle """ This /seems/ to indicate that a statement in the top-level instance cannot be repeated in a bundle, while the PROV data model does not have this constraint. """ Analogously to blank nodes in [RDF], the scope of an existential variable in PROV is the instance level """ This is true for RDF graphs, but RDF doesn't say anything about the scope of bnodes in complex structures formed out of multiple RDF graphs, such as RDF datasets. This is therefore a constraint that must be indicated and highlighted in PROV. Section 3: This section uses vocabulary that is not yet defined normatively at this position in the document. """ If determining whether two PROV instances or documents are equivalent, an application must determine whether their normal forms are equal """ This is once more dictating a way of implementing, while relying on normal forms should be completely irrelevant, as indicated by the following sentence. Section 4.1 This section defines equivalence of syntactic constructs that are purely a PROV-N issue. There is no reason to put this in the logic of PROV structures. Section 5.1 Merging is defined in a strange way, with a procedure, which makes it difficult to grasp what notion it is supposed to convey. Section 5.2 It is never specified explicitly that "strictly-precedes" is irreflexive. Therefore, the algorithm given in Section 6.1 for checking validity, which relies on this assumption, provides strictly more information than the definition of validity. Section 6.1 """ A PROV instance is valid if its normal form exists and satisfies all of the validity constraints """ Here, it is not clear what "satisfies" means. Satisfaction is a standard term in logic, provided that a proper semantics is defined. Here, there is no proper semantics so the notion of satisfaction would have to be defined. Assuming that PROV statements are indeed mapped to FOL, and constraints expressed as FOL axioms, then the standard notion of "satisfaction" would not work here. For example, the empty document doesn't satisfy any of the constraints (an empty FOL theory only satisfies tautologies, a.k.a., valid formulas, in the usual logical sense of "valid"). There wouldn't be any problem and it would be very much shorter with a proper first order semantics. In this section, "equivalence" does not require validity, as opposed to Section 2.4. Equivalence is defined according to the notion of isomorphic normal form, which is never normatively defined (the only definition being in the non-normative Section 2.4). In any case, there is no reason to rely on isomorphims at all, if only the logic of PROV was properly defined. Unfortunately, since equivalence is given in terms of normal forms, and normal forms are given purely in procedural terms, there can be no sensible justification for 2 instances to be non-equivalent. The justification would be "because the procedure gave us different results", which isn't satisfactory at all. This is again a drawback of not providing an appropriate logical account of PROV statements. Best regards, -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03 Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/Received on Monday, 22 October 2012 13:04:48 GMT

*
This archive was generated by hypermail 2.2.0+W3C-0.50
: Monday, 22 October 2012 13:04:48 GMT
*