Re: PROV comments from Clark&Parsia

Hi James,

Thank you for your response.


On Mon, Jan 14, 2013 at 10:49 AM, Evren Sirin <evren@clarkparsia.com> wrote:

> ---------- Forwarded message ----------
> From: James Cheney <jcheney@inf.ed.ac.uk>
> Date: Tue, Jan 8, 2013 at 5:36 AM
> Subject: Re: PROV comments from Clark&Parsia
> To: Evren Sirin <evren@clarkparsia.com>
> Cc: public-prov-comments@w3.org
>
>
> Hi Evren,
>
> Thanks for your feedback.  I just wanted to respond to the comments
> about the constraints to make sure I understand them, to inform (my
> part of) the eventual formal response from the WG.
>
> On Jan 4, 2013, at 5:23 PM, Evren Sirin <evren@clarkparsia.com> wrote:
>
> > Hi all,
> >
> > We are working towards supporting PROV inferences and constraints in
> > our RDF database Stardog [1]. Below are some comments about PROV
> > specification documents that we identified while working on our
> > implementation.
> >
> > * PROV Constraints
> >
> > 1. The flow control arrows in Figure 1 seem to be backwards.
>
> I believe this is because the arrows correspond to
> "derivation/generation/use" steps, which go backwards in time in PROV.
>

The caption of the figure is "Overview of the Validation Process" so I
assumed that the process would start with the PROV instance(s) and then
continue to 'Validity' after 'Normalization Activity'. In any case, this is
a minor detail.


>
> > 2. Definition 2.1 seems to be missing the id on the right-hand side.
>
> Actually, since this definition applies to entity, agent or activity
> statements, the "id" argument is just the first ordinary parameter,
> namely a_1.  Ids cannot be optional for these statements so they are
> treated uniformly with the other parameters.  It would be equivalent
> to write out the three equivalences:
>
> entity(id) IF AND ONLY IF entity(id,[])
> activity(id,t1,t2)  IF AND ONLY IF activity(id,t1,t2,[])
> agent(id) IF AND ONLY IF agent(id,[]).
>
> spelling out how to expand attribute lists for entity, activity and
> agent, and perhaps we should just do this to avoid confusion?
>

I see. Yes, since the convention is to denote ids differently from
attributes, I think this alternative would be clearer.


>
> > 3. Since uniqueness constraints are ‘applied’ and can derive new
> > atoms, it is misleading to call them constraints. The same applies to
> > typing constraints.
>
> This is a reasonable view, but I'd prefer to stick with the existing
> terminology.  The rationale for it is as follows:
>
> - uniqueness "constraints" do not infer new PROV statements, rather
> they result in either merging two statements that contain compatible
> information, or in failure.

- typing "constraints" do not infer new PROV statements, rather they
> infer auxiliary non-PROV atomic formulas about typing of identifiers,
> which can in turn lead to detecting invalidity.
> - ordering "constraints" do not infer new PROV statements, rather they
> infer auxiliary ordering formulas that could in turn lead to
> invalidity.
>

I can see typeOf and (strictly) precedes as nonPROV predicates. But why is
a statement resulting from a uniqueness constraint not considered a PROV
statement?

I'll use an example to clarify my point. Assume variable names start with
an x or a y, and all other terms are assumed to be constants. Let's take
Constraint 24:

C24. IF wasGeneratedBy(x0; x1, x2, x3, x4) and wasGeneratedBy(y0; x1, x2,
y3, y4), THEN x0 = y0.

Now, let's suppose we have the following PROV instance I:

A1. wasGeneratedBy(gen1; e,a, t1, []).
A2. wasGeneratedBy(y; e,a, t1, [attr1=value1]).

The enforcement process is as follows:

Given a constraint:

IF hyp1 and ... and hypn THEN t1 = u1 and ... and tn = un.

1. Suppose PROV instance I contains all of the hypotheses hyp1 and ... and
hypn.

There's an issue here. Instance I does not contain the body atoms of C24;
it contains A1 and A2. You should probably say that "I contains atoms that
can be unified with hyp1, ..., hypn via some substitution P".

2. Attempt to unify all of the equated terms in the conclusion t1 = u1 and
.... and tn = un.
3. If unification fails, then the constraint is unsatisfiable, so
application of the constraint to I fails. If this failure occurs during
normalization prior to validation, then I is invalid, as explained in
Section 6.

We have that P(x0) = gen1 and P(y0) = y, so we need to unify gen1 and y,
resulting in S = {y -> gen1}.

4. If unification succeeds with a substitution S, then S is applied to the
instance I, yielding result S(I).

Here's a minor issue. It's unnecessary to apply S to the whole instance, it
suffices to apply it to the atoms involved; namely, A1 and A2. Anyways,
S(I) would look like:

A1. wasGeneratedBy(gen1; e,a, t1, []).
A2. wasGeneratedBy(y; e,a, t1, [attr1=value1]).
A3. wasGeneratedBy(gen1; e,a, t2, [attr1=value1]). // S(A2) = A3

We have inferred some attributes for gen1. I don't see why A3 is not to be
considered a PROV statement.


> - only "inferences" result in new PROV statements being added to the
> instance, and only "constraints" can result in failure.
>
> Of course, logically, distinguishing between PROV statements and
> auxiliary atoms is arbitrary, and there is no real reason to
> distinguish between inferences and constraints - we could simply call
> everything an inference.  If there is a general consensus that the
> existing terminology is confusing to implementors we will revisit
> this.
>

I wonder if it would be clearer to capture the notion of validity (and the
different kinds of constraint) simply with impossibility constraints. For
example, after capturing the semantics of precedes and strictlyPrecedes, we
could have:

strictlyPrecedes(x, x) -> FALSE.

Or for every pair <A,B> of disjoint types, we could have:

typeOf(x, A), typeOf(x, B) -> FALSE.

and so on.


>
> > 4. The definition of enforcement of uniqueness constraints states one
> > should apply the resulting substitution to the whole PROV instance.
> > However, the scope of the variables is not sets of rules.
>
> I'm not sure I understand this comment.  I believe you are indicating
> that the explanation of what it means to apply a uniqueness constraint
> is unclear, because it has nonlocal effects on the whole instance that
> go beyond the statements mentioned in the constraint.  Will explaining
> this more clearly (possibly via an example) help, or am I missing the
> point?
>

I mentioned this point briefly in the example. In order to enforce a
uniqueness constraint, it is not necessary to apply S to the whole instance..


>
> > 5. Inference 9 (wasStartedBy-inference) should be: IF
> > wasStartedBy(_id; _a,e1,a1,_t,_attrs), THEN there exist _gen and _t1
> > such that wasGeneratedBy(_gen; e1,a1,_t1,[]).
> > 6. Inference 10 (wasEndedBy-inference) should be: IF wasEndedBy(_id;
> > _a,e1,a1,_t,_attrs), THEN there exist _gen and _t1 such that
> > wasGeneratedBy(_gen; e1,a1,_t1,[]).
> > 7. Inference 15.4 should be: IF wasStartedBy(id; a2,e,_a1,_t,attrs)
> > THEN wasInfluencedBy(id; a2, e, attrs).
> > 8. Inference 15.7 should be: IF wasDerivedFrom(id; e2, e1, _a, _g, _u,
> > attrs) THEN wasInfluencedBy(id; e2, e1, attrs).
>
> Here (in 5-8), the changes you are proposing are just to add "_"s
> before some variables that only appear once, correct?  This seems
> fine.
>

Yes.


>
> > 9. Constraint 56 should be: IF hadMember(c,e) and
> > 'prov:EmptyCollection' ∈ typeOf(c) THEN INVALID.
> >
>
> I believe here you are pointing out that we wrote "hasMember" which
> should have been "hadMember", right?  If so, you're right - thanks!
>

No problem.


>
> If I understand correctly I believe none of the above changes affect
> the technical content of the specification (i.e. adopting them will
> not break existing implementations), so it should be no problem to
> address them in a future version of PROV-CONSTRAINTS.
>
> Thanks again for your comments and careful reading,
>
> --James
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>



-- 
Best,
Héctor

Received on Monday, 14 January 2013 20:47:48 UTC