Re: PROV-ISSUE-459 (prov-constraints-lc-review): PROV-CONSTRAINTS review [prov-dm-constraints] from Paul Groth on 2012-07-29 (public-prov-wg@w3.org from July 2012)

From: Paul Groth <p.t.groth@vu.nl>
Date: Sun, 29 Jul 2012 22:14:11 +0200
To: James Cheney <jcheney@inf.ed.ac.uk>, Luc Moreau <l.moreau@ecs.soton.ac.uk>, Paolo Missier <Paolo.Missier@ncl.ac.uk>
Cc: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <CAJCyKRrrND9yaCeYkmUWW2S7K+Jk+k2ceFdRyC8QqsZhWdKaJA@mail.gmail.com>

Hi prov-constraints editors:

This is my review of the constraints draft for last call. Sorry for
the delay, I wanted to make sure that I could implement each type of
constraint. I'm reviewing
http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-constraints-20120723/prov-constraints.html

First, thanks for all your hard work. The document is precise and the
approach is systematic. I have more detailed comments below. Answering
the questions posed in
http://lists.w3.org/Archives/Public/public-prov-wg/2012Jul/0346.html -

1. Is PROV-CONSTRAINTS ready to be released as a last call working
draft (modulo editorial issues and resolution to the below issues)?

Yes, but there are some major editorial things that need to be done to
help implementors. Additionally, in section 6 you mention a proof in
an appendix. This is technical content, so it needs to be either
included or not mentioned.

2. Regarding ISSUE-346: Is the role, meaning, and intended use of
each type of inference or constraint clear?
(http://www.w3.org/2011/prov/track/issues/346)

I think each definition is now precise and clear but as I will mention
in my longer comments I think there is some additional intuition
necessary to help implementers.

3. Regarding ISSUE-451: Are there any objections to the
revision-is-alternate inference?
(http://www.w3.org/2011/prov/track/issues/451)

Nope

4. Regarding ISSUE-454: Are the rules for disjointness clear and
appropriate? (http://www.w3.org/2011/prov/track/issues/454)

Yes

5. Regarding ISSUE-458: Should influence (and therefore all
subrelations, including communication) be irreflexive, or can it be
reflexive (i.e., can wasInfluencedBy(x,x) be valid)?
(http://www.w3.org/2011/prov/track/issues/458)

I think this comes down to what we think the role of the constraints is.
My impression is that it is to encourage implementers to be both explicit
and correct in the provenance they create. In terms of the example given
in the issue, I would expect that if an activity called itself you
would want to identify that as two independent activities. Thus, I
think it's irreflexive. Actually, maybe this is suggesting the need
for a part-of relation around activities.
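
To make the irreflexive reading concrete, here is a minimal sketch in
Python (not the SPIN implementation mentioned below). The triple layout
and the set INFLUENCE_RELATIONS are assumptions made for illustration;
the set is not an exhaustive list of influence subrelations.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (relation, subject, object).
Statement = Tuple[str, str, str]

# Assumption: influence plus a few of its subrelations, for illustration only.
INFLUENCE_RELATIONS = {"wasInfluencedBy", "wasInformedBy", "wasDerivedFrom"}


def reflexive_influences(statements: Iterable[Statement]) -> List[Statement]:
    """Return every influence statement that relates a thing to itself."""
    return [
        s for s in statements
        if s[0] in INFLUENCE_RELATIONS and s[1] == s[2]
    ]
```

Under the irreflexive reading, any statement this returns, such as
wasInformedBy(a1,a1), would make the instance invalid.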

6. Are there any objections to closing other open issues on
PROV-CONSTRAINTS? They are:

- http://www.w3.org/2011/prov/track/issues/387
- http://www.w3.org/2011/prov/track/issues/394
- http://www.w3.org/2011/prov/track/issues/452
- http://www.w3.org/2011/prov/track/issues/453

I think all these issues are addressed.

7. Are there any new issues concerning definitions, constraints, or inferences?
No

==Comments==

My approach to reviewing the constraints was to attempt to implement
the constraints and inferences using semantic web technologies. You
can find the beginning of the implementation at
https://github.com/pgroth/prov-constraints-validator-spin . I have
satisfied myself that the specification can be implemented using SPIN
RDF. However, I'm not 100% certain, which is a bit of a concern.
Additionally, to get things to work I had to make sure the inferences
were done in one pass, which may go against what is specified in the
document.

My major concern is the lack of intuition about what valid provenance
is. I would describe it as follows: valid provenance identifies
exactly partial states and those partial states are correctly ordered.
I'm trying to implement the spec but as an implementor I need to know
my broad goal when implementing these constraints.

A key thing that took me a while to get is that I need to generate
all qualified relations before applying the constraints. This is an
important point because it's sometimes unclear what should be
considered an inference and what should be considered a constraint.
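
The ordering described above (expand first, check afterwards) can be
sketched as follows. This is an illustrative Python sketch, not the SPIN
implementation. The fact tuples, the rule typing_from_used, and the check
entity_activity_disjoint are simplified stand-ins I am assuming for the
example, not the draft's exact rules.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, ...]
Rule = Callable[[Set[Fact]], Iterable[Fact]]
Check = Callable[[Set[Fact]], List[str]]


def typing_from_used(facts: Set[Fact]) -> Iterable[Fact]:
    # Hypothetical rule: used(a, e) implies activity(a) and entity(e).
    for f in facts:
        if f[0] == "used":
            yield ("activity", f[1])
            yield ("entity", f[2])


def entity_activity_disjoint(facts: Set[Fact]) -> List[str]:
    # Simplified check: no identifier may be both an entity and an activity.
    entities = {f[1] for f in facts if f[0] == "entity"}
    activities = {f[1] for f in facts if f[0] == "activity"}
    return sorted(entities & activities)


def validate(facts: Iterable[Fact], rules: List[Rule], checks: List[Check]) -> List[str]:
    original = set(facts)
    expanded = set(original)
    # Phase 1: apply every inference rule once, over the original facts.
    for rule in rules:
        expanded |= set(rule(original))
    # Phase 2: only now run the constraint checks, on the expanded facts.
    violations: List[str] = []
    for check in checks:
        violations.extend(check(expanded))
    return violations
```

For example, given activity(x) and used(a,x), the check finds nothing on
the raw facts, but after the inference pass it reports x as both an entity
and an activity.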

Concretely, in the Event Ordering Constraints, the constraints are
expressed such that the head of the rule leads to an assertion of
precedence. But actually, you have to assert all these precedence
relations first and then check for cycles. So I guess the question is:
are these really constraints? At any rate, the notion of checking for
cycles needs to be brought out more.
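
As an illustration of the "assert precedences, then check for cycles"
step, here is a minimal Python sketch using depth-first search. It assumes
the precedence assertions have already been collected as (earlier, later)
pairs of event identifiers and treats every edge as strict; the function
name and the pair encoding are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_precedence_cycle(edges: Iterable[Tuple[str, str]]) -> Optional[List[str]]:
    """Return one cycle as a list of events, or None if the ordering is acyclic."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for before, after in edges:
        graph[before].append(after)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = defaultdict(int)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: nxt is already on the current path, so we have a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

An instance whose precedence edges yield a cycle here would be reported as
invalid. The recursion is fine for small graphs; a large instance would
need an iterative traversal.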

Overall, I think an implementor could use some examples that show the
results of inference and the subsequent constraint checking, and just
more intuition about what valid and invalid provenance graphs look
like.

==Some comments per section==

Section 3
I'm worried about the MUST in the compliance list: "When determining
whether two PROV instances are equivalent, an application must
determine whether their normal forms are equal, as specified in
section 6. Normalization, Validity, and Equivalence."

Does this imply that I have to implement this to be compatible with
PROV-DM? I would use SHOULD…

Section 5.1
- From an RDF perspective, do I need to worry about merging? If the
assumption is that I'm provided an RDF serialization to check then no
merging is necessary. I guess the question is: is merging PROV-N specific?

Section 6.1
- Why do we need to talk about a hierarchy of bundles? Isn't the point
just that you want a set of provenance descriptions independent of
bundles?

Minor Notes:

- PROV objects or PROV constructs - check the consistency on this
- Inconsistency with naming. Do you always want to end inference names
with "-inference"? See Inference 11 (derivation-generation-use) and
Inference 10 (wasEndedBy-inference)

Thanks
Paul

Received on Sunday, 29 July 2012 20:14:40 UTC